yoheikikuta / US-patent-analysis

This is a repository for the analysis of US patents.

Understanding data #1

Open yoheikikuta opened 6 years ago

yoheikikuta commented 6 years ago

TARGET DATA

Bulk Data Storage System: https://bulkdata.uspto.gov/#pats

We use the following datasets from this system for our analysis.

Patent Application Full Text Data (No Images) (MAR 15, 2001 - PRESENT)

This dataset consists of zip-compressed XML files, one per week of publications. The beginning of one file looks like this:

$ head -n 20 ipa170105.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE us-patent-application SYSTEM "us-patent-application-v44-2014-04-03.dtd" [ ]>
<us-patent-application lang="EN" dtd-version="v4.4 2014-04-03" file="US20170000001A1-20170105.XML" status="PRODUCTION" id="us-patent-application" country="US" date-produced="20161220" date-publ="20170105">
<us-bibliographic-data-application lang="EN" country="US">
<publication-reference>
<document-id>
<country>US</country>
<doc-number>20170000001</doc-number>
<kind>A1</kind>
<date>20170105</date>
</document-id>
</publication-reference>
<application-reference appl-type="utility">
<document-id>
<country>US</country>
<doc-number>14789882</doc-number>
<date>20150701</date>
</document-id>
</application-reference>
<us-application-series-code>14</us-application-series-code>
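
To make the excerpt above concrete, here is a minimal Python sketch that reads the publication and application identifiers from one record with the standard library. The `SAMPLE` string is an abbreviated copy of the excerpt with closing tags added by hand so that it parses; it is not a complete record. The `summarize` helper and its output key names are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the excerpt above. The closing tags were added by hand
# so that the fragment is well-formed; a real record contains much more.
SAMPLE = """<us-patent-application lang="EN" country="US" date-publ="20170105">
<us-bibliographic-data-application lang="EN" country="US">
<publication-reference>
<document-id>
<country>US</country>
<doc-number>20170000001</doc-number>
<kind>A1</kind>
<date>20170105</date>
</document-id>
</publication-reference>
<application-reference appl-type="utility">
<document-id>
<country>US</country>
<doc-number>14789882</doc-number>
<date>20150701</date>
</document-id>
</application-reference>
<us-application-series-code>14</us-application-series-code>
</us-bibliographic-data-application>
</us-patent-application>"""


def summarize(xml_text):
    """Collect the identifiers shown in the excerpt into a flat dict.

    Hypothetical helper: the key names are illustrative, not a fixed schema.
    """
    root = ET.fromstring(xml_text)
    pub = root.find(".//publication-reference/document-id")
    app_ref = root.find(".//application-reference")
    app = app_ref.find("document-id")
    return {
        "publication_number": pub.findtext("doc-number"),
        "kind": pub.findtext("kind"),
        "publication_date": pub.findtext("date"),
        "application_number": app.findtext("doc-number"),
        "application_date": app.findtext("date"),
        "application_type": app_ref.get("appl-type"),
    }


summary = summarize(SAMPLE)
print(summary)
```

Searching with `.//` keeps the lookup independent of the name of the bibliographic wrapper element, which differs between the application and grant datasets.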

Important tags:

Patent Grant Full Text Data (No Images) (JAN 1976 - PRESENT)

This dataset consists of zip-compressed XML files, one per week of granted patents. The beginning of one file looks like this:

$ head -n 20 ipg120103.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE us-patent-grant SYSTEM "us-patent-grant-v42-2006-08-23.dtd" [ ]>
<us-patent-grant lang="EN" dtd-version="v4.2 2006-08-23" file="USD0651376-20120103.XML" status="PRODUCTION" id="us-patent-grant" country="US" date-produced="20111219" date-publ="20120103">
<us-bibliographic-data-grant>
<publication-reference>
<document-id>
<country>US</country>
<doc-number>D0651376</doc-number>
<kind>S1</kind>
<date>20120103</date>
</document-id>
</publication-reference>
<application-reference appl-type="design">
<document-id>
<country>US</country>
<doc-number>29390372</doc-number>
<date>20110423</date>
</document-id>
</application-reference>
<us-application-series-code>29</us-application-series-code>
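
A weekly file holds many records, so it helps to split it before parsing. The sketch below assumes that each record in the file starts with its own `<?xml` declaration line, which would mean the file as a whole is not a single well-formed XML document. This is an assumption and should be checked against the downloaded data. `split_documents` is a hypothetical helper, and the second record in `WEEKLY` is made up for the demonstration.

```python
import io
import xml.etree.ElementTree as ET


def split_documents(lines):
    """Yield one XML document string per record from an iterable of lines.

    Assumption: every record begins with a line starting with '<?xml'.
    """
    buffer = []
    for line in lines:
        if line.startswith("<?xml") and buffer:
            yield "".join(buffer)
            buffer = []
        buffer.append(line)
    if buffer:
        yield "".join(buffer)


# Tiny in-memory stand-in for a weekly file. The first record is abbreviated
# from the excerpt above; the second record is entirely hypothetical.
WEEKLY = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE us-patent-grant SYSTEM "us-patent-grant-v42-2006-08-23.dtd" [ ]>
<us-patent-grant file="USD0651376-20120103.XML">
<us-bibliographic-data-grant>
<application-reference appl-type="design">
<document-id><doc-number>29390372</doc-number></document-id>
</application-reference>
</us-bibliographic-data-grant>
</us-patent-grant>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE us-patent-grant SYSTEM "us-patent-grant-v42-2006-08-23.dtd" [ ]>
<us-patent-grant file="HYPOTHETICAL-SECOND-RECORD.XML">
<us-bibliographic-data-grant>
<application-reference appl-type="utility">
<document-id><doc-number>00000000</doc-number></document-id>
</application-reference>
</us-bibliographic-data-grant>
</us-patent-grant>
"""

records = []
for doc in split_documents(io.StringIO(WEEKLY)):
    # Parse bytes so the encoding declaration inside each record is honored.
    root = ET.fromstring(doc.encode("utf-8"))
    app_ref = root.find(".//application-reference")
    records.append((root.get("file"), app_ref.get("appl-type")))

print(records)
```

For a real file, the `io.StringIO(WEEKLY)` argument could be replaced with an open file handle, for example `open("ipg120103.xml", encoding="utf-8")`, so that records are processed one at a time without loading the whole file into memory.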

Important tags:

Patent Application Office Actions Research Dataset (Stata (.dta) and MS Excel (.csv)) (2008 - JUN 2017)

URL: https://bulkdata.uspto.gov/data/patent/office/actions/bigdata/2017/

See this pdf for details.

yoheikikuta commented 6 years ago

How to use these datasets: