Closed by blester125 4 months ago
For the record, from https://www.dol.gov/general/aboutdol/copyright:
"As part of the terms of granting the patent to the inventor, patents are published into the public domain."
Would this be relevant in this context?
https://www.uspto.gov/learning-and-resources/bulk-data-products
https://bulkdata.uspto.gov/
I'll take a look into some of those sets.
If this line of inquiry is fruitful, the following might be useful as well.
It ostensibly combines multiple countries' datasets with several other patent datasets. However, I do not have a proper GCP account (which is necessary to run queries, and the queries themselves cost money), so I'd appreciate input from somebody familiar with GCP / GCP datasets: https://console.cloud.google.com/marketplace/product/google_patents_public_datasets/google-patents-public-data?pli=1&project=api-project-904060009868
@sunnydigital @chris-ha458 any updates on this?
Hi Stella, I'm no longer working on this project. Let me unassign myself.
I had a look, and they only have full text available for US publications. Other countries just have (very short) abstracts, from what I could tell. I can take a look at the sets available from the USPTO if no one else is working on this.
@baberabb can you share how you accessed it? Did it require GCP credits?
@StellaAthena I do think this is a plausible pathway, but I am not able to spearhead it at the moment. I will try to assist any efforts though.
It's available through BigQuery, which is Google's SQL-like database system. And yes, it charged me $20 even though I only made a few requests. I think if you still have free GCP credits then you can use those.
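For anyone else trying this, here is a rough sketch of how one might check what a query would scan before paying for it, using a BigQuery dry run. The table name `patents-public-data.patents.publications` and the columns `publication_number`, `country_code` and `description_localized` are my assumptions about the public dataset's schema, so check them in the console first. The price per TiB is deliberately left as an input, since I don't want to guess the current rate, and `dry_run_bytes` needs the `google-cloud-bigquery` package plus working GCP credentials.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def build_us_fulltext_query(table="patents-public-data.patents.publications"):
    """Build a SQL string selecting US publications.

    The default table and the column names are assumptions about the
    public dataset's schema; verify them before running anything.
    """
    return (
        "SELECT publication_number, description_localized\n"
        f"FROM `{table}`\n"
        "WHERE country_code = 'US'"
    )


def estimate_cost_usd(bytes_processed, price_per_tib_usd):
    """Convert bytes scanned into an on-demand cost estimate.

    price_per_tib_usd must be supplied by the caller from the current
    BigQuery pricing page; no rate is hard-coded here.
    """
    return bytes_processed / TIB * price_per_tib_usd


def dry_run_bytes(sql):
    """Ask BigQuery how many bytes a query would scan, without running it.

    Requires google-cloud-bigquery and GCP credentials, so the import is
    kept local to this function.
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=job_config)
    return job.total_bytes_processed


if __name__ == "__main__":
    sql = build_us_fulltext_query()
    print(sql)
    # Example arithmetic only: 5.0 is a placeholder rate, not a quoted price.
    print(estimate_cost_usd(2 * TIB, price_per_tib_usd=5.0))
```

With credentials set up, `estimate_cost_usd(dry_run_bytes(sql), price_per_tib_usd=...)` should give a ballpark figure before any billable query is run.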
OK, I got trial access and did some more experimenting, and we can just use the Google dataset IMO. They provide full text for all US patent publications (not applications) and titles/abstracts for all others. It's all in plain text as well, so it will be easy to format. 150M rows in total, and it seems to have the full US record through Oct 27, 2023.
sample extract here.
Amazing!
Domain: Patents
Can we use the Google Patents data for this?
It might be possible to use C4/Common Crawl data for this, as patents.google.com is one of the most represented domains in C4.
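If the C4 route is explored, the filtering step could look something like the sketch below. It assumes each C4 record is a dict with `url` and `text` fields (as in the commonly distributed JSON-lines releases); the function names and the sample records are made up for illustration, and nothing here checks licensing or text quality.

```python
from urllib.parse import urlparse


def is_google_patents_url(url, domain="patents.google.com"):
    """Return True if the URL's hostname is exactly the given domain."""
    host = urlparse(url).hostname or ""
    return host.lower() == domain


def filter_patent_records(records, domain="patents.google.com"):
    """Yield only the records whose 'url' field points at the domain.

    `records` is any iterable of dicts; records without a 'url' key
    are skipped.
    """
    for record in records:
        if is_google_patents_url(record.get("url", ""), domain):
            yield record


if __name__ == "__main__":
    # Hypothetical sample records, not real C4 rows.
    sample = [
        {"url": "https://patents.google.com/patent/US0000000A/en", "text": "..."},
        {"url": "https://example.com/patents.google.com", "text": "..."},
        {"text": "record with no url"},
    ]
    kept = list(filter_patent_records(sample))
    print(len(kept))
```

Matching on the parsed hostname rather than a substring avoids picking up pages that merely mention the domain somewhere in their path.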