google / patents-public-data

Patent analysis using the Google Patents Public Datasets on BigQuery
https://bigquery.cloud.google.com/dataset/patents-public-data:patents
Apache License 2.0

Has the quarterly dataset update schedule been adhered to? #14

Open SKalt opened 5 years ago

SKalt commented 5 years ago

On the Cloud Console details page, both the public patents and research datasets are said to update quarterly. However, the BigQuery details tab for each dataset lists February as the last update (6-7 months ago at the time of this issue).

If this isn't the right place to ask about dataset updates, where should I raise this issue instead?

sfd9898 commented 5 years ago

Also curious about this. The BigQuery details for the "Publications" table say it was last updated in November 2018, and it is larger (in GB) than the dated tables. Can someone confirm whether that table is simply being appended to?

feltenberger commented 5 years ago

@ostegm or @wetherbeei -- can you comment on this? I don't have any visibility into the update cadence, unfortunately, so I can't answer.

wetherbeei commented 5 years ago

There was a delay for one cycle while we improved how we normalize publication numbers to reduce the time needed for each update. We've finished that change and did an update in Nov 2018. Future updates will now stick to the original quarterly plan.

sanealytics commented 3 years ago

Sorry, but this page seems to indicate that the dataset was last updated on 9/13/19. Is this correct?

wetherbeei commented 3 years ago

That date is (confusingly) the last updated date for the marketplace info page. Look at the last modified date of the table in BigQuery: https://console.cloud.google.com/bigquery?p=patents-public-data&d=patents&page=table -> details.
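For anyone who wants to check this programmatically rather than through the console, here is a minimal sketch, not an official tool. It assumes the `google-cloud-bigquery` Python client is installed and that credentials are configured. The table ID `patents-public-data.patents.publications` is an assumption based on the "Publications" table mentioned above. The helper names and the 92-day "one quarter" threshold are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_modified, now=None, max_age_days=92):
    """Return True if last_modified is older than max_age_days.

    max_age_days=92 is an illustrative stand-in for "one quarter";
    it is not a threshold stated by the dataset maintainers.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - last_modified) > timedelta(days=max_age_days)


def fetch_last_modified(table_id="patents-public-data.patents.publications"):
    """Read the table's last-modified timestamp from BigQuery metadata.

    Assumes google-cloud-bigquery is installed and credentials are set up.
    The default table_id is an assumption; substitute the table you use.
    """
    # Imported lazily so is_stale() can be used without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    table = client.get_table(table_id)  # metadata lookup only, no query is run
    return table.modified  # timezone-aware datetime (UTC)


if __name__ == "__main__":
    modified = fetch_last_modified()
    print("Last modified:", modified.isoformat())
    print("Older than one quarter:", is_stale(modified))
```

This reads the same "last modified" value that the table details tab shows, so it is not affected by the marketplace info page date.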

mareksuscak commented 2 years ago

It appears that we're running one quarterly update behind again, with the last one being from May 2021. Are there any plans to refresh the patents dataset soon?

wetherbeei commented 2 years ago

We've updated the tables again this week.

sanealytics commented 2 years ago

@wetherbeei Is it possible to open source the ETL pipeline? I'm assuming it's Dataflow.

wetherbeei commented 2 years ago

Sorry, it isn't possible; it relies on an underlying data feed from IFI Claims.

If update speed is a concern, IFI Claims has a paid BigQuery table that is updated weekly, with many more fields and all of the translated full text.

sanealytics commented 2 years ago

Thanks for that info.