open-reaction-database / ord-data

Official data repository for the Open Reaction Database
https://open-reaction-database.org
Creative Commons Attribution Share Alike 4.0 International

USPTO Dataset Question #181

Closed Peggy-Li closed 6 months ago

Peggy-Li commented 6 months ago

How much of the Lowe USPTO dataset is in ORD? I saw in the 0.1.0 release notes that you included the grants data. Does that mean there is no data from the applications (e.g., the 2001-2016 applications from https://figshare.com/articles/dataset/Chemical_reactions_from_US_patents_1976-Sep2016_/5104873)? Are grants and applications a very close match in their contents?

Thanks.

skearnes commented 6 months ago

Hi, you are correct that we've only included the grant data. Most of the published retrosynthesis ML literature uses a subset of the grants data, which is why we focused on that.

skearnes commented 6 months ago

@connorcoley may have more to say about the contrast between grants and applications?

connorcoley commented 6 months ago

I don't have anything specific to add; we focused on the grant data because it is a key set that is often used for downstream modeling tasks.

For the years in common, we do expect a very high degree of overlap between data from grants and data from applications.

Peggy-Li commented 6 months ago

Thanks for the clarification!