open-reaction-database / ord-schema

Schema for the Open Reaction Database
https://open-reaction-database.org
Apache License 2.0

Some USPTO reactions have no inputs/outputs #617

Closed: skearnes closed this issue 2 years ago

skearnes commented 2 years ago

See https://twitter.com/egonwillighagen/status/1456569589756739588?s=20 and https://client.open-reaction-database.org/id/ord-00000fa651bf4f1f8a58d8e503d59996

This happens when we extracted only a reaction SMILES from the USPTO CML record. When a reaction has no inputs/outputs, we should populate those fields from the reaction SMILES so that they show up on the landing pages and are searchable in the interface.
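
As a rough illustration of the proposed backfill, the sketch below splits a reaction SMILES of the form `reactants>agents>products` into component lists that could then seed the empty inputs/outputs. It is plain Python and does not use the ord-schema API: `ParsedReaction` and `split_reaction_smiles` are hypothetical names, and splitting components on `.` is a simplifying assumption that ignores fragment grouping (for example, salts written as separate ions).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParsedReaction:
    """Hypothetical container for illustration; not an ord-schema message."""

    reactants: List[str] = field(default_factory=list)
    agents: List[str] = field(default_factory=list)
    products: List[str] = field(default_factory=list)


def split_reaction_smiles(reaction_smiles: str) -> ParsedReaction:
    """Split 'reactants>agents>products' into lists of component SMILES."""
    # Keep only the text before the first whitespace; anything after it
    # (e.g. a CXSMILES extension block) is ignored in this sketch.
    tokens = reaction_smiles.strip().split(maxsplit=1)
    if not tokens:
        raise ValueError("empty reaction SMILES")
    sections = tokens[0].split(">")
    if len(sections) != 3:
        raise ValueError(f"expected exactly two '>' separators: {reaction_smiles!r}")
    # Assumption: every '.'-separated fragment is its own component.
    reactants, agents, products = (
        [smiles for smiles in section.split(".") if smiles] for section in sections
    )
    return ParsedReaction(reactants=reactants, agents=agents, products=products)


if __name__ == "__main__":
    parsed = split_reaction_smiles("CCO.CC(=O)O>OS(=O)(=O)O>CC(=O)OCC.O")
    print(parsed.reactants, parsed.agents, parsed.products)
```

Under these assumptions, the reactants and agents would map to reaction inputs and the products to reaction outputs; how that mapping is written into the actual records is left to the schema's own tooling.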

skearnes commented 2 years ago

This affects three datasets; note that these are the USPTO subset from rexgen_direct, not the ones we parsed from CML.

| dataset | # reactions |
| --- | --- |
| ord_dataset-488402f6ec0d441ca2f7d6fabea7c220 | 40000 |
| ord_dataset-5481550056a14935b76e031fb94b88be | 30000 |
| ord_dataset-de0979205c84441190feef587fef8d6d | 409035 |

skearnes commented 2 years ago

Fix is deployed.