open-reaction-database / ord-data

Official data repository for the Open Reaction Database
https://open-reaction-database.org
Creative Commons Attribution Share Alike 4.0 International

Biginelli dataset #78

Status: Closed (connorcoley closed this 3 years ago)

connorcoley commented 3 years ago

Paper: https://pubs.acs.org/doi/full/10.1021/cc010044j
SI: https://pubs.acs.org/doi/suppl/10.1021/cc010044j/suppl_file/cc010044j_s.pdf

The files used for the enumeration are attached. The enumeration ended up being a little more custom than desired.

Currently, this ignores one small detail in procedure, so let me know if I should revise the submission to include it: "If solutions could not be prepared because of insufficient solubility (2h, 2r), the particular building block was added manually to the urea/thiourea and catalyst components were added directly into the vial."

biginelli.zip

connorcoley commented 3 years ago

Current validation errors are the following:

dataset: Reaction.workups[2].input: Reaction input components require an amount
dataset: Reaction.outcomes[0].products[0].measurements[1]: Purity measurements should be defined as percentage values
dataset: Reaction.workups[4].input: Reaction input components require an amount

In the paper, "washed with cold (4 °C) EtOH" does not specify an amount (which isn't unusual for washing steps). And purity is reported as ">90%" across all compounds, without a reaction-specific value, which I feel is best reflected by the string_value instead of a percentage.

Thoughts re: relaxing these validations (@michaelmaser @skearnes )?

skearnes commented 3 years ago

> Thoughts re: relaxing these validations (@michaelmaser @skearnes )?

Agreed that both could be downgraded to warnings.
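
For illustration, here is a minimal sketch of what downgrading these two checks from errors to warnings could look like. It is not the actual validation code of the schema: `WorkupInput`, `PurityMeasurement`, `ValidationWarning`, `report`, and the `strict` flag are all hypothetical names invented for this example. Only the two messages are taken from the validation output above.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


class ValidationWarning(UserWarning):
    """Non-fatal validation finding (hypothetical name)."""


@dataclass
class WorkupInput:
    """Stand-in for a workup input; not the real schema message."""
    description: str
    amount_ml: Optional[float] = None  # None: the paper gave no amount


@dataclass
class PurityMeasurement:
    """Stand-in for a purity measurement; not the real schema message."""
    percentage: Optional[float] = None
    string_value: Optional[str] = None  # e.g. ">90%"


def report(message: str, strict: bool) -> None:
    """Raise in strict mode, otherwise emit a warning and continue."""
    if strict:
        raise ValueError(message)
    warnings.warn(message, ValidationWarning)


def validate_workup_input(item: WorkupInput, strict: bool = False) -> None:
    if item.amount_ml is None:
        report("Reaction input components require an amount", strict)


def validate_purity(measurement: PurityMeasurement, strict: bool = False) -> None:
    if measurement.percentage is None:
        report("Purity measurements should be defined as percentage values", strict)
```

With `strict=False`, a wash step without an amount or a purity given only as the string ">90%" is reported but no longer blocks the submission. Passing `strict=True` restores the original hard failure.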

brilee commented 3 years ago

On naming: I suggest replacing "Biginelli Condensation Dataset" with "Microwave-assisted Biginelli Condensation Dataset".

Stirring - I see no stirring was specified, but I think we should assume stirring unless otherwise specified because it would be an odd thing to drop.

Everything else LGTM

connorcoley commented 3 years ago

I've updated the name, but haven't added stirring. I think assuming stirring might actually be incorrect, since these are microwave reactions

michaelmaser commented 3 years ago

> Currently, this ignores one small detail in procedure, so let me know if I should revise the submission to include it: "If solutions could not be prepared because of insufficient solubility (2h, 2r), the particular building block was added manually to the urea/thiourea and catalyst components were added directly into the vial."

Thinking this should probably be included, would it be possible to add to the loop? Looking at the notebook now

> I've updated the name, but haven't added stirring. I think assuming stirring might actually be incorrect, since these are microwave reactions

I think I agree here; microwave reactions often go unstirred.

brilee commented 3 years ago

SGTM for leaving the description unstirred

brilee commented 3 years ago

When spotchecking products, I noticed that in reaction 3, the urea has a phenylthiourea as a reagent, but there's no phenyl group on the product. I think this was caused by the paper's misleading use of φ (lowercase phi, the phenylthiourea) vs Φ (uppercase phi, the plain thiourea). This switchup appears wherever Φ is the urea specified, so it's probably a mistake in the urea input structure enumeration.
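
A small sketch of how this kind of label mix-up could be caught automatically. It assumes the enumeration looks building blocks up by symbol in a dictionary, which may not match the attached notebook. `UREAS` and `case_colliding_labels` are hypothetical names. The label assignment follows the reading above (φ for the phenylthiourea, Φ for the plain thiourea), and the SMILES strings are the ones quoted later in this thread.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical lookup table keyed by the paper's symbols.
UREAS: Dict[str, str] = {
    "\u03c6": "NC(NC1=CC=CC=C1)=S",  # φ (lowercase phi): phenylthiourea
    "\u03a6": "NC(N)=S",             # Φ (uppercase phi): plain thiourea
}


def case_colliding_labels(labels: Iterable[str]) -> List[List[str]]:
    """Return groups of labels that differ only by letter case."""
    groups = defaultdict(list)
    for label in labels:
        # casefold() maps both Φ and φ to the same string.
        groups[label.casefold()].append(label)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Running `case_colliding_labels(UREAS)` flags the φ/Φ pair, which is a hint that one of the two labels is easy to mistype or misread and is worth renaming to an unrelated symbol.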

connorcoley commented 3 years ago

> When spotchecking products, I noticed that in reaction 3, the urea has a phenylthiourea as a reagent, but there's no phenyl group on the product. I think this was caused by the paper's misleading use of φ (lowercase phi, the phenylthiourea) vs Φ (uppercase phi, the plain thiourea). This switchup appears wherever Φ is the urea specified, so it's probably a mistake in the urea input structure enumeration.

Good catch. I've now relabeled the phenylthiourea (NC(NC1=CC=CC=C1)=S) as $\Omega$. It seems that only one product actually uses the phenylthiourea; most use NC(N)=S.

> Thinking this should probably be included, would it be possible to add to the loop? Looking at the notebook now

I've added this as a condition detail, but have left the procedure as reflected by the inputs as-is. The statement in the methods section is a little unclear (to me) so I've copied it verbatim.
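
As a rough sketch of the "condition detail" approach: the snippet below attaches the verbatim sentence from the methods section to each enumerated reaction that uses one of the poorly soluble building blocks. This is an assumption about how the note could be scoped, not a description of the attached notebook, which may attach it differently. `annotate_reactions` and the dictionary layout are hypothetical, and every building-block label other than 2h and 2r is a placeholder.

```python
from typing import Dict, Iterable, List, Tuple

# Copied verbatim from the paper's methods section, as quoted in this thread.
SOLUBILITY_NOTE = (
    "If solutions could not be prepared because of insufficient solubility "
    "(2h, 2r), the particular building block was added manually to the "
    "urea/thiourea and catalyst components were added directly into the vial."
)

# The only two building blocks the paper names as insufficiently soluble.
INSOLUBLE_BLOCKS = {"2h", "2r"}


def annotate_reactions(combinations: Iterable[Tuple[str, ...]]) -> List[Dict]:
    """Build one record per combination, adding the note where it applies."""
    reactions = []
    for blocks in combinations:
        details = []
        if INSOLUBLE_BLOCKS.intersection(blocks):
            details.append(SOLUBILITY_NOTE)
        reactions.append({"building_blocks": blocks, "condition_details": details})
    return reactions
```

For example, `annotate_reactions([("1a", "2h", "3a"), ("1a", "2b", "3a")])` would add the note to the first record only. The labels "1a", "2b", and "3a" are made up for the example.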

Archive.zip