Closed FanwangM closed 1 year ago
In this example, there are many recorded "products" with reaction roles other than PRODUCT. This is because they quantified the peak areas for many species, including leftover unreacted starting material. We would like to capture that analytical data, so those species are listed under the outcomes so we can assign peak areas to them.
From the perspective of cleaning up the data to make it ML-ready, you should consider checking the reaction roles as your snippet shows. If roles are assigned for the species in reaction_output.products, only keep the PRODUCT one(s).
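A minimal sketch of that filter, written against plain Python stand-ins rather than the real ORD protobuf messages. The `Product` class, the role strings, and the example SMILES are hypothetical placeholders (they are not taken from the dataset); an unassigned role is modeled as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Product:
    """Hypothetical stand-in for one entry of an outcome's products list."""
    smiles: str
    role: Optional[str] = None  # e.g. "PRODUCT" or "REACTANT"; None if unassigned


def keep_true_products(products: List[Product]) -> List[Product]:
    """Keep only PRODUCT-role species when any roles are assigned.

    If no species carries a role, the list is returned unchanged,
    since there is nothing to filter on.
    """
    if not any(p.role is not None for p in products):
        return list(products)
    return [p for p in products if p.role == "PRODUCT"]


# Illustrative species only: one coupled product plus two leftover
# starting materials that were quantified alongside it.
measured = [
    Product("c1ccc(-c2ccccc2)cc1", "PRODUCT"),
    Product("Brc1ccccc1", "REACTANT"),
    Product("OB(O)c1ccccc1", "REACTANT"),
]
cleaned = keep_true_products(measured)
```

With real records, the same logic would compare each product's reaction role against the schema's PRODUCT value instead of a string.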
It does look like there could be species labeled as reactants in the outcomes that don't appear in the inputs, which would indeed be odd.
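One way to check for that oddity is sketched here, assuming the input and outcome species have already been reduced to SMILES strings canonicalized the same way (plain string comparison is otherwise unreliable). The function name, the dictionary layout, and the example SMILES are hypothetical and not part of ord-schema or the dataset.

```python
from typing import Dict, Iterable, Set


def unexpected_reactants(
    input_smiles: Iterable[str],
    outcome_roles: Dict[str, str],
) -> Set[str]:
    """Return outcome species labeled REACTANT that are absent from the inputs."""
    known = set(input_smiles)
    return {
        smiles
        for smiles, role in outcome_roles.items()
        if role == "REACTANT" and smiles not in known
    }


# Illustrative data only.
inputs = ["Brc1ccccc1", "OB(O)c1ccccc1"]
outcomes = {
    "c1ccc(-c2ccccc2)cc1": "PRODUCT",
    "Brc1ccccc1": "REACTANT",   # leftover starting material, present in inputs
    "Clc1ccccc1": "REACTANT",   # not among the inputs, so it gets flagged
}
flagged = unexpected_reactants(inputs, outcomes)
```

A non-empty result would point at records worth inspecting by hand before using them for training.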
Could possibly be a problem with data, but this doesn't belong in the ord-interface issues. Closing for now.
I was using ord-14091a23403d4d96bdcbd0a64f981f4d as my toy example, but I got confused: I am not sure which species are the reactants and which are the products. As we can see from https://open-reaction-database.org/client/id/ord-14091a23403d4d96bdcbd0a64f981f4d#outcomes, the recorded reaction is very long. The reaction should be a simple Suzuki coupling, as shown in Scheme 1 at https://doi-org.libproxy.mit.edu/10.1039/C9RE00086K.
My understanding is that reactants should be in the inputs message and products should be in the outcomes message. Now if we suppose rxn_old is my reaction message for ord-14091a23403d4d96bdcbd0a64f981f4d, I can do
Is this paper not properly prepared? Even if the authors took measurements at different time points, I think we should have one consistent chemical reaction SMILES. From the above, we can see that reactants are not in the inputs message and reactant SMILES can be present in outcomes (this is normal, as the reaction will take some time to finish). What would be the optimal/right way to deal with reactions like this? Thank you. @connorcoley @skearnes