open-reaction-database / ord-data

Official data repository for the Open Reaction Database
https://open-reaction-database.org
Creative Commons Attribution Share Alike 4.0 International

Uploading dataset pbtxt for Pfizer HTE Suzuki #137

Closed. wshlau closed this pull request 1 year ago

wshlau commented 1 year ago

This is an enumerated dataset of reactions from Pfizer (doi: 10.1126/science.aap9112)

Archive.zip

skearnes commented 1 year ago

@michaelmaser @brilee could one of you review this?

skearnes commented 1 year ago

@wshlau thanks! Can you please add a commit that puts an extension on the file? Our automated tests require .pb or .pbtxt and the current file has no extension.

wshlau commented 1 year ago

Hi @skearnes, the zip folder contains a .pbtxt file and an Excel file. Am I missing something here?

skearnes commented 1 year ago

Ah I think I see the confusion. The zipfile is an attachment on the pull request, but the automated tests use whatever you committed with git, which in this case is just a file without an extension: https://github.com/open-reaction-database/ord-data/pull/137/files. For the tests to run, you'll need to push a new commit with git that includes the file extension.
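The extension requirement described above can be checked locally before pushing. The following is a minimal sketch, not the repository's actual test: the function name and the example paths are hypothetical, and the accepted extensions are only the two named in this thread (the real test suite may accept others).

```python
from pathlib import PurePosixPath

# Extensions named in this thread; an assumption, the real tests may accept more.
ALLOWED_SUFFIXES = {".pb", ".pbtxt"}


def files_missing_extension(committed_paths):
    """Return the committed paths whose suffix is not an accepted dataset extension."""
    return [
        path
        for path in committed_paths
        if PurePosixPath(path).suffix not in ALLOWED_SUFFIXES
    ]


# Hypothetical file names, for illustration only.
committed = ["data/suzuki_dataset", "data/suzuki_dataset.pbtxt"]
print(files_missing_extension(committed))
```

Only files tracked by git matter here; an attachment on the pull request page is never seen by the tests.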

I'm happy to create a new pull request for you using an enumerated dataset based on the inputs in the zipfile if you'd like.

wshlau commented 1 year ago

That would be great, thank you so much!

skearnes commented 1 year ago

(Superseded by #138)

ipendlet commented 1 year ago

Moved to #138

wshlau commented 1 year ago

Hi Ian,

This dataset was generated by the authors of the cited paper, but it will be reported in an unpublished work from our lab (Doyle). We have received permission from the authors of the cited paper to report this dataset.

Best, Will

On Tue, Aug 9, 2022 at 4:38 PM Ian M. Pendleton @.***> wrote:

Hi @wshlau. Can you tell us a bit more about the experiments that you are trying to commit here? These don't look like the original work from the cited paper. Are these reactions reproductions that you did in lab? Are they associated independently with any ongoing publication?


ipendlet commented 1 year ago

@wshlau, I am still unclear about the content of the Excel and pbtxt files that you included.

The reactions in the Excel file do not have 1:1 correspondence with data from the referenced publication -- at least not that I could see. Specifically, there is no reference to a 384-well screen in the original publication or supporting information. The reactions appear to be set up as replicates of the original publication; however, including the Pfizer citation implies that doi: 10.1126/science.aap9112 is the source of this data. That doesn't seem correct.

Based on the submission, I think that these reactions are reproductions (and that's great!). If these experiments are duplicates (i.e., repeats of the original reactions), we should not record them as being FROM the Pfizer publication. They should have their own reference so as not to confuse the origin of the chemistry.

With that in mind can you answer the following questions:

  1. When and who prepared the reactions in this submission?
  2. Does the paper that will be describing these reactions have (or will have) its own reference?
  3. Were the chemicals consumed to generate 1-2 the same as those used in the described reactions.pbtxt file? (i.e., the exact same lot / batch?)

The sources/citations referred to in the pbtxt file should be the same as the source/citation that answers the questions above.

For posterity, this sort of data tracking is important to ensure that the entries we add are unique instances of experiments. We like the idea of recording repeated, reproducible chemistry. The instances of the reactions from the Pfizer paper have already been uploaded and can be browsed here: https://open-reaction-database.org/client/search?dois=10.1126%2Fscience.aap9112&limit=100

However, if the submitted reactions are new reactions run in your lab, then we definitely want to include them. We just need to ensure that the reference is appropriate for the data being included. If there is no paper yet, then it might be good to hold off on submission or upload a draft of the paper to ChemRxiv.
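The uniqueness concern above amounts to asking whether a submission cites a DOI that is already recorded. A rough sketch of such a check follows; the helper names are hypothetical, this is not part of the ORD tooling, and the only DOIs used are the two mentioned in this thread.

```python
def normalize_doi(doi):
    """Lower-case a DOI and strip common prefixes so equivalent forms compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):].strip()
    return doi


def already_recorded(submission_doi, existing_dois):
    """Return True if the submission cites a DOI that is already in the database."""
    existing = {normalize_doi(d) for d in existing_dois}
    return normalize_doi(submission_doi) in existing


# The Science paper's reactions are already uploaded, per the comment above.
existing_dois = {"10.1126/science.aap9112"}
print(already_recorded("https://doi.org/10.1126/science.aap9112", existing_dois))
print(already_recorded("10.26434/chemrxiv-2022-cljcp", existing_dois))
```

A match does not prove the data are duplicates; it only flags a submission for the kind of manual provenance review carried out in this thread.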

Let me know if you have any followup questions.

@skearnes Any clarifications?

wshlau commented 1 year ago

Hi,

Here's the reference of the preprint that will be describing the dataset: https://chemrxiv.org/engage/chemrxiv/article-details/62f6966269f3a5df46b5584b DOI: 10.26434/chemrxiv-2022-cljcp

Best, Will


wshlau commented 1 year ago

Hi Ian,

Apologies for the mistakes and confusion. The original citation (doi: 10.1126/science.aap9112, https://doi.org/10.1126/science.aap9112) is not the source of this dataset. It is also not a 1:1 replicate of the original dataset. For example, the conditions were different: the scale differed (0.0002 mmol vs. 0.0004 mmol), the ligand equivalents varied, and other side products were detected and measured. This dataset was generated by Pfizer, using the same automated flow system and analytical method, prior to the Science paper; it was used to demonstrate the idea of the system and to justify a postdoc position in the first place.

This dataset will be reported for the first time in an unpublished work from the Doyle lab. We submitted this paper yesterday on ChemRxiv, so it should be available online in a few days. I will send over the doi once it is generated.

To answer your questions,

  1. When and who prepared the reactions in this submission?
     I do not know exactly who ran the reactions, but Pfizer generated the dataset prior to the original report (doi: 10.1126/science.aap9112, https://doi.org/10.1126/science.aap9112). This dataset was disclosed to us by Neal Sach, the corresponding author of the Science paper.
  2. Does the paper that will be describing these reactions have (or will have) its own reference?
     Not currently, but it is now in submission.
  3. Were the chemicals consumed to generate 1-2 the same as those used in the described reactions.pbtxt file? (i.e., the exact same lot / batch?)
     I am not sure whether they are from the same lot/batch.

Please let me know if you have any more questions. Again, I apologize for the mislabeling of the data; I can make the changes if needed.

Best regards, Will


ipendlet commented 1 year ago

@wshlau I believe that this is the publication: Torres, J. A. G.; Lau, S. H.; Anchuri, P.; Stevens, J. M.; Tabora, J. E.; Li, J.; Borovika, A.; Adams, R. P.; Doyle, A. G. A Multi-Objective Active Learning Platform and Web App for Reaction Optimization. J. Am. Chem. Soc. 2022. https://doi.org/10.1021/jacs.2c08592.

I just want to confirm a couple of details before we move this through. The paper mentions 352 experiments, yet the dataset contains 376. Is this expected? Also, there is no mention of the light source for each reaction in the dataset (blue light vs. photoreactor). I believe that this information is likely important to capture, but is it recorded anywhere at the reaction level in the dataset?

On my end, I've passed a merge to @skearnes to update the doi in the pbtxt file that was submitted. If there are no changes from the original submission, then this should be acceptable. Please double-check the updated file here: https://github.com/skearnes/ord-data/pull/3

Spot checking a couple reactions and their description is likely all that is needed. Thank you ahead of time!

wshlau commented 1 year ago

Hi Ian,

Yes, that is the publication.

  1. The entire dataset has 376 experiments, but 24 of those are control experiments (with no base) so only 352 datapoints were included in the analysis.
  2. Correct, these reactions do not use any light source. Blue light and photoreactors are used in another reaction (Ni/photoredox cross-electrophile coupling) in the paper.
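The count reconciliation in point 1 can be sketched as a simple filter. The record layout below (a `base` field that is `None` for controls) is an assumption made for illustration and is not the actual pbtxt schema; the records themselves are synthetic placeholders that only reproduce the counts stated above.

```python
def split_controls(experiments):
    """Separate no-base control runs from the runs used in the analysis."""
    controls = [e for e in experiments if e["base"] is None]
    analyzed = [e for e in experiments if e["base"] is not None]
    return controls, analyzed


# Synthetic stand-in records: 24 controls without base, 352 runs with a base.
experiments = (
    [{"id": i, "base": None} for i in range(24)]
    + [{"id": 24 + i, "base": "hypothetical-base"} for i in range(352)]
)

controls, analyzed = split_controls(experiments)
print(len(experiments), len(controls), len(analyzed))
```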

Please let me know if you have any questions. Thank you so much, and I hope you had a great Thanksgiving!

Best, Will
