open-reaction-database / ord-data

Official data repository for the Open Reaction Database
https://open-reaction-database.org
Creative Commons Attribution Share Alike 4.0 International
210 stars 53 forks source link

dataset submission for item 47463309 from DOI 10.1038/s41467-023-42446-5 #183

Closed qai222 closed 2 weeks ago

qai222 commented 2 weeks ago

1430 reactions from 10.1038/s41467-023-42446-5 (only the HTE reactions are included)

bdeadman commented 2 weeks ago

It looks like process_submission check is failing because the submission workflow is using ord-schema 0.3.65 which pre-dates the addition of PIPETTE enums.

ValueError: error parsing dataset.pbtxt: 44:15 : '        type: PIPETTE': Enum type "ord.ReactionInput.AdditionDevice.AdditionDeviceType" has no value named PIPETTE.
Error: Process completed with exit code 1.

Possibly solve by updating env values in the ord-data workflows .

The check_target_branch check could also be due to the ord-schema version difference?

check_target_branch

Run echo "Target branch: ${GITHUB_BASE_REF}"
  echo "Target branch: ${GITHUB_BASE_REF}"
  test "${GITHUB_BASE_REF}" != "main"
  shell: /usr/bin/bash -e {0}
  env:
    ORD_SCHEMA_TAG: v0.[3](https://github.com/open-reaction-database/ord-data/actions/runs/9771576537/job/26989249049?pr=183#step:2:3).65
Target branch: main
Error: Process completed with exit code 1.
skearnes commented 2 weeks ago

I'll fix the version today.

On Wed, Jul 3, 2024, 7:08 AM Ben Deadman @.***> wrote:

It looks like process_submission check is failing because the submission workflow is using ord-schema 0.3.65 which pre-dates the addition of PIPETTE enums.

ValueError: error parsing dataset.pbtxt: 44:15 : ' type: PIPETTE': Enum type "ord.ReactionInput.AdditionDevice.AdditionDeviceType" has no value named PIPETTE. Error: Process completed with exit code 1.

Possibly solve by updating env values in the ord-data workflows https://github.com/open-reaction-database/ord-data/tree/main/.github/workflows .

The check_target_branch check could also be due to the ord-schema version difference?

check_target_branch

Run echo "Target branch: ${GITHUB_BASE_REF}" echo "Target branch: ${GITHUB_BASE_REF}" test "${GITHUB_BASE_REF}" != "main" shell: /usr/bin/bash -e {0} env: ORD_SCHEMA_TAG: v0.3.65 Target branch: main Error: Process completed with exit code 1.

— Reply to this email directly, view it on GitHub https://github.com/open-reaction-database/ord-data/pull/183#issuecomment-2205819516, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHITGKKFUMYPOLHOUTCMODZKPLQXAVCNFSM6AAAAABKIW5BE2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMBVHAYTSNJRGY . You are receiving this because you are subscribed to this thread.Message ID: @.***>

skearnes commented 2 weeks ago

See #184

skearnes commented 2 weeks ago

@qai222 The DOI test failures are a bug; I'll send a PR to ord-schema

skearnes commented 2 weeks ago

@bdeadman can you take a look at this dataset for QC?

bdeadman commented 2 weeks ago

Dataset validates and a spot check of the pbtxt file showed no obvious problems. Merging into main.

skearnes commented 2 weeks ago

Thanks. So what happens next is that we create a PR to main from the branch we merged into. We'll need to open and close the PR a couple of times since the automated checks add commits to the branch that reset the tests. Then we merge into main. (Submissions from forks are required to merge into a non-main branch at first, since we don't have permissions to add commits to branches in forks. I recently saw an example of how to work around this, but that's the status quo workflow for now.)

On Wed, Jul 3, 2024, 4:43 PM Ben Deadman @.***> wrote:

Merged #183 https://github.com/open-reaction-database/ord-data/pull/183 into #183 https://github.com/open-reaction-database/ord-data/pull/183.

— Reply to this email directly, view it on GitHub https://github.com/open-reaction-database/ord-data/pull/183#event-13386020380, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHITGKWKQPJ3JXNXNZEBP3ZKRO7DAVCNFSM6AAAAABKIW5BE2VHI2DSMVQWIX3LMV45UABCJFZXG5LFIV3GK3TUJZXXI2LGNFRWC5DJN5XDWMJTGM4DMMBSGAZTQMA . You are receiving this because you modified the open/close state.Message ID: <open-reaction-database/ord-data/pull/183/issue_event/13386020380@ github.com>

bdeadman commented 2 weeks ago

@skearnes thanks for the clarification. I thought it would be that. We will hold on that for now though. Me and @qai222 had a discussion at OH today, and there may be some changes we want to make to the dataset before pushing it to main.

qai222 commented 1 week ago

@skearnes thanks for the clarification. I thought it would be that. We will hold on that for now though. Me and @qai222 had a discussion at OH today, and there may be some changes we want to make to the dataset before pushing it to main.

I uploaded my processing scripts to this repo. A few notes from the discussion with @bdeadman and the script:

  1. The solvent.volume should have volume_includes_solutes=True specified. This field is unspecififed in the current submission.
  2. Convention needed for describing room temperature as a Temperature message.
  3. Convention needed for describing pre-mix, e.g.
    After placing the 96-well plates in ultrasonic water bath for 10 seconds to mix the reaction uniformly, 
    each 96-well plate was covered with an optical glass to minimize the volatilization of solvent and component

    (right now this is in ReactionSetup.environment).

  4. The quantification of products includes a derivatization reaction. One can include it as ReactionWorkups or referencing another Reaction record. We chose the former. This led to ambiguity in assigning reaction_role: A is added as a catalyst of the derivatization reaction, should the role of A be workup or catalyst?
  5. ReactionInput has addition_order specified, but workups is a sequence, so workup1 precedes workup2 implies workup1.input.addition_order >= workup2.input.addition_order. There should be a validation function for this.
  6. Enantiomer products: what should the product SMILES be in reaction SMILES (one without chiral or two with chiral)? how should one describe the ProductCompound? What I did is
    ProductCompound(
    identifiers=[
        CompoundIdentifier(type="SMILES", value=product_s_smi),
        CompoundIdentifier(type="SMILES", value=product_r_smi),
    ],
    is_desired_product=True,
    ...