microbiomedata / sheets_and_friends

Enhance a LinkML model with imported and optionally modified slots

remove env_package column from our DH templates. It can be inferred from the selected template #136

Open turbomam opened 2 years ago

turbomam commented 2 years ago

any comments, @mslarae13 ?

mslarae13 commented 2 years ago

Funny, I thought there was also a ticket to "add env_package"

It can be inferred, but it's not in the sheet & we need it in the sheet in EMSL because of how we're collecting the information. If there's a way to make it auto-populate, great but the slot shouldn't be totally removed.

turbomam commented 2 years ago

I have a ticket with GSC that env_package should be added back to the MIxS model. Theoretically we're using it off the books now.

Yes, we should be able to share tabular data about biosamples including an env_package column.

When I convert the biosample metadata collected from DataHarmonizer, so that it can be inserted into MongoDB, the env_package is merged in. In this case, the env_package value comes from the study description pages. It can be reported out as a table with the env_package column during the conversion, or after being loaded into MongoDB.
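A minimal sketch of that merge step, not the actual conversion code. It assumes the DataHarmonizer rows arrive as plain dictionaries and that the study-level package is already known. The function name `merge_env_package` and the sample column values are hypothetical.

```python
from typing import Dict, List


def merge_env_package(rows: List[Dict[str, str]], env_package: str) -> List[Dict[str, str]]:
    """Return copies of the rows with an env_package column filled in.

    Hypothetical helper: the study-level value is only used when a row
    does not already carry a non-empty env_package of its own.
    """
    merged = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are left untouched
        if not new_row.get("env_package"):
            new_row["env_package"] = env_package
        merged.append(new_row)
    return merged


# Illustrative rows as they might come out of a template without the column.
rows = [
    {"samp_name": "s1", "depth": "0.1"},
    {"samp_name": "s2", "depth": "0.3"},
]
merged = merge_env_package(rows, "soil")
```

The merged rows can then be written out as a table or loaded into a database, so the column still exists downstream even if the template itself does not show it.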

ssarrafan commented 2 years ago

@turbomam moving this to Sept but please let me know if you're not actively working on it for the next 2 weeks

mslarae13 commented 2 years ago

> Yes, we should be able to share tabular data about biosamples including an env_package column.
>
> When I convert the biosample metadata collected from DataHarmonizer, so that it can be inserted into MongoDB, the env_package is merged in. In this case, the env_package value comes from the study description pages. It can be reported out as a table with the env_package column during the conversion, or after being loaded into MongoDB.

If so, then the column can go away. We just need to be sure it's included in the template that's shared with EMSL.

turbomam commented 2 years ago

See Kitware's submission retrieval API, https://data.dev.microbiomedata.org/api/metadata_submission?offset=0&limit=25

It captures the environmental package in the results.[].metadata_submission.packageName path

It will be retained by the biosample instantiated and loaded into MongoDB
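A rough sketch of reading that path, run against a hand-written sample rather than the live API. Only the `results.[].metadata_submission.packageName` path comes from the comment above. The `id` field, the sample values, and the function name `package_names` are assumptions for illustration.

```python
import json

# Hand-written stand-in for a response body; the real payload has more fields.
SAMPLE_RESPONSE = json.loads("""
{"results": [
  {"id": "example-1", "metadata_submission": {"packageName": "soil"}},
  {"id": "example-2", "metadata_submission": {"packageName": "water"}},
  {"id": "example-3", "metadata_submission": {}}
]}
""")


def package_names(response: dict) -> dict:
    """Map each submission's (assumed) id to its packageName, or None if absent."""
    names = {}
    for result in response.get("results", []):
        submission = result.get("metadata_submission") or {}
        names[result.get("id")] = submission.get("packageName")
    return names


names = package_names(SAMPLE_RESPONSE)
```

A submission with no package recorded maps to `None` instead of raising an error, which makes gaps easy to spot before anything is loaded.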

mslarae13 commented 2 years ago

So, yes. I think we need to keep the column until we have a better way of submitting and parsing the metadata for each institution (EMSL, JGI). I'm currently doing it by hand. But once we have that we can remove the column.

With that, I think this is 'back log' until we have that process functional.

turbomam commented 2 years ago

I don't understand. There is no need to capture the environmental package in the DH template.

turbomam commented 2 years ago

If doing something by hand puts constraints on the workflow, let's figure out how we can automate your work.

turbomam commented 2 years ago

Let's remove it, and then you can show me the negative consequences.

mslarae13 commented 2 years ago

Don't remove it yet. The negative consequence is that when someone submits samples to EMSL using the NMDC portal, I need that column. If we remove it, the only way I can access it is by going into the submission portal, finding that person's specific submission, and clicking through until I get to the "package" tab.

I do it by hand because there is no "submit" step for metadata and no automated way of converting from the NMDC submission portal template to the JGI or EMSL template formats. So I have to copy and shuffle columns manually.

Until we have the process of

  1. User completes metadata & clicks 'submit'
  2. Metadata goes through NMDC checks / approvals
  3. (option1) submission is accepted, move to 5
  4. (option2) submission is declined, contact user & ask them to re-visit/help
  5. Metadata is imported to a database (labeled, or put somewhere that does NOT get put onto the data portal)
  6. Metadata is parsed based on JGI or EMSL needed columns & re-formatted to the structure required for that institute.

We're a long way from having this functional.
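Step 6 is the part currently done by copying and shuffling columns. A minimal sketch of how it might be automated is below, assuming each institute's template can be described as a mapping from portal column names to institute column names. Every column name in `COLUMN_MAPS` is a made-up placeholder, not the real EMSL or JGI template.

```python
import csv
import io

# Hypothetical mappings: portal column name -> institute column name.
# The real EMSL and JGI templates would need their actual headers here.
COLUMN_MAPS = {
    "EMSL": {"samp_name": "sample_name", "env_package": "env_package"},
    "JGI": {"samp_name": "Sample Name", "env_package": "Environmental Package"},
}


def reformat_for_institute(rows, institute):
    """Keep only the columns the institute needs, renamed and in template order."""
    column_map = COLUMN_MAPS[institute]
    return [
        {target: row.get(source, "") for source, target in column_map.items()}
        for row in rows
    ]


def to_csv(rows):
    """Serialize the reformatted rows as CSV text with a header row."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


portal_rows = [{"samp_name": "s1", "depth": "0.1", "env_package": "soil"}]
jgi_csv = to_csv(reformat_for_institute(portal_rows, "JGI"))
```

Keeping the mappings as data means adding an institute or a column is a one-line change, and columns an institute does not ask for (such as `depth` here) are dropped automatically.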

turbomam commented 2 years ago

Ok, I can kinda see that, but would appreciate it if you could walk me through the process.

turbomam commented 2 years ago

What we really need is a display of the env package (or even the full template name) in the header of the templates.

ssarrafan commented 2 years ago

Discussed with @turbomam and moving to the backlog.