microbiomedata / nmdc-server

Data portal client and server for NMDC.
https://data.microbiomedata.org

Indicate MIxS terms #555

Closed ssarrafan closed 2 years ago

ssarrafan commented 2 years ago

Indicate when a term is from MIxS

Related to #448

ssarrafan commented 2 years ago

Brandon, please add T-Shirt size and any questions. Team is trying to get an idea of the time/effort to do this.

subdavis commented 2 years ago

Please provide information/documentation about how to know whether a term comes from MIxS. Is this consumable from code, or will we need to maintain our own mappings? Where should we be looking for this info?

Thanks.

pvangay commented 2 years ago

@ssarrafan this should be reassigned to Mark M. and Bill. Mark is currently working with Montana to identify sources of each term (where they came from) for display on the metadata submission interface. Once that information is available, hopefully Bill can include that information in the schema so that Brandon can also display it via the portal.

ssarrafan commented 2 years ago

@pvangay ok I've assigned this to Bill and Mark. Is the expectation that this will be done this month or can I move this to the January sprint?

pvangay commented 2 years ago

good question for @turbomam

cmungall commented 2 years ago

there are a variety of ways to programmatically extract this from the schema

I can advise but need more information about the overall dataflow. I am assuming for UI purposes you will want a ready-made json blob containing all metadata about a field including source, description, hyperlinks for more info etc. Our libs for doing this are python but we can easily precompile json for you.
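As a rough illustration of the "ready-made json blob" idea above, here is a minimal Python sketch. It assumes the schema has already been parsed into a dict with a top-level `slots` mapping, and that each slot may carry `description`, `from_schema`, `slot_uri`, and `see_also` keys. The function name `build_field_metadata` and the output keys (`source`, `links`) are hypothetical, chosen for this example only; they are not an existing nmdc-schema API.

```python
import json
from typing import Any, Dict


def build_field_metadata(schema: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Collect per-field metadata (description, source, links) for UI use.

    Assumption: `schema` is a parsed LinkML-style dict with a `slots` mapping.
    """
    blob: Dict[str, Dict[str, Any]] = {}
    for name, slot in (schema.get("slots") or {}).items():
        slot = slot or {}  # a slot with no body parses as None
        see_also = slot.get("see_also") or []
        if isinstance(see_also, str):
            see_also = [see_also]  # accept a single URL or a list of URLs
        links = [url for url in [slot.get("slot_uri"), *see_also] if url]
        blob[name] = {
            "description": slot.get("description", ""),
            "source": slot.get("from_schema"),
            "links": links,
        }
    return blob


def to_json(blob: Dict[str, Dict[str, Any]]) -> str:
    """Serialize the metadata blob in a stable, diff-friendly form."""
    return json.dumps(blob, indent=2, sort_keys=True)
```

Precompiling a file like this on the schema side would let the portal read one JSON document instead of interpreting the schema itself.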

subdavis commented 2 years ago

I'm already consuming nmdc-schema repo as a git submodule, so any JSON file that exists in that repo is something I can grab and use. Other kinds of data (xml, yaml) would probably also be OK, but JSON is preferable.

wdduncan commented 2 years ago

@subdavis The MIxS terms are in mixs.yaml, located in this directory:
https://github.com/microbiomedata/nmdc-schema/tree/main/src/schema

You can load the yaml directly yourself and convert to json or I can add a util to do this. What do you prefer?
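For the "load the yaml and convert to json" option, a minimal Python sketch follows. It assumes PyYAML is installed and that mixs.yaml has a top-level `slots` mapping; the function names and file paths are hypothetical. The YAML import is kept inside the file-handling function so the pure conversion step can be used without PyYAML.

```python
import json
from pathlib import Path
from typing import Any, Dict


def slots_to_json(schema: Dict[str, Any]) -> str:
    """Serialize only the `slots` section of a parsed schema as JSON."""
    slots = schema.get("slots") or {}
    # default=str covers values such as dates that YAML may parse into objects
    return json.dumps(slots, indent=2, sort_keys=True, default=str)


def yaml_file_to_json_file(yaml_path: str, json_path: str) -> None:
    """Read a YAML schema file and write its slots out as JSON."""
    import yaml  # third-party dependency (PyYAML), assumed to be installed

    schema = yaml.safe_load(Path(yaml_path).read_text(encoding="utf-8"))
    Path(json_path).write_text(slots_to_json(schema or {}), encoding="utf-8")
```

A call such as `yaml_file_to_json_file("src/schema/mixs.yaml", "mixs.json")` would then produce a file the portal can pick up from the submodule; the output filename here is only an example.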

ssarrafan commented 2 years ago

@wdduncan can this issue be closed?

wdduncan commented 2 years ago

@ssarrafan I do not know. What do you think @subdavis ?

turbomam commented 2 years ago

Sorry, I'm late to the game.

Where should it be indicated that a term comes from MIxS?

If a term is to be used in the NMDC DataHarmonizer, it will be marked with a disposition of borrowed or use as-is on the mixs_packages_x_slots tab of Soil-NMDC-Template_Compiled

Slots/columns that are modifications of a MIxS slot appear in mixs_modified_slots

I will be proposing a new structure for this Google Sheet soon, so some of that may become moot.

How the terms appear in DataHarmonizer will be determined by the section column in those two sheets. I believe @mslarae13 is assigning the MIxS as-is, borrowed, and modified terms to DH sections whose names will indicate which terms "come from" MIxS. @sujaypatil96 and I are working on the section assignment now.

subdavis commented 2 years ago

Should be a small amount of effort. Also, I won't be able to directly map lat_lon to latitude and longitude so we should talk about what sort of interventions are needed for edge cases like that.
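One possible shape for handling edge cases like lat_lon is a small, hand-maintained override table that is consulted only when a portal field has no MIxS slot of the same name. The Python sketch below is an assumption about how that could look, not the portal's actual implementation; `FIELD_TO_MIXS_OVERRIDES` and `mixs_slot_for` are hypothetical names.

```python
from typing import Collection, Mapping, Optional

# Hypothetical overrides: portal fields that map to a differently named MIxS slot.
FIELD_TO_MIXS_OVERRIDES = {
    "latitude": "lat_lon",
    "longitude": "lat_lon",
}


def mixs_slot_for(
    field: str,
    mixs_slot_names: Collection[str],
    overrides: Mapping[str, str] = FIELD_TO_MIXS_OVERRIDES,
) -> Optional[str]:
    """Return the MIxS slot a portal field corresponds to, or None.

    An exact name match wins; otherwise the override table is consulted.
    """
    if field in mixs_slot_names:
        return field
    target = overrides.get(field)
    return target if target in mixs_slot_names else None
```

Keeping the exceptions in one table makes it easy to review them at a metadata meeting and to delete entries once the schema covers them.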

ssarrafan commented 2 years ago

Based on the recent comments I will move this one to February. @turbomam and @wdduncan let me know if it should be in the backlog or assigned to someone else.

wdduncan commented 2 years ago

@subdavis I can create a JSON file in the GitHub repo, or you can convert it yourself. Just let me know which you prefer. As for the lat_lon issue, I don't know what the best solution is. @dehays perhaps we can discuss this at the metadata meeting. @subdavis It would be helpful if you could attend the meeting too.

mslarae13 commented 2 years ago

I am really late to this game! But I saw the message in Slack and checked this out. Is this for the DataHarmonizer or for the Read the Docs / schema definitions?

wdduncan commented 2 years ago

See work discussed in this ticket https://github.com/microbiomedata/nmdc-schema/issues/252

turbomam commented 2 years ago

There are roughly 100 elements in src/schema/mixs.yaml that already have an in_subset; for example, elev is in the environment subset.

what are the consequences of assigning more than one subset?

Syntactically, in_subset is multivalued
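Because in_subset is multivalued, a consumer should not assume a slot belongs to exactly one subset. The Python sketch below shows one way to read it defensively, accepting either a single string or a list; the helper names are hypothetical and the input is assumed to be the parsed `slots` mapping from the schema.

```python
from typing import Any, Dict, List, Optional


def subsets_of(slot: Optional[Dict[str, Any]]) -> List[str]:
    """Return a slot's subsets as a list, whether in_subset is a string or a list."""
    value = (slot or {}).get("in_subset")
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def slots_by_subset(slots: Dict[str, Any]) -> Dict[str, List[str]]:
    """Invert the slot -> subsets relation into subset -> sorted slot names."""
    index: Dict[str, List[str]] = {}
    for name, slot in slots.items():
        for subset in subsets_of(slot):
            index.setdefault(subset, []).append(name)
    return {subset: sorted(names) for subset, names in index.items()}
```

With this shape, adding a second subset to a slot only makes the slot appear under both keys of the index; nothing that already reads the first subset needs to change.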

ssarrafan commented 2 years ago

@wdduncan and @turbomam can we close this issue? It seems the work is being tracked under nmdc-schema#252.

ssarrafan commented 2 years ago

Removed @wdduncan per his note.

Hi Set. Here is an update:

https://github.com/microbiomedata/nmdc-schema/issues/134 This is an ongoing issue that will need to be passed on to Mark or Sujay (not sure which).

https://github.com/microbiomedata/nmdc-runtime/issues/89 I updated the comment on this. I should be able to get the change sheet edit done before I leave.

https://github.com/microbiomedata/nmdc-schema/issues/195 I am working with Sujay on this. I should be able to close it before the week’s end.

https://github.com/microbiomedata/nmdc-runtime/issues/46 This is an ongoing issue that will need to be passed on to Mark or Sujay (not sure which).

https://github.com/microbiomedata/nmdc-server/issues/555 There is a lot of conversation in this thread, but it looks like Mark has taken this one over.

turbomam commented 2 years ago

I think (but haven't proven yet) that all MIxS slots in https://github.com/microbiomedata/nmdc-schema/blob/issue-291-mixs-submod/src/schema/mixs_6_for_nmdc.yaml are annotated as follows:

from_schema: http://w3id.org/mixs/terms

Having said that,

I'm pretty sure that those annotations appear in @sujaypatil96's new-ish gen-linkml JSON, but I haven't confirmed yet.
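If the annotation is present in the generated JSON, the check on the consumer side could be as small as the Python sketch below. It assumes each slot is a dict and compares against the `from_schema` value quoted above. The function name is hypothetical, and the key to inspect is a parameter because which annotation ends up carrying the provenance is not confirmed in this thread.

```python
from typing import Any, Dict, Optional, Sequence

# Value quoted in the comment above; treated here as the marker for MIxS terms.
MIXS_SCHEMA_ID = "http://w3id.org/mixs/terms"


def is_from_mixs(
    slot: Optional[Dict[str, Any]],
    provenance_keys: Sequence[str] = ("from_schema",),
) -> bool:
    """Return True if any of the given provenance keys points at MIxS."""
    slot = slot or {}
    return any(slot.get(key) == MIXS_SCHEMA_ID for key in provenance_keys)


def mixs_slot_names(slots: Dict[str, Any]) -> list:
    """List the names of slots annotated as coming from MIxS, sorted."""
    return sorted(name for name, slot in slots.items() if is_from_mixs(slot))
```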

turbomam commented 2 years ago

But the from_schema may change after further imports/merges.

Yes, switch to source

turbomam commented 2 years ago

Solution in https://github.com/microbiomedata/nmdc-schema/pull/292

closing this issue in anticipation of a merge in May 2022