microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

Determine if and how `link_class_info` will be used in the submission schema #678

Open mslarae13 opened 1 year ago

mslarae13 commented 1 year ago

slot link_class_info has no guidance or examples.

criteria for completion

mslarae13 commented 1 year ago

? Leave it out? Not really used... non-actionable DB field

turbomam commented 1 year ago

link_class_info values in the BBOP SQLite version of NCBI biosample_set.xml as of 2023-05-18

| value | count | link notes |
| --- | --- | --- |
| not collected | 432 | |
| not applicable | 415 | |
| NA | 114 | |
| http://geointa.inta.gov.ar/visor/?p=model_suelos | 108 | This site can’t be reached; check if there is a typo in geointa.inta.gov.ar; DNS_PROBE_FINISHED_NXDOMAIN |
| http://doi.org/10.1002/jpln.200521814 | 71 | Chernozem—Soil of the Year 2005 |
| missing | 65 | |
| http://www.fao.org/nr/land/sols/soil/wrb-soil-maps/reference-groups | 64 | Page not found |
| https://www.fao.org/3/i3794en/I3794en.pdf | 60 | World reference base for soil resources 2014 |
| Chromic Haploxernt | 48 | |
| na | 19 | |
| doi:10.1016/j.fcr.2011.09.019 | 6 | https://www.sciencedirect.com/science/article/abs/pii/S0378429011003340, "Evidence of improved water uptake from subsoil by spring wheat following lucerne in a temperate humid climate" |
| http://esdac.jrc.ec.europa.eu/resource-type/european-soil-database-maps | 1 | live |

turbomam commented 1 year ago

Specification from https://github.com/GenomicsStandardsConsortium/mixs/blob/main/mixs/excel/mixs_v6.xlsx

| Environmental package | agriculture | soil |
| --- | --- | --- |
| Structured comment name | link_class_info | link_class_info |
| Package item | link to classification information | link to classification information |
| Definition | Link to digitized soil maps or other soil classification information | Link to digitized soil maps or other soil classification information |
| Expected value | PMID,DOI or url | PMID,DOI or url |
| Value syntax | `{PMID\|DOI\|URL}` | `{PMID}\|{DOI}\|{URL}` |
| Example | | |
| Requirement | X | X |
| Preferred unit | | |
| Occurrence | 1 | 1 |
| MIXS ID | MIXS:0000329 | MIXS:0000329 |
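The value syntax in the specification could be checked mechanically. The following is a minimal Python sketch, not anything MIxS or NMDC provides: `classify_link_class_info` is a hypothetical helper, and the PMID and DOI patterns are assumptions, since the spreadsheet only states `{PMID}|{DOI}|{URL}` without defining what each looks like.

```python
import re
from urllib.parse import urlparse

# Assumed patterns: MIxS does not define the exact shape of a PMID or DOI here.
PMID_RE = re.compile(r"^(?:PMID:\s*)?\d+$", re.IGNORECASE)
DOI_RE = re.compile(r"^(?:doi:\s*)?10\.\d{4,9}/\S+$", re.IGNORECASE)


def classify_link_class_info(value):
    """Return 'PMID', 'DOI', 'URL', or None if the value fits none of them."""
    text = value.strip()
    if PMID_RE.match(text):
        return "PMID"
    if DOI_RE.match(text):
        return "DOI"
    # Treat anything with an http(s) scheme and a host as a URL.
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "URL"
    return None
```

Under these assumptions, placeholder strings such as "not collected" and free text such as "Chromic Haploxernt" come back as `None`, while `doi:10.1016/j.fcr.2011.09.019` is recognized as a DOI.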

turbomam commented 1 year ago

It seems like link_class_info might function like one of the many x_meth fields, which state how the value for slot x was determined. But I can't tell what slot link_class_info might provide context for.

As shown above, the number of biosamples that are annotated with an informative link_class_info is in the low hundreds, out of 35 million. (I am defining informative as a live web link, or any other value that is not a synonym for "not available".)
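The "low hundreds" estimate can be reproduced from the counts listed earlier in this thread. This is a rough Python sketch under stated assumptions: the set of "not available" synonyms and the set of dead links are my reading of the "link notes" column, and `bucket_counts` is a hypothetical helper name.

```python
# Counts copied from the table in this thread (NCBI biosample_set.xml, 2023-05-18).
VALUE_COUNTS = {
    "not collected": 432,
    "not applicable": 415,
    "NA": 114,
    "http://geointa.inta.gov.ar/visor/?p=model_suelos": 108,
    "http://doi.org/10.1002/jpln.200521814": 71,
    "missing": 65,
    "http://www.fao.org/nr/land/sols/soil/wrb-soil-maps/reference-groups": 64,
    "https://www.fao.org/3/i3794en/I3794en.pdf": 60,
    "Chromic Haploxernt": 48,
    "na": 19,
    "doi:10.1016/j.fcr.2011.09.019": 6,
    "http://esdac.jrc.ec.europa.eu/resource-type/european-soil-database-maps": 1,
}

# Assumption: these lowercase strings are the synonyms for "not available".
PLACEHOLDERS = {"not collected", "not applicable", "na", "missing"}

# Assumption: links whose notes report an unreachable site or a missing page.
DEAD_LINKS = {
    "http://geointa.inta.gov.ar/visor/?p=model_suelos",
    "http://www.fao.org/nr/land/sols/soil/wrb-soil-maps/reference-groups",
}


def bucket_counts(value_counts):
    """Sum biosample counts into placeholder, dead-link, and informative buckets."""
    buckets = {"placeholder": 0, "dead_link": 0, "informative": 0}
    for value, count in value_counts.items():
        if value.strip().lower() in PLACEHOLDERS:
            buckets["placeholder"] += count
        elif value in DEAD_LINKS:
            buckets["dead_link"] += count
        else:
            buckets["informative"] += count
    return buckets
```

With these buckets, 186 biosamples carry an informative value and 172 carry a dead link, so the total stays in the low hundreds whether or not dead links are counted as informative.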

@mslarae13 I propose that we omit this field from NMDC submission templates. A longer term action could be filing an issue at https://github.com/GenomicsStandardsConsortium/mixs/issues asking for documentation or examples of this term's use. I hope that request wouldn't be construed as an invitation for open-ended discussion about the subject. I suppose it could also be useful to be put in touch with the person who requested that term in the first place.

turbomam commented 1 year ago

link_class_info has not been used for any biosamples in the NMDC production MongoDB yet

db.getCollection("biosample_set").find( { link_class_info : { $exists : true } } );

0
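The same check can be expressed as a count rather than a cursor. Below is a small Python sketch, assuming a collection object that exposes `count_documents` the way a pymongo `Collection` does. `count_slot_usage` is a hypothetical helper, and `InMemoryCollection` with its sample documents is an invented stand-in so the example runs without a database.

```python
class InMemoryCollection:
    """Tiny stand-in for a MongoDB collection.

    Supports only filters of the form {slot: {"$exists": bool}}.
    """

    def __init__(self, documents):
        self._documents = list(documents)

    def count_documents(self, query):
        def matches(doc):
            return all(
                (slot in doc) == condition["$exists"]
                for slot, condition in query.items()
            )

        return sum(1 for doc in self._documents if matches(doc))


def count_slot_usage(collection, slot):
    """Count documents in which `slot` is present, mirroring the $exists query."""
    return collection.count_documents({slot: {"$exists": True}})


# Invented sample documents; the ids are placeholders, not real NMDC records.
sample = InMemoryCollection(
    [
        {"id": "example:1", "env_broad_scale": "soil"},
        {"id": "example:2", "depth": 0.1},
    ]
)
print(count_slot_usage(sample, "link_class_info"))
```

Passing a live pymongo collection in place of `sample` should work unchanged, since only `count_documents` is called.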

ssarrafan commented 1 year ago

Adding to current sprint per Mark. Need feedback from @mslarae13

mslarae13 commented 1 year ago

> I propose that we omit this field from NMDC submission templates. A longer term action could be filing an issue at https://github.com/GenomicsStandardsConsortium/mixs/issues asking for documentation or examples of this term's use.

-- I agree with this @turbomam

ssarrafan commented 1 year ago

@turbomam can this be closed now that Montana has provided feedback?

mslarae13 commented 1 year ago

@ssarrafan I don't think we can close it yet. Steps for resolution are to remove it from the NMDC submission portal and submit an issue to GSC.

ssarrafan commented 1 year ago

> @ssarrafan I don't think we can close it yet. Steps for resolution are to remove it from the NMDC submission portal and submit an issue to GSC.

Ah ok. I thought this ticket was just to "determine if and how..."
I'll move to the next sprint

mslarae13 commented 1 year ago

Chris Hunter suggests deprecating this term in GSC. Montana will put an issue into GSC.

@turbomam Should we remove from NMDC now, or wait for GSC update?

mslarae13 commented 1 year ago

https://github.com/GenomicsStandardsConsortium/mixs/issues/590

turbomam commented 1 year ago

I'll remove it in 7.6.1

mslarae13 commented 1 year ago

@turbomam did this get removed?

mslarae13 commented 5 months ago

Currently manually managing. We do not ingest all MIxS slots, see https://github.com/microbiomedata/nmdc-schema/blob/main/assets/import_mixs_slots_regardless.tsv

The MIxS version that we pull into the NMDC schema is 6.0. We should pull in 6.2, but some slot changes have been made (names, presence/absence).

ssarrafan commented 4 months ago

I'm moving this to the current sprint based on @mslarae13's last comment on 4/29.

ssarrafan commented 4 months ago

@turbomam @mslarae13 should this just be in the backlog? I can remove it from the sprint.

ssarrafan commented 4 months ago

Sprint over, removing from sprint.