Open kheal opened 2 weeks ago
The proteomics team has decided to remove these contaminants from the existing mongo records and will fix the source files for the UniProt mapping so that future runs of the workflow do not reintroduce them.
Entries of interest are: Contaminant_TRYP_PIG Contaminant_Trypa1 Contaminant_Trypa2 Contaminant_Trypa3 Contaminant_Trypa4 Contaminant_Trypa5 Contaminant_Trypa6 Contaminant_TRYP_BOVIN Contaminant_CTRA_BOVIN Contaminant_CTRB_BOVIN Contaminant_ALBU_HUMAN Contaminant_ALBU_BOVIN Contaminant_K2C1_HUMAN Contaminant_K22E_HUMAN Contaminant_K1C9_HUMAN Contaminant_K1C10_HUMAN
Current Behavior
Some MetaProteomics records have non-compliant values in the best_protein and all_protein slots. The range of these slots is GeneProduct, whose id has a range of uriorcurie. See this file for the full list of non-uriorcurie values.
Expected Behavior
NMDC uriorcurie slots should be populated by a CURIE: a prefix, a colon, and a local identifier, like nmdc:wfmgan-11-pmh0a992.1_0000691_21398_23068.
Steps To Reproduce
See below for R script to find these non-compliant values in mongo.
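For readers who want a quick illustration of the check without the R script, here is a minimal Python sketch of the same idea. It is not the script referenced above. The regular expression is a simplified stand-in for "prefix, colon, local identifier" rather than the pattern the NMDC schema enforces, and the record layout (plain dicts with an id plus the two slots holding a string or a list of strings) and the sample records are assumptions made up for illustration.

```python
import re

# Simplified CURIE shape: prefix, colon, non-empty local identifier.
# Assumption: illustrative only, not the pattern used by the NMDC schema.
CURIE_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_.-]*:\S+$")


def is_curie_like(value):
    """Return True if value is a string shaped like prefix:local_id."""
    return isinstance(value, str) and CURIE_RE.match(value) is not None


def find_noncompliant(records, slots=("best_protein", "all_protein")):
    """Collect (record id, slot, value) for every value that is not CURIE-like.

    Assumption: each record is a dict and a slot holds either a single
    string or a list of strings. The real mongo documents may be nested
    differently.
    """
    bad = []
    for record in records:
        for slot in slots:
            value = record.get(slot)
            if value is None:
                continue
            values = value if isinstance(value, list) else [value]
            for v in values:
                if not is_curie_like(v):
                    bad.append((record.get("id"), slot, v))
    return bad


# Hypothetical sample records; the ids are invented for this example.
sample_records = [
    {
        "id": "example:1",
        "best_protein": "nmdc:wfmgan-11-pmh0a992.1_0000691_21398_23068",
        "all_protein": [
            "nmdc:wfmgan-11-pmh0a992.1_0000691_21398_23068",
            "Contaminant_TRYP_PIG",
        ],
    },
    {"id": "example:2", "best_protein": "Contaminant_ALBU_HUMAN"},
]

noncompliant = find_noncompliant(sample_records)
for record_id, slot, value in noncompliant:
    print(record_id, slot, value)
```

The sketch only tests the CURIE shape. A uriorcurie slot can also hold a full URI, so a stricter validator would need to handle that case separately.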
Notes
Closing this issue will unblock https://github.com/microbiomedata/nmdc-schema/issues/2028