@bmeluch @brynnz22 thank you for all the work on the NEON NMDC term mapping file: neon_nmdc_term_mappings.tsv
I was reviewing the TSV file and noticed a few things that need to be corrected before the NEON pipeline that we're writing in the nmdc-runtime repo can fully consume it without any ambiguities:
Ideally the program would consume only exact_mappings, i.e., table columns that map exactly to either scalar slots or nested slots. It's useful to record in the TSV file what kind of mapping exists between each column and slot. Wherever possible, we should also use the "notes" column of the TSV file to record how a column needs to be used / parsed to reach the format that the target slot accepts.
How should testMethod be parsed so that we get individual values for tot_nitro_cont_meth, tot_org_c_meth, and protocol_link.url / protocol_link.name?
Only one of ammoniumNRepNum, nitrateNitriteNRepNum, and analyticalRepNumber can map to replicate_number. I'm thinking analyticalRepNumber is the best fit, but we need to look at the definitions more closely.
How should samplingProtocolVersion be parsed so that we get individual values for protocol_link.url / protocol_link.name, samp_collec_method, micro_biomass_meth, and water_cont_soil_meth?
startDate and collectDate cannot both map to collection_date. I'm thinking it should just be collectDate. It's important to note that it's not a range either.
The MIxS ENVO triad (env_broad_scale, env_local_scale, env_medium) needs to be represented in a special way in the mapping file.
illuminaAdapterKit, illuminaIndex1, and illuminaIndex2 all map to pcr_primers, but pcr_primers only accepts one has_raw_value, and judging by the definition I think it should be illuminaAdapterKit. If we're unsure, we can always record it as a mapping type other than exact_mapping and add something to the notes column for clarification.
ncbi_project_name is a great close_mapping for ncbiProjectID, and we shouldn't be dropping ncbiProjectID simply because it isn't an exact_mapping. You can always open an issue to get an ncbi_project_id slot added, which can be a NamedThing. That way it can capture both the id as a uriorcurie and the name as a string. If we decide to do this, we can get rid of the ncbi_project_name slot.
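To make the "one column per slot" concern above concrete, here is a rough sketch of a check the pipeline could run before consuming the file. It is only an illustration: the header names `neon_column`, `nmdc_slot`, and `mapping_type` are placeholders I made up (the real header of neon_nmdc_term_mappings.tsv may differ), and the sample rows are invented to mirror the cases listed above rather than copied from the file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical header names; adjust to the actual columns of the mapping TSV.
NEON_COL = "neon_column"
SLOT_COL = "nmdc_slot"
TYPE_COL = "mapping_type"


def find_ambiguous_exact_mappings(tsv_text):
    """Return {slot: [NEON columns]} for slots claimed by more than one exact_mapping."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    columns_by_slot = defaultdict(list)
    for row in reader:
        # Only exact mappings are consumed directly, so only those can collide.
        if row[TYPE_COL].strip() == "exact_mapping":
            columns_by_slot[row[SLOT_COL].strip()].append(row[NEON_COL].strip())
    return {slot: cols for slot, cols in columns_by_slot.items() if len(cols) > 1}


# Invented sample rows, for illustration only.
sample = "\n".join([
    "\t".join([NEON_COL, SLOT_COL, TYPE_COL]),
    "startDate\tcollection_date\texact_mapping",
    "collectDate\tcollection_date\texact_mapping",
    "analyticalRepNumber\treplicate_number\texact_mapping",
    "ncbiProjectID\tncbi_project_name\tclose_mapping",
])

ambiguous = find_ambiguous_exact_mappings(sample)
print(ambiguous)  # {'collection_date': ['startDate', 'collectDate']}
```

Rows with a non-exact mapping type (like the close_mapping row) are left alone by this check, so they can stay in the file with a note instead of being dropped.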
CC: @turbomam