microbiomedata / DataHarmonizer

Our dev interface is available via GitHub Pages:
MIT License

Single, long sheet with all packages x slots in Soil-NMDC-Template_Compiled? #18

Closed: turbomam closed this issue 2 months ago

turbomam commented 2 years ago


turbomam commented 2 years ago

See also #16.

turbomam commented 2 years ago

See also the MIxS Terms Skipped sheet.

Is this just intended for soil? There are slots in the mixs_6_slot_name column that aren't part of the soil package per se; at least, they don't show up among the soil package's induced slots in the MIxS LinkML model.


Migrate the value-added columns here into the upcoming packages x slots sheet.
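One way to carry the value-added columns over is a left join on slot name. This is a minimal sketch that assumes both sheets have been exported as rows of dicts. The `slot` and `package` column names, the `migrate` helper, `VALUE_ADDED`, and the placeholder URL are hypothetical. Only `mixs_6_slot_name` and `GitHub Ticket` are mentioned in this thread.

```python
# Hypothetical excerpts: the compiled sheet (one row per slot, with value-added
# columns) and the new packages x slots sheet. Values are placeholders.
compiled = [
    {"mixs_6_slot_name": "ph", "GitHub Ticket": "https://github.com/example/repo/issues/1"},
]
packages_x_slots = [
    {"package": "soil", "slot": "ph"},
    {"package": "soil", "slot": "depth"},
]
VALUE_ADDED = ["GitHub Ticket"]  # hypothetical list of columns to migrate


def migrate(target_rows, source_rows, target_key, source_key, columns):
    """Left-join `columns` from source_rows onto target_rows by slot name.

    Target rows with no matching source row get empty strings. If a slot name
    repeats in source_rows, the last occurrence wins.
    """
    by_slot = {row[source_key]: row for row in source_rows}
    merged = []
    for row in target_rows:
        extra = by_slot.get(row[target_key], {})
        merged.append({**row, **{col: extra.get(col, "") for col in columns}})
    return merged


for row in migrate(packages_x_slots, compiled, "slot", "mixs_6_slot_name", VALUE_ADDED):
    print(row)
```

Because one slot can appear under many packages, every package row for that slot receives the same value-added columns.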

The GitHub Ticket column should be a pipe-delimited list of issue URLs.
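Here is a minimal sketch of reading such a column. It assumes cells look like `url1|url2`, with optional spaces around the pipes. The `parse_ticket_cell` helper and its github.com `/issues/` check are assumptions, not an existing DataHarmonizer validator.

```python
from urllib.parse import urlparse


def parse_ticket_cell(cell: str) -> list[str]:
    """Split a pipe-delimited GitHub Ticket cell into issue URLs.

    Whitespace around each URL is trimmed and empty segments (for example
    from a trailing pipe) are dropped. Anything that is not an http(s)
    github.com URL with an /issues/ path raises ValueError.
    """
    urls = [part.strip() for part in cell.split("|") if part.strip()]
    for url in urls:
        parsed = urlparse(url)
        if (parsed.scheme not in ("http", "https")
                or parsed.netloc != "github.com"
                or "/issues/" not in parsed.path):
            raise ValueError(f"not a GitHub issue URL: {url!r}")
    return urls


cell = ("https://github.com/microbiomedata/DataHarmonizer/issues/16 | "
        "https://github.com/microbiomedata/DataHarmonizer/issues/18|")
print(parse_ticket_cell(cell))
```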

turbomam commented 2 years ago

Likewise for the EXACT MIxS Terms for DH sheet.

turbomam commented 2 years ago

The MIxS Terms Skipped sheet says to skip these soil slots:

['extreme_salinity', 'link_addit_analys', 'pool_dna_extracts']

and to skip these slots, which don't appear among the soil induced slots:

['16s_recover', '16s_recover_software', 'adapters', 'annot', 'assembly_name', 'assembly_qual', 'assembly_software', 'associated resource', 'bacteria_carb_prod', 'bin_param', 'bin_software', 'chimera_check', 'compl_appr', 'compl_score', 'compl_software', 'contam_score', 'contam_screen_input', 'contam_screen_param', 'decontam_software', 'detec_type', 'encoded_traits', 'estimated_size', 'extrachrom_elements', 'feat_pred', 'host_disease_stat', 'host_pred_appr', 'host_pred_est_acc', 'host_spec_range', 'isol_growth_condt', 'lib_layout', 'lib_reads_seqd', 'lib_screen', 'lib_size', 'lib_vector', 'mag_cov_software', 'mid', 'neg_cont_type', 'nucl_acid_amp', 'nucl_acid_ext', 'num_replicons', 'number_contig', 'org_carb', 'otu_class_appr', 'otu_db', 'otu_seq_comp_appr', 'pathogenicity', 'pcr_cond', 'pcr_primers', 'ploidy', 'pos_cont_type', 'pred_genome_struc', 'pred_genome_type', 'propagation', 'reassembly_bin', 'ref_biomaterial', 'ref_db', 'samp_taxon_id', 'samp_vol_we_dna_ext', 'seq_meth', 'seq_quality_check', 'sim_search_meth', 'single_cell_lysis_appr', 'single_cell_lysis_prot', 'size_frac', 'sop', 'sort_tech', 'source_uvig', 'specific_host', 'subspecf_gen_lin', 'target_gene', 'target_subfragment', 'tax_class', 'tax_ident', 'trna_ext_software', 'trnas', 'trophic_level', 'vir_ident_software', 'virus_enrich_appr', 'wga_amp_appr', 'wga_amp_kit']

These soil slots are not skipped:

['agrochem_addition', 'al_sat', 'al_sat_meth', 'alt', 'annual_precpt', 'annual_temp', 'collection_date', 'crop_rotation', 'cur_land_use', 'cur_vegetation', 'cur_vegetation_meth', 'depth', 'drainage_class', 'elev', 'env_broad_scale', 'env_local_scale', 'env_medium', 'extreme_event', 'fao_class', 'fire', 'flooding', 'geo_loc_name', 'heavy_metals', 'heavy_metals_meth', 'horizon_meth', 'lat_lon', 'link_class_info', 'link_climate_info', 'local_class', 'local_class_meth', 'micro_biomass_meth', 'microbial_biomass', 'misc_param', 'ph', 'ph_meth', 'prev_land_use_meth', 'previous_land_use', 'profile_position', 'salinity_meth', 'season_precpt', 'season_temp', 'sieving', 'slope_aspect', 'slope_gradient', 'soil_horizon', 'soil_text_measure', 'soil_texture_meth', 'soil_type', 'soil_type_meth', 'store_cond', 'temp', 'tillage', 'tot_nitro_cont_meth', 'tot_nitro_content', 'tot_org_c_meth', 'tot_org_carb', 'water_cont_soil_meth', 'water_content']
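The three lists above amount to set operations between the skip list and the soil package's induced slots. The sketch uses short excerpts of each list, not the full lists. In practice the soil set might be pulled from the MIxS LinkML schema, for example with linkml-runtime's `SchemaView.class_induced_slots`, but that step is assumed here rather than shown.

```python
# Excerpts only; see the full lists in the comment above.
# soil_induced stands in for the soil package's induced slots in the MIxS LinkML model.
soil_induced = {
    "extreme_salinity", "link_addit_analys", "pool_dna_extracts",
    "agrochem_addition", "collection_date", "depth", "ph",
}
# Excerpt of the MIxS Terms Skipped sheet.
skipped = {
    "extreme_salinity", "link_addit_analys", "pool_dna_extracts",
    "16s_recover", "adapters", "lib_layout",
}

skipped_soil = skipped & soil_induced        # soil slots to skip
skipped_non_soil = skipped - soil_induced    # skipped, but not soil induced slots
soil_not_skipped = soil_induced - skipped    # soil slots that are kept

print(sorted(skipped_soil))
print(sorted(skipped_non_soil))
print(sorted(soil_not_skipped))
```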

turbomam commented 2 years ago

The EXACT MIxS Terms for DH sheet says to use these soil slots as-is:

['agrochem_addition', 'al_sat', 'al_sat_meth', 'alt', 'annual_precpt', 'annual_temp', 'crop_rotation', 'cur_land_use', 'cur_vegetation', 'cur_vegetation_meth', 'drainage_class', 'elev', 'env_broad_scale', 'env_local_scale', 'env_medium', 'extreme_event', 'fao_class', 'fire', 'flooding', 'geo_loc_name', 'heavy_metals', 'heavy_metals_meth', 'horizon_meth', 'lat_lon', 'link_class_info', 'link_climate_info', 'local_class', 'local_class_meth', 'micro_biomass_meth', 'microbial_biomass', 'misc_param', 'ph', 'ph_meth', 'prev_land_use_meth', 'previous_land_use', 'profile_position', 'salinity_meth', 'season_precpt', 'season_temp', 'sieving', 'slope_aspect', 'slope_gradient', 'soil_text_measure', 'soil_texture_meth', 'soil_type', 'soil_type_meth', 'tillage', 'tot_nitro_cont_meth', 'tot_nitro_content', 'tot_org_c_meth', 'tot_org_carb', 'water_cont_soil_meth', 'water_content']

It also says to use these slots as-is, but they don't appear among the soil induced slots, nor in the OtherPackages sheet:

['experimental_factor', 'org_matter', 'source_mat_id']

These soil slots are not used as-is (if at all?):

['collection_date', 'depth', 'extreme_salinity', 'link_addit_analys', 'pool_dna_extracts', 'soil_horizon', 'store_cond', 'temp']

Remember, these are skipped, as per MIxS Terms Skipped: ['extreme_salinity', 'link_addit_analys', 'pool_dna_extracts']

These are neither used as-is nor skipped: ['collection_date', 'depth', 'soil_horizon', 'store_cond', 'temp']
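The same bookkeeping applies to the EXACT MIxS Terms for DH sheet, again shown on excerpts. First, subtract the as-is list from the soil induced slots. Then remove the skipped slots. What remains are the slots that are neither used as-is nor skipped.

```python
# Excerpts only; see the full lists in the comments above.
soil_induced = {
    "agrochem_addition", "ph", "collection_date", "depth", "soil_horizon",
    "store_cond", "temp", "extreme_salinity", "link_addit_analys", "pool_dna_extracts",
}
exact_as_is = {"agrochem_addition", "ph", "experimental_factor", "org_matter", "source_mat_id"}
skipped = {"extreme_salinity", "link_addit_analys", "pool_dna_extracts"}

as_is_but_not_soil = exact_as_is - soil_induced  # "use as-is" but not soil induced
soil_not_as_is = soil_induced - exact_as_is      # soil slots not used as-is
neither = soil_not_as_is - skipped               # not used as-is and not skipped

print(sorted(as_is_but_not_soil))  # ['experimental_factor', 'org_matter', 'source_mat_id']
print(sorted(neither))             # ['collection_date', 'depth', 'soil_horizon', 'store_cond', 'temp']
```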

turbomam commented 2 years ago

See also MIxS 6 term updates

source_mat_id and experimental_factor do appear in the Structured comment name column of the MIxS6 Core- Final_clean sheet.

org_matter does appear in the Structured comment name column of the MIxS6 packages - Final_clean sheet, but not in association with soil.

turbomam commented 2 years ago

Additional "non-soil" slots in the Terms tab:

['air_temp_regm', 'biotic_regm', 'carb_nitro_ratio', 'chem_administration', 'climate_environment', 'experimental_factor', 'gaseous_environment', 'humidity_regm', 'light_regm', 'org_matter', 'org_nitro', 'oxy_stat_samp', 'phosphate', 'salinity', 'samp_store_temp', 'size_frac_low', 'size_frac_up', 'tot_carb', 'tot_phosp', 'watering_regm']

Note that experimental_factor and org_matter are in this list, but source_mat_id isn't. source_mat_id does appear in the EMSL_slot_Name column.
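Checking which column of a sheet contains a given term is a small scan over the rows. The rows below are a hypothetical excerpt of the Terms tab. Only the column names `mixs_6_slot_name` and `EMSL_slot_Name` come from this thread, and `columns_containing` is an illustrative helper.

```python
# Hypothetical excerpt of the Terms tab, one dict per row.
terms_rows = [
    {"mixs_6_slot_name": "experimental_factor", "EMSL_slot_Name": ""},
    {"mixs_6_slot_name": "org_matter", "EMSL_slot_Name": ""},
    {"mixs_6_slot_name": "", "EMSL_slot_Name": "source_mat_id"},
]


def columns_containing(rows, term):
    """Return the sorted names of columns whose (whitespace-trimmed) cell equals term."""
    return sorted({col for row in rows for col, value in row.items() if value.strip() == term})


for term in ("experimental_factor", "org_matter", "source_mat_id"):
    print(term, columns_containing(terms_rows, term))
```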

turbomam commented 2 years ago

source_mat_id comes from the nucleic acid sequence source section of MIxS6 Core- Final_clean.

experimental_factor comes from the investigation section.

Does the mixs-source code only include terms from the environment section and ignore these?
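One way to check is to group the core terms by section and see what an environment-only filter would drop. This is a hypothetical sketch, not the mixs-source code. The TSV excerpt, its `Section` and `Structured comment name` headers, and the section assigned to `lat_lon` and `geo_loc_name` are assumptions. Only the sections for `source_mat_id` and `experimental_factor` come from the comments above.

```python
import csv
import io

# Hypothetical excerpt of MIxS6 Core- Final_clean exported as TSV.
CORE_TSV = (
    "Section\tStructured comment name\n"
    "investigation\texperimental_factor\n"
    "environment\tlat_lon\n"
    "environment\tgeo_loc_name\n"
    "nucleic acid sequence source\tsource_mat_id\n"
)


def terms_by_section(tsv_text: str) -> dict[str, set[str]]:
    """Map each Section value to the set of structured comment names in it."""
    grouped: dict[str, set[str]] = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        grouped.setdefault(row["Section"], set()).add(row["Structured comment name"])
    return grouped


def dropped_by_environment_only(tsv_text: str) -> set[str]:
    """Terms that a filter keeping only the environment section would leave out."""
    sections = terms_by_section(tsv_text)
    return set().union(*(terms for name, terms in sections.items() if name != "environment"))


print(sorted(dropped_by_environment_only(CORE_TSV)))
```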

turbomam commented 2 years ago

Use schemasheets.

pkalita-lbl commented 2 months ago

Google Sheets are no longer used in the pipeline; they have been replaced by TSV files in the submission-schema repo. If there are issues with those files, report them in that repo.