microbiomedata / DataHarmonizer

Our dev interface is available via github pages:
https://microbiomedata.github.io/DataHarmonizer/main.html
MIT License

Single, long sheet with all packages x slots in Soil-NMDC-Template_Compiled? #18

Closed: turbomam closed 2 months ago

turbomam commented 2 years ago

Soil-NMDC-Template_Compiled

turbomam commented 2 years ago

see also #16

turbomam commented 2 years ago

See also the MIxS Terms Skipped sheet.

Is this sheet intended just for soil? There are slots (in the mixs_6_slot_name column) that aren't part of the soil package per se. At least, they don't show up in the induced slots from the mixs LinkML model.

Columns:

Migrate the value-added columns here into the upcoming packages x slots sheet.

The GitHub Ticket column should be a pipe-delimited list of issue URLs.
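A minimal sketch of what a pipe-delimited list of issue URLs could look like when read and written programmatically. The function names `split_ticket_cell` and `join_ticket_urls` are hypothetical, and the example URLs only assume the standard GitHub issue URL pattern for the two issues mentioned in this thread.

```python
def split_ticket_cell(cell: str) -> list[str]:
    """Split a pipe-delimited cell into a list of issue URLs.

    Whitespace around each URL is dropped and empty entries are ignored.
    """
    return [part.strip() for part in cell.split("|") if part.strip()]


def join_ticket_urls(urls: list[str]) -> str:
    """Join issue URLs back into a single pipe-delimited cell value."""
    return "|".join(url.strip() for url in urls)


# Illustrative cell value (assumed URL pattern, not copied from the sheet)
cell = (
    "https://github.com/microbiomedata/DataHarmonizer/issues/16 | "
    "https://github.com/microbiomedata/DataHarmonizer/issues/18"
)
urls = split_ticket_cell(cell)
print(urls)
print(join_ticket_urls(urls))
```

Stripping whitespace on both read and write keeps the cell stable across round trips, whether or not people type spaces around the pipes.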

turbomam commented 2 years ago

Likewise, see the EXACT MIxS Terms for DH sheet.

turbomam commented 2 years ago

The MIxS Terms Skipped sheet says to skip these soil slots:

['extreme_salinity', 'link_addit_analys', 'pool_dna_extracts']

It also says to skip these slots, which don't appear in the soil induced slots:

['16s_recover', '16s_recover_software', 'adapters', 'annot', 'assembly_name', 'assembly_qual', 'assembly_software', 'associated resource', 'bacteria_carb_prod', 'bin_param', 'bin_software', 'chimera_check', 'compl_appr', 'compl_score', 'compl_software', 'contam_score', 'contam_screen_input', 'contam_screen_param', 'decontam_software', 'detec_type', 'encoded_traits', 'estimated_size', 'extrachrom_elements', 'feat_pred', 'host_disease_stat', 'host_pred_appr', 'host_pred_est_acc', 'host_spec_range', 'isol_growth_condt', 'lib_layout', 'lib_reads_seqd', 'lib_screen', 'lib_size', 'lib_vector', 'mag_cov_software', 'mid', 'neg_cont_type', 'nucl_acid_amp', 'nucl_acid_ext', 'num_replicons', 'number_contig', 'org_carb', 'otu_class_appr', 'otu_db', 'otu_seq_comp_appr', 'pathogenicity', 'pcr_cond', 'pcr_primers', 'ploidy', 'pos_cont_type', 'pred_genome_struc', 'pred_genome_type', 'propagation', 'reassembly_bin', 'ref_biomaterial', 'ref_db', 'samp_taxon_id', 'samp_vol_we_dna_ext', 'seq_meth', 'seq_quality_check', 'sim_search_meth', 'single_cell_lysis_appr', 'single_cell_lysis_prot', 'size_frac', 'sop', 'sort_tech', 'source_uvig', 'specific_host', 'subspecf_gen_lin', 'target_gene', 'target_subfragment', 'tax_class', 'tax_ident', 'trna_ext_software', 'trnas', 'trophic_level', 'vir_ident_software', 'virus_enrich_appr', 'wga_amp_appr', 'wga_amp_kit']

These soil slots are not skipped:

['agrochem_addition', 'al_sat', 'al_sat_meth', 'alt', 'annual_precpt', 'annual_temp', 'collection_date', 'crop_rotation', 'cur_land_use', 'cur_vegetation', 'cur_vegetation_meth', 'depth', 'drainage_class', 'elev', 'env_broad_scale', 'env_local_scale', 'env_medium', 'extreme_event', 'fao_class', 'fire', 'flooding', 'geo_loc_name', 'heavy_metals', 'heavy_metals_meth', 'horizon_meth', 'lat_lon', 'link_class_info', 'link_climate_info', 'local_class', 'local_class_meth', 'micro_biomass_meth', 'microbial_biomass', 'misc_param', 'ph', 'ph_meth', 'prev_land_use_meth', 'previous_land_use', 'profile_position', 'salinity_meth', 'season_precpt', 'season_temp', 'sieving', 'slope_aspect', 'slope_gradient', 'soil_horizon', 'soil_text_measure', 'soil_texture_meth', 'soil_type', 'soil_type_meth', 'store_cond', 'temp', 'tillage', 'tot_nitro_cont_meth', 'tot_nitro_content', 'tot_org_c_meth', 'tot_org_carb', 'water_cont_soil_meth', 'water_content']
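The three lists above amount to a set comparison between the soil induced slots and the skip sheet. A rough sketch of that comparison follows; the function name `partition_skips` is hypothetical, and the inputs are abbreviated samples taken from the lists in this comment rather than the full sheets.

```python
def partition_skips(induced_slots, skip_sheet_slots):
    """Compare a package's induced slots against a skip list.

    Returns three sorted lists: package slots that are skipped, skip-list
    entries that are not package slots, and package slots that are kept.
    """
    induced = set(induced_slots)
    skips = set(skip_sheet_slots)
    return {
        "skipped_in_package": sorted(induced & skips),
        "skipped_not_in_package": sorted(skips - induced),
        "not_skipped": sorted(induced - skips),
    }


# Abbreviated samples (assumption: stand-ins for the full lists above)
soil_induced = [
    "depth", "extreme_salinity", "link_addit_analys",
    "ph", "pool_dna_extracts", "temp",
]
skip_sheet = [
    "adapters", "annot", "extreme_salinity",
    "lib_layout", "link_addit_analys", "pool_dna_extracts",
]

result = partition_skips(soil_induced, skip_sheet)
for name, slots in result.items():
    print(name, slots)
```

With the full lists substituted in, the three outputs should line up with the three lists in this comment.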

turbomam commented 2 years ago

EXACT MIxS Terms for DH says to use these soil slots as is:

['agrochem_addition', 'al_sat', 'al_sat_meth', 'alt', 'annual_precpt', 'annual_temp', 'crop_rotation', 'cur_land_use', 'cur_vegetation', 'cur_vegetation_meth', 'drainage_class', 'elev', 'env_broad_scale', 'env_local_scale', 'env_medium', 'extreme_event', 'fao_class', 'fire', 'flooding', 'geo_loc_name', 'heavy_metals', 'heavy_metals_meth', 'horizon_meth', 'lat_lon', 'link_class_info', 'link_climate_info', 'local_class', 'local_class_meth', 'micro_biomass_meth', 'microbial_biomass', 'misc_param', 'ph', 'ph_meth', 'prev_land_use_meth', 'previous_land_use', 'profile_position', 'salinity_meth', 'season_precpt', 'season_temp', 'sieving', 'slope_aspect', 'slope_gradient', 'soil_text_measure', 'soil_texture_meth', 'soil_type', 'soil_type_meth', 'tillage', 'tot_nitro_cont_meth', 'tot_nitro_content', 'tot_org_c_meth', 'tot_org_carb', 'water_cont_soil_meth', 'water_content']

It also says to use these slots as is, but they don't appear among the soil induced slots. They don't appear in the OtherPackages sheet either:

['experimental_factor', 'org_matter', 'source_mat_id']

These soil slots are not used as-is (if at all?):

['collection_date', 'depth', 'extreme_salinity', 'link_addit_analys', 'pool_dna_extracts', 'soil_horizon', 'store_cond', 'temp']

Remember, these are skipped, as per MIxS Terms Skipped: ['extreme_salinity', 'link_addit_analys', 'pool_dna_extracts']

These are not used as-is, but are not skipped either: ['collection_date', 'depth', 'soil_horizon', 'store_cond', 'temp']
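One way to make the leftover group explicit is to give every soil slot a status from the two sheets. This is only a sketch: `slot_status` is a hypothetical helper, "unresolved" is a label invented here for slots in neither sheet, and the `as_is` set is an abbreviated sample of the as-is list above.

```python
def slot_status(slot, as_is, skipped):
    """Classify a slot as 'skipped', 'as-is', or 'unresolved'.

    'skipped' is checked first, so a slot listed in both sheets
    (none are in this thread) would be reported as skipped.
    """
    if slot in skipped:
        return "skipped"
    if slot in as_is:
        return "as-is"
    return "unresolved"


# Abbreviated sample of the as-is list (assumption: not the full sheet)
as_is = {"ph", "soil_type", "tillage"}
skipped = {"extreme_salinity", "link_addit_analys", "pool_dna_extracts"}

soil_slots = [
    "collection_date", "depth", "extreme_salinity", "ph",
    "soil_horizon", "store_cond", "temp", "tillage",
]

statuses = {slot: slot_status(slot, as_is, skipped) for slot in soil_slots}
unresolved = [slot for slot in soil_slots if statuses[slot] == "unresolved"]
print(unresolved)
```

For this sample the unresolved slots are the five listed above: collection_date, depth, soil_horizon, store_cond and temp.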

turbomam commented 2 years ago

See also MIxS 6 term updates.

source_mat_id and experimental_factor do appear as Structured comment names in sheet MIxS6 Core- Final_clean.

org_matter does appear in the Structured comment name column of sheet MIxS6 packages - Final_clean, but not in association with soil.

turbomam commented 2 years ago

Additional "non-soil" slots in the Terms tab:

['air_temp_regm', 'biotic_regm', 'carb_nitro_ratio', 'chem_administration', 'climate_environment', 'experimental_factor', 'gaseous_environment', 'humidity_regm', 'light_regm', 'org_matter', 'org_nitro', 'oxy_stat_samp', 'phosphate', 'salinity', 'samp_store_temp', 'size_frac_low', 'size_frac_up', 'tot_carb', 'tot_phosp', 'watering_regm']

Note that experimental_factor and org_matter are in this list, but source_mat_id isn't. source_mat_id does appear in the EMSL_slot_Name column.

turbomam commented 2 years ago

source_mat_id comes from the "nucleic acid sequence source" section of MIxS6 Core- Final_clean.

experimental_factor comes from the "investigation" section.

Does the mixs-source code only include terms from the environment section and ignore these?

turbomam commented 2 years ago

Use schemasheets.

pkalita-lbl commented 2 months ago

Google Sheets are no longer used in the pipeline. They have been replaced by TSV files in the submission-schema repo. If there are issues with those files, report them in that repo.