Open cmungall opened 4 years ago
Human readable labels have been added in the mixs_label column for the soil package. Similar mappings are still to be done for the other packages. The MIxS spreadsheet is available here:
There are still issues that need to be resolved:
Some GOLD fields cannot be mapped to MIxS terms; for example, the GOLD field habitat does not map to MIxS. Terms have been marked in orange.
Some MIxS terms do not map to GOLD fields; for example, the MIxS term extreme salinity does not map to a GOLD field. This term has been marked in purple.
Some GOLD terms are ambiguous. See terms marked in yellow.
If the GOLD to MIxS term mapping is accurate, place a 1 in the is_map field. If it is not a mapping, you can leave the is_map field blank. I also added a notes column.
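Once the sheet is filled in this way, the confirmed mappings can be pulled out with a few lines of Python. This is only a minimal sketch: it assumes the sheet has been exported to CSV, and the gold_field and mixs_term column names and the sample rows are hypothetical placeholders (only is_map and notes come from the sheet described above).

```python
import csv
import io


def confirmed_mappings(rows):
    """Keep only rows whose is_map cell is exactly "1" (blank means not a mapping)."""
    return [row for row in rows if (row.get("is_map") or "").strip() == "1"]


# Hypothetical stand-in for a CSV export of the mapping sheet.
sample = io.StringIO(
    "gold_field,mixs_term,is_map,notes\n"
    "field_a,term_a,1,\n"
    "field_b,term_b,,definition differs\n"
)

rows = list(csv.DictReader(sample))
confirmed = confirmed_mappings(rows)
for row in confirmed:
    print(row["gold_field"], "->", row["mixs_term"])
```

Rows left blank in is_map are dropped, so anything still under discussion in the notes column stays out of the confirmed set.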
@jagadishcs In the google sheet, you marked heavy_metals as "1" even though the definition was wrong. Is it still correct?
Other questions/issues:
is_map for the annual_season_temp/precpt fields. We need to come up with a resolution plan.
salinity_meth field (row 64), but the label reads "extreme_unusual_properties/salinity method". Does this still sound correct to you? The definition seems right.
MIxS v5 package specific spreadsheets uploaded to google drive folder https://drive.google.com/drive/u/1/folders/13DVA0GGoypGIwzCTdKWS4eNh3Y8tjX3y:
@wdduncan @cmungall
Please note that the link given above for the Water package points to the Sediment package.
For the Water package, please refer to this one: https://docs.google.com/spreadsheets/d/1xbSmHbzZWTUCXY4SCe3zZ6_YFxmuW4ht5s5_3jdgyyQ/edit#gid=994826157
Sorry about the typo. All the packages are in this folder: https://drive.google.com/drive/u/1/folders/13DVA0GGoypGIwzCTdKWS4eNh3Y8tjX3y
@cmungall @wdduncan
Please note that I revisited the soil biosample mapping using the latest reference file (MIxSsoil_20180621) you shared.
You can access the revised file here: https://docs.google.com/spreadsheets/d/1mFOlEzDCaMn2AwBcJiJMpmj6QXDmb3jUs-o3-2SesaI/edit#gid=891466449
Thank you Chris and Bill. Jagadish
@cmungall @wdduncan
There are two descriptors in the MIxS packages (17), specific_host and host_spec_range, that refer to the NCBI taxid.
Since the 'specific_host' value syntax is 'text' and 'taxid', I have treated it as Host-name while mapping and preparing metadata files for PIs. Please let me know if this is OK.
Please check them and update the definitions in MIxS, if needed.
Best
@cmungall @wdduncan
I have prepared and shared a document describing the MIxS packages; you can access it here: https://docs.google.com/document/d/141BWGbWdTuCQ_QoqdsO_BvHW37wJuLU9xZnvnHTEtNU/edit
Best
@jagadishcs I'm not sure what you mean by 'Host-name'. Do you mean the scientific name?
I placed the output of my script that looks for terms shared across MIxS packages in the google drive. The files are:
1. mixs-package-term: this lists each term and the packages that contain the term https://drive.google.com/drive/u/1/folders/1frzGlz8EB8inpVokNTSwD6Ia94eVUlsZ
2. multi-package-mixs-terms-only: same as #1, but only lists terms that appear in multiple packages https://docs.google.com/spreadsheets/d/1vnw-YTX60Sf5qLbESrKRKYIho53YMvfK3sNJv-9XJAs/edit#gid=934350375
cc @cmungall @jagadishcs
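For anyone who wants to regenerate these two listings, the idea can be sketched roughly as below. This is not the actual script; it assumes the terms of each package have already been read into a dictionary, and the function names and the term_a/term_b/term_c sample terms are made up for illustration.

```python
from collections import defaultdict


def terms_by_package(package_terms):
    """Invert {package: [terms]} into {term: sorted list of packages containing it}."""
    index = defaultdict(set)
    for package, terms in package_terms.items():
        for term in terms:
            index[term].add(package)
    return {term: sorted(packages) for term, packages in index.items()}


def multi_package_terms(term_index):
    """Keep only the terms that appear in more than one package."""
    return {term: pkgs for term, pkgs in term_index.items() if len(pkgs) > 1}


# Hypothetical input; real term lists would come from the package spreadsheets.
example = {
    "soil": ["term_a", "term_b"],
    "water": ["term_a", "term_c"],
    "sediment": ["term_a", "term_b"],
}

term_index = terms_by_package(example)    # analogous to mixs-package-term
shared = multi_package_terms(term_index)  # analogous to multi-package-mixs-terms-only
for term, packages in sorted(shared.items()):
    print(term, packages)
```

The first function corresponds to listing every term with its packages, and the second filters that listing down to terms found in multiple packages.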
@cmungall @wdduncan
I have completed the mapping of the 17 MIxS environmental packages to GOLD fields. The files can be accessed here: https://drive.google.com/drive/u/0/folders/18941r1aZelhNFoaNakPvMj6K9JZgMcWo
Best
@cmungall @wdduncan
Mapping of MIxS environmental packages with GOLD fields can be accessed from here:
MIxSair_20180621_GOLD_Mapping_04132020.xlsx
MIxSbuiltenv_20180621_GOLD_Mapping_04172020.xlsx
MIxShostassoc_20180621_GOLD_Mapping.xlsx
MIxShumanassoc_20180621_GOLD_Mapping_04142020.xlsx
MIxShumangut_20180621_GOLD_Mapping_04152020.xlsx
MIxShumanoral_20180621_GOLD_Mapping_04152020.xlsx
MIxShumanskin_20180621_GOLD_Mapping_04162020.xlsx
MIxShumanvaginal_20180621_GOLD_04162020.xlsx
MIxShydrocarbCores_20180621_Mapping_GOLD.xlsx
MIxShydrocarbfs_20180621_v5_GOLD_Mapping.xlsx
MIxSmatbiofilm_20180621_v5_GOLD_Mapping.xlsx
MIxSmisc_20180621_GOLD_Mapping.xlsx
MIxSplantassoc_20180621_v5_GOLD_Mapping.xlsx
MIxSsediment_20180621_GOLD_Mapping_04102020.xlsx
MIxSsoil_20180621_GOLD_Mapping_04102020.xlsx
MIxSwastesludge_20180621_GOLD_Mapping_04102020.xlsx
MIxSwater_20180621_GOLD_Mapping_04122020.xlsx
@jagadishcs can you convert these to google sheets? That way people can edit them in a web browser.
Does that have to be on both the package level and on MIxS as a whole?
Will the NMDC min set be different from the MIxS mandatory fields?
Use MIxS mandatory fields as a seed. Then edit from there.
@cmungall @wdduncan @dehays
The MIxS mandatory descriptors, which can be used to decide the minimal metadata fields, are available here: https://docs.google.com/spreadsheets/d/1trmfp9UMsctXWio6H-17GZXSw_Qi10k777I9trzW3ds/edit#gid=221947830
We should have a csv in the repo consisting of the 6 mixs minimal fields and link to it from the README.
@dehays Do you think we are ready to close this?
We want to take a subset of metadata fields and prioritize these for display in the pilot. This will be a union of
@wdduncan made a start:
https://docs.google.com/spreadsheets/u/1/d/1mk9VNf9fWsczA6ZMi627CPSOMONuhVYADLY8qP8cx6o/edit#gid=0
This spreadsheet is the ground truth for this task.
@wdduncan will add the display names from mixs; we will use these by default.
Reddy and @jagadishcs will take a pass at this. Then we will have Emiley and the aim 3 and aim 4 teams look it over.