microbiomedata / nmdc-metadata

Managing metadata and policy around metadata in NMDC

Create minimal metadata field list and map to MIxS #36

Open cmungall opened 4 years ago

cmungall commented 4 years ago

We want to take a subset of metadata fields and prioritize these for display in the pilot. This will be a union of

@wdduncan made a start:


This spreadsheet is the ground truth for this task.

@wdduncan will add the display names from MIxS; we will use these by default.

Reddy and @jagadishcs will take a pass at this. Then we will have Emiley and the Aim 3 and Aim 4 teams look it over.

wdduncan commented 4 years ago

Human-readable labels have been added in the mixs_label column for the soil package. Similar mappings are to be done for the other packages. The MIxS spreadsheet is available here:


There are still issues that need to be resolved:

  1. Some GOLD fields cannot be mapped to MIxS terms, for example the GOLD field habitat does not map to MIxS. Terms have been marked in orange.

  2. Some MIxS terms do not map to GOLD fields, for example the MIxS term extreme salinity does not map to a GOLD field. This term has been marked in purple.

  3. Some GOLD terms are ambiguous. See terms marked in yellow.

If the GOLD to MIxS term mapping is accurate, place a 1 in the is_map field. If it is not a mapping, you can leave the is_map field blank. I also added a notes column.
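For anyone working from a CSV export of the sheet, the is_map convention above can be checked programmatically. This is a minimal sketch, not part of the original workflow: the column names gold_field and mixs_term are assumptions (only is_map and the notes column are named in this thread), and the sample rows are placeholders apart from the habitat example mentioned above.

```python
import csv
import io


def split_by_is_map(rows):
    """Separate rows confirmed as mappings (is_map == "1") from unconfirmed ones."""
    confirmed, unconfirmed = [], []
    for row in rows:
        # A blank is_map cell means the mapping has not been confirmed.
        if (row.get("is_map") or "").strip() == "1":
            confirmed.append(row)
        else:
            unconfirmed.append(row)
    return confirmed, unconfirmed


# Illustrative data only; gold_field and mixs_term are assumed column names.
SAMPLE_CSV = """gold_field,mixs_term,is_map,notes
gold_field_a,mixs_term_a,1,
habitat,,,no MIxS equivalent
"""

confirmed, unconfirmed = split_by_is_map(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print([r["gold_field"] for r in confirmed])
print([r["gold_field"] for r in unconfirmed])
```

Rows with a blank is_map end up in the unconfirmed list, which matches the instruction to leave the field blank when a pair is not a mapping.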

wdduncan commented 4 years ago

@jagadishcs In the Google Sheet, you marked heavy_metals as "1" even though the definition was wrong. Is it still correct?

Other questions/issues:

  1. I unmarked the is_map for the annual_season_temp/precpt fields. We need to come up with a resolution plan.
  2. You marked "1" for the salinity_meth field (row 64), but the label reads "extreme_unusual_properties/salinity method". Does this still sound correct to you? The definition seems right.
  3. Also, any update on the missing fields?

jagadishcs commented 4 years ago


wdduncan commented 4 years ago

The MIxS v5 package-specific spreadsheets have been uploaded to the Google Drive folder https://drive.google.com/drive/u/1/folders/13DVA0GGoypGIwzCTdKWS4eNh3Y8tjX3y:

jagadishcs commented 4 years ago

@wdduncan @cmungall

Please note that the link given above for the Water package points to the Sediment package.

For the Water package, please refer to this one: https://docs.google.com/spreadsheets/d/1xbSmHbzZWTUCXY4SCe3zZ6_YFxmuW4ht5s5_3jdgyyQ/edit#gid=994826157

wdduncan commented 4 years ago

Sorry about the typo. All the packages are in this folder: https://drive.google.com/drive/u/1/folders/13DVA0GGoypGIwzCTdKWS4eNh3Y8tjX3y

jagadishcs commented 4 years ago

@cmungall @wdduncan

Please note that I revisited the soil biosample mapping using the latest reference file (MIxSsoil_20180621) you shared.

You can access the revised file here: https://docs.google.com/spreadsheets/d/1mFOlEzDCaMn2AwBcJiJMpmj6QXDmb3jUs-o3-2SesaI/edit#gid=891466449

Thank you Chris and Bill. Jagadish

jagadishcs commented 4 years ago

@cmungall @wdduncan

There are two descriptors in the 17 MIxS packages, specific_host and host_spec_range, that refer to the NCBI taxid.

Since the value syntax of 'specific_host' is 'text' and 'taxid', I have treated it as Host-name while mapping and preparing metadata files for PIs. Please let me know if this is OK.

Please check them and update the definition in MIxS, if needed.


jagadishcs commented 4 years ago

@cmungall @wdduncan

I have prepared and shared a document describing the MIxS packages; you can access it here: https://docs.google.com/document/d/141BWGbWdTuCQ_QoqdsO_BvHW37wJuLU9xZnvnHTEtNU/edit


wdduncan commented 4 years ago

@jagadishcs I'm not sure what you mean by 'Host-name'. Do you mean the scientific name?

wdduncan commented 4 years ago

I placed the output of my script, which looks for terms shared across MIxS packages, in the Google Drive. The files are:

  1. mixs-package-term: this lists each term and the packages that contain the term https://drive.google.com/drive/u/1/folders/1frzGlz8EB8inpVokNTSwD6Ia94eVUlsZ

  2. multi-package-mixs-terms-only: same as #1, but only lists terms that appear in multiple packages https://docs.google.com/spreadsheets/d/1vnw-YTX60Sf5qLbESrKRKYIho53YMvfK3sNJv-9XJAs/edit#gid=934350375
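For readers who want to reproduce this kind of listing, the two outputs could be computed roughly as follows. This is a sketch only and not the script referred to above: the function names are hypothetical, and the package and term names are placeholders rather than real MIxS content.

```python
from collections import defaultdict


def terms_to_packages(package_terms):
    """Invert {package: [terms]} into {term: sorted list of packages}."""
    index = defaultdict(set)
    for package, terms in package_terms.items():
        for term in terms:
            index[term].add(package)
    return {term: sorted(pkgs) for term, pkgs in sorted(index.items())}


def multi_package_terms(index):
    """Keep only the terms that appear in more than one package."""
    return {term: pkgs for term, pkgs in index.items() if len(pkgs) > 1}


# Placeholder input; the real term lists live in the package spreadsheets.
PACKAGE_TERMS = {
    "package_a": ["term_x", "term_y"],
    "package_b": ["term_y", "term_z"],
}

index = terms_to_packages(PACKAGE_TERMS)    # analogous to mixs-package-term
shared = multi_package_terms(index)         # analogous to multi-package-mixs-terms-only
print(index)
print(shared)
```

The first dictionary corresponds to the per-term package listing, and the second keeps only the terms found in multiple packages.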

cc @cmungall @jagadishcs

jagadishcs commented 4 years ago

@cmungall @wdduncan

I have completed the mapping of the 17 MIxS environmental packages to GOLD fields; the files can be accessed here: https://drive.google.com/drive/u/0/folders/18941r1aZelhNFoaNakPvMj6K9JZgMcWo


jagadishcs commented 4 years ago

@cmungall @wdduncan

Mapping of MIxS environmental packages with GOLD fields can be accessed from here:

  1. MIxSair_20180621_GOLD_Mapping_04132020.xlsx
  2. MIxSbuiltenv_20180621_GOLD_Mapping_04172020.xlsx
  3. MIxShostassoc_20180621_GOLD_Mapping.xlsx
  4. MIxShumanassoc_20180621_GOLD_Mapping_04142020.xlsx
  5. MIxShumangut_20180621_GOLD_Mapping_04152020.xlsx
  6. MIxShumanoral_20180621_GOLD_Mapping_04152020.xlsx
  7. MIxShumanskin_20180621_GOLD_Mapping_04162020.xlsx
  8. MIxShumanvaginal_20180621_GOLD_04162020.xlsx
  9. MIxShydrocarbCores_20180621_Mapping_GOLD.xlsx
  10. MIxShydrocarbfs_20180621_v5_GOLD_Mapping.xlsx
  11. MIxSmatbiofilm_20180621_v5_GOLD_Mapping.xlsx
  12. MIxSmisc_20180621_GOLD_Mapping.xlsx
  13. MIxSplantassoc_20180621_v5_GOLD_Mapping.xlsx
  14. MIxSsediment_20180621_GOLD_Mapping_04102020.xlsx
  15. MIxSsoil_20180621_GOLD_Mapping_04102020.xlsx
  16. MIxSwastesludge_20180621_GOLD_Mapping_04102020.xlsx
  17. MIxSwater_20180621_GOLD_Mapping_04122020.xlsx

wdduncan commented 4 years ago

@jagadishcs Can you convert these to Google Sheets? That way people can edit them in a web browser.

wdduncan commented 4 years ago

Does that have to be on both the package level and on MIxS as a whole?
Will the NMDC minimal set be different from the MIxS mandatory fields?

Use MIxS mandatory fields as a seed. Then edit from there.

jagadishcs commented 4 years ago

@cmungall @wdduncan @dehays

The MIxS mandatory descriptors, which can be used to decide the minimal metadata fields, are available here: https://docs.google.com/spreadsheets/d/1trmfp9UMsctXWio6H-17GZXSw_Qi10k777I9trzW3ds/edit#gid=221947830

cmungall commented 4 years ago

We should have a CSV in the repo consisting of the 6 MIxS minimal fields, and link to it from the README.

wdduncan commented 3 years ago

@dehays Do you think we are ready to close this?