microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

mixs as a submodule #291

Closed turbomam closed 2 years ago

turbomam commented 2 years ago

This is for moving away from the old, static-ish MIxS 5 mixs.yaml.

MIxS isn't available through w3id yet, and the LinkML YAML files aren't bundled with the PyPI package yet:

pip show -f mixs-linkml

Name: mixs-linkml
Version: 0.1.1
Summary: A LinkML (https://linkml.io/) model of the MIxS standard (https://gensc.org/mixs/)
Home-page:
Author: GSC
Author-email:
License: CC0
Location: /Users/MAM/mixs_pypi_test/venv/lib/python3.9/site-packages
Requires: linkml, mkdocs, pandas
Required-by:
Files:
  mixs_linkml-0.1.1.dist-info/INSTALLER
  mixs_linkml-0.1.1.dist-info/LICENSE
  mixs_linkml-0.1.1.dist-info/METADATA
  mixs_linkml-0.1.1.dist-info/RECORD
  mixs_linkml-0.1.1.dist-info/REQUESTED
  mixs_linkml-0.1.1.dist-info/WHEEL
  release/__init__.py
  release/__pycache__/__init__.cpython-39.pyc
  release/graphql/mixs.graphql
  release/jsonld/mixs.context.jsonld
  release/jsonld/mixs.jsonld
  release/jsonschema/mixs.schema.json
  release/mixs.py
  release/owl/mixs.owl.ttl
  release/prefixmap/mixs.yaml
  release/protobuf/mixs.proto
  release/shacl/mixs.shacl.ttl
  release/shex/mixs.shex
  release/sqlschema/mixs.sql
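
The same check can be scripted rather than read off the pip listing by eye. This is a minimal sketch, assuming the package is installed under the distribution name mixs-linkml; the helper names yaml_files and installed_yaml_files are hypothetical. A .yaml match is not necessarily a LinkML schema: release/prefixmap/mixs.yaml in the listing is a prefix map.

```python
from importlib import metadata


def yaml_files(paths):
    """Return the paths that end in .yaml or .yml, in their original order."""
    return [str(p) for p in paths if str(p).lower().endswith((".yaml", ".yml"))]


def installed_yaml_files(dist_name="mixs-linkml"):
    """List YAML files recorded for an installed distribution, or [] if absent."""
    try:
        files = metadata.files(dist_name)
    except metadata.PackageNotFoundError:
        return []
    # metadata.files() returns None when the distribution has no file record.
    return yaml_files(files or [])


# A few paths copied from the listing, used here as sample input.
sample = [
    "release/mixs.py",
    "release/prefixmap/mixs.yaml",
    "release/owl/mixs.owl.ttl",
]
print(yaml_files(sample))
```

Calling installed_yaml_files() in the virtual environment would show whether a later release starts bundling the schema sources.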

turbomam commented 2 years ago

mixs.yaml slots that are missing in MIxS 6

turbomam commented 2 years ago

But these are the only lost MIxS slots that the current nmdc-schema associates with biosamples: ['env_package', 'tot_nitro_content_meth', 'water_content_soil_meth']

And none of those even appear in the MongoDB biosample_set!

See https://github.com/microbiomedata/nmdc-schema/blob/issue-291-mixs-submod/util/reconsititute_mixs.py

https://github.com/GenomicsStandardsConsortium/mixs/issues/84 suggests replacements for two of those.

I have requested the re-addition of env_package: https://github.com/GenomicsStandardsConsortium/mixs/issues/387

We can add it directly to the nmdc-schema in the meantime.

turbomam commented 2 years ago

Here's a starting point for discussing a new MIxS 6 dynamic import into the nmdc-schema:

src/schema/mixs_6_for_nmdc.yaml, in the issue-291-mixs-submod branch

It's built by https://github.com/microbiomedata/nmdc-schema/blob/issue-291-mixs-submod/util/reconsititute_mixs.py, which needs refactoring

Filenames, schema names and IDs etc can all be changed.

Is this sufficient for declaring that a slot comes from MIxS?

slots:
  SLOTNAME:
    from_schema: http://w3id.org/mixs/terms
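
One way to check such a declaration mechanically is sketched below, with plain dictionaries standing in for a parsed schema file. The helper slots_from and the slot names in the example are hypothetical, and a real check would load the YAML first. It only tests for the presence of the annotation; it does not settle whether from_schema alone is sufficient.

```python
MIXS_TERMS = "http://w3id.org/mixs/terms"


def slots_from(schema, source_uri=MIXS_TERMS):
    """Return sorted names of slots whose from_schema equals source_uri."""
    slots = schema.get("slots") or {}
    return sorted(
        name
        for name, body in slots.items()
        if isinstance(body, dict) and body.get("from_schema") == source_uri
    )


# Illustrative input only; these slot bodies are not copied from a real schema.
example = {
    "slots": {
        "depth": {"from_schema": MIXS_TERMS},
        "env_package": {"from_schema": "https://microbiomedata/schema/mixs"},
        "local_slot": None,  # a slot declared with no body
    }
}
print(slots_from(example))
```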

turbomam commented 2 years ago

get rid of empty examples etc.
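
A minimal sketch of that cleanup, assuming the schema has already been parsed into nested dicts and lists. The helper name prune_empty and the sample slot are hypothetical. It treats None, empty strings, empty lists and empty dicts as "empty" and keeps values such as 0 and False.

```python
EMPTY = (None, "", [], {})


def prune_empty(node):
    """Recursively drop None, '', [] and {} values from nested dicts and lists."""
    if isinstance(node, dict):
        cleaned = {k: prune_empty(v) for k, v in node.items()}
        return {k: v for k, v in cleaned.items() if v not in EMPTY}
    if isinstance(node, list):
        cleaned = [prune_empty(v) for v in node]
        return [v for v in cleaned if v not in EMPTY]
    return node


# Illustrative slot with an empty example and an empty comments list.
slot = {
    "name": "depth",
    "description": "x",
    "examples": [{"value": ""}],
    "comments": [],
}
print(prune_empty(slot))
```

Children are pruned before their parent is tested, so a list that contained only empty examples disappears along with them.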

turbomam commented 2 years ago

compare ranges between old MIxS 5 and new MIxS 6 files, especially looking for things like quantity value

deepdiff?
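
deepdiff would give a full structural diff. For the narrower question of which slot ranges changed, a standard-library sketch may be enough. It assumes both files are already parsed into dicts with a top-level slots mapping; the helper name range_changes and the sample values are hypothetical.

```python
def range_changes(old_schema, new_schema):
    """Map slot name -> (old_range, new_range) for shared slots whose range differs."""
    old_slots = old_schema.get("slots") or {}
    new_slots = new_schema.get("slots") or {}
    changes = {}
    # Only slots present in both schemas can have a changed range.
    for name in sorted(old_slots.keys() & new_slots.keys()):
        old_range = (old_slots[name] or {}).get("range")
        new_range = (new_slots[name] or {}).get("range")
        if old_range != new_range:
            changes[name] = (old_range, new_range)
    return changes


# Made-up ranges for illustration; not taken from either MIxS file.
old = {"slots": {"depth": {"range": "quantity value"},
                 "env_package": {"range": "text value"}}}
new = {"slots": {"depth": {"range": "string"},
                 "lat_lon": {"range": "string"}}}
print(range_changes(old, new))
```

Slots that exist in only one file are ignored here; those are the "missing slots" question handled separately in this thread.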

turbomam commented 2 years ago

funny looking wall_texture_enum PVs


  wall_texture_enum:
    name: wall_texture_enum
    from_schema: http://w3id.org/mixs/terms
    permissible_values:
      crows feet:
        text: crows feet
      crows-foot stomp:
        text: crows-foot stomp
      ? ''
      : text: ''
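
Entries like the empty-string permissible value can be found across all enums with a small scan. This is a sketch over an already-parsed schema dict; the helper name empty_permissible_values is hypothetical, and the sample mirrors the wall_texture_enum fragment.

```python
def empty_permissible_values(schema):
    """Return (enum_name, key) pairs whose permissible-value key or text is blank."""
    hits = []
    for enum_name, enum_body in (schema.get("enums") or {}).items():
        pvs = (enum_body or {}).get("permissible_values") or {}
        for key, pv in pvs.items():
            # Fall back to the key when a permissible value has no text field.
            text = (pv or {}).get("text", key)
            if not str(key).strip() or not str(text).strip():
                hits.append((enum_name, key))
    return hits


schema = {
    "enums": {
        "wall_texture_enum": {
            "permissible_values": {
                "crows feet": {"text": "crows feet"},
                "crows-foot stomp": {"text": "crows-foot stomp"},
                "": {"text": ""},
            }
        }
    }
}
print(empty_permissible_values(schema))
```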

also

turbomam commented 2 years ago

env_package from MIxS 5-based src/schema/mixs.yaml sure looks like an enum

range: text value
pattern: '[air|built environment|host\-associated|human\-associated|human\-skin|human\-oral|human\-gut|human\-vaginal|hydrocarbon
  resources\-cores|hydrocarbon resources\-fluids\/swabs|microbial mat\/biofilm|misc
  environment|plant\-associated|sediment|soil|wastewater\/sludge|water]'
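
If env_package is re-added as an enum, its permissible values can be recovered from this pattern. This is a sketch, assuming the line breaks in the YAML scalar fold to single spaces; the helper name pattern_to_values is hypothetical. It also shows why the pattern does not behave like an enum as written: the square brackets make it a character class.

```python
import re

PATTERN = (
    r"[air|built environment|host\-associated|human\-associated|human\-skin|"
    r"human\-oral|human\-gut|human\-vaginal|hydrocarbon resources\-cores|"
    r"hydrocarbon resources\-fluids\/swabs|microbial mat\/biofilm|"
    r"misc environment|plant\-associated|sediment|soil|wastewater\/sludge|water]"
)


def pattern_to_values(pattern):
    """Split a '[a|b|c]'-style pattern into its literal alternatives."""
    inner = pattern.strip()
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    # Drop the backslashes that escape '-' and '/' in the original pattern.
    return [re.sub(r"\\(.)", r"\1", part) for part in inner.split("|")]


values = pattern_to_values(PATTERN)
print(values)

# As a regex, the bracketed pattern matches one listed character,
# not one whole package name.
print(re.fullmatch(PATTERN, "s") is not None)
print(re.fullmatch(PATTERN, "soil") is not None)
```

The resulting list could seed the permissible_values of an enum in place of the text value range and pattern.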

turbomam commented 2 years ago

The nmdc-schema namespaces, prefixes, etc. are a mess. Element URLs do not resolve.

from_schema for env_package from (MIxS 5-based) src/schema/mixs.yaml: https://microbiomedata/schema/mixs