turbomam / mixs-subset-examples-first

A subset of the MIxS specification that's self-documenting and DataHarmonizer compatible. Comes with valid and invalid data examples. Subset = all checklists and all environmental packages, but partial combinations.
https://turbomam.github.io/mixs-subset-examples-first/
MIT License

Add env package requirements sheet. Include usage and/or annotations? #47

Open turbomam opened 1 year ago

turbomam commented 1 year ago

would be based on data/mixs_v6_environmental_packages.tsv

| Environmental package | Structured comment name | Package item | Definition | Expected value | Value syntax | Example | Requirement | Preferred unit | Occurrence | MIXS ID |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| air | samp_name | sample name | A local identifier or name that for the material sample used for extracting nucleic acids... | text | {text} | ISDsoil1 | M | | 1 | MIXS:0001107 |

That doesn't take advantage of previous Environmental package and Structured comment name cleanups that went into XXX?

How to look for malformed names in YAML after the fact?
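
One possible after-the-fact check, as a minimal sketch: it assumes the generated YAML has already been loaded into a dict (for example with `yaml.safe_load`) and that it uses the top-level LinkML `classes` and `slots` keys. What counts as "malformed" is not defined in this issue, so the two patterns below (PascalCase classes, lowercase snake_case slots) are assumptions, and `find_malformed_names` is a hypothetical helper name.

```python
import re

# ASSUMPTION: these naming conventions are illustrative, not taken from the repo.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")   # PascalCase
SLOT_NAME = re.compile(r"^[a-z][a-z0-9_]*$")      # lowercase snake_case


def find_malformed_names(schema):
    """Return (section, name) pairs whose names break the assumed patterns."""
    problems = []
    for section, pattern in (("classes", CLASS_NAME), ("slots", SLOT_NAME)):
        # A missing section is absent; an empty one may load as None.
        for name in (schema.get(section) or {}):
            if not pattern.match(str(name)):
                problems.append((section, str(name)))
    return problems


# Hypothetical schema fragment, shaped like the result of yaml.safe_load().
example = {
    "classes": {"MigsEu": {}, "host-associated": {}},
    "slots": {"samp_name": {}, "Samp Name": {}},
}
print(find_malformed_names(example))
```

Running the same check against the environmental package and structured comment name columns before they reach the YAML would catch the same problems earlier.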

Must include

| Requirement | Count - Requirement |
| --- | --- |
| C | 163 |
| E | 7 |
| M | 191 |
| X | 1390 |
| (empty) | |
| Total Result | 1751 |
turbomam commented 1 year ago

will need to change its Environmental package names into PascalCased class names. Could look them up in data/mixs_v6_checklists_env_packages_classes_curated.tsv

class title aliases class_uri description in_subset is_a mixin mixins
> class title aliases class_uri description in_subset is_a mixin mixins
MigsEu migs_eu MIXS:0010002 Checklist TRUE
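
If a package name is missing from the curated file, a fallback conversion could look like the sketch below. `to_pascal_case` is a hypothetical helper, and its rule (split on any non-alphanumeric run, capitalize each piece) is an assumption; it reproduces the `migs_eu` to `MigsEu` pair in the row above, but the curated TSV should win wherever the two disagree.

```python
import re


def to_pascal_case(name):
    """Convert a free-text package name to a PascalCase class name.

    ASSUMPTION: splitting on non-alphanumeric runs and capitalizing each
    piece matches the curated class names; that has not been verified.
    """
    parts = re.split(r"[^A-Za-z0-9]+", name.strip())
    return "".join(p[:1].upper() + p[1:].lower() for p in parts if p)


print(to_pascal_case("migs_eu"))
print(to_pascal_case("air"))
print(to_pascal_case("host-associated"))
```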
turbomam commented 1 year ago

will need to convert MIxS Requirement codes to LinkML recommended and required values. Should use data/mixs_requirement_codes.tsv

mixs_citation = https://github.com/GenomicsStandardsConsortium/mixs/wiki/5.-MIxS-checklists

mixs_requirement_value mixs_name mixs_desc not applicable optional recommended required
- not applicable descriptor is not applicable for a given checklist type TRUE
C conditional mandatory descriptor must be present for compliance with the checklist, but only when applicable to the study, i.e. if this item is not applicable for the study the metadata data will still be checklist compliant even if it is left out TRUE
E Environment-dependent descriptor must be present depending on the environment the original sample was obtained from TRUE
M mandatory descriptor must be present for compliance with the checklist TRUE
X optional descriptor may be present, not mandatory for compliance with checklist TRUE
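
A minimal sketch of the code-to-LinkML conversion follows. The flattened table above does not show which boolean column holds TRUE for each code, so the mapping here is an assumption: M becomes `required`, C and E become `recommended`, X sets neither, and `-` means the slot is not used at all. The real values should be read from data/mixs_requirement_codes.tsv; `linkml_usage` is a hypothetical name.

```python
# ASSUMPTION: this mapping is a guess, especially for C and E.
# Replace it with the values read from data/mixs_requirement_codes.tsv.
REQUIREMENT_TO_LINKML = {
    "M": {"required": True},     # mandatory
    "C": {"recommended": True},  # conditional mandatory (assumed)
    "E": {"recommended": True},  # environment-dependent (assumed)
    "X": {},                     # optional: neither flag set
    "-": None,                   # not applicable: do not attach the slot
}


def linkml_usage(code):
    """Return slot_usage-style flags for a MIxS requirement code.

    Returns None when the item is not applicable to the package.
    """
    code = code.strip()
    if code not in REQUIREMENT_TO_LINKML:
        raise ValueError(f"unknown MIxS requirement code: {code!r}")
    flags = REQUIREMENT_TO_LINKML[code]
    return None if flags is None else dict(flags)


print(linkml_usage("M"))
print(linkml_usage("X"))
```

Raising on unknown codes, rather than defaulting to optional, would surface the rows with an empty Requirement value instead of silently passing them through.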
turbomam commented 1 year ago

use mixs_subset_examples_first/datamodel/merge_tsvs.py for merging?

oops, what if the on keys aren't the same?

and some of these files have two schemasheets headers

poetry run python src/mixs_subset_examples_first/datamodel/merge_tsvs.py \
    --file1 XXX \
    --file2 XXX \
    --on mixs_requirement_value \
    --output XXX
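
Both concerns raised in this comment (different key names on each side, and a second schemasheets header row) can be handled outside merge_tsvs.py. The sketch below uses only the standard library and does not reflect how merge_tsvs.py works. `read_tsv` and `left_merge` are hypothetical helpers, and the sample rows, including the contents of the `>` descriptor row, are made up for illustration.

```python
import csv
import io


def read_tsv(handle):
    """Read a TSV into dicts, dropping a leading schemasheets descriptor row.

    ASSUMPTION: the descriptor row is the first data row and its first
    cell starts with '>'.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    fields = reader.fieldnames or []
    rows = list(reader)
    if fields and rows and (rows[0].get(fields[0]) or "").startswith(">"):
        rows = rows[1:]
    return rows


def left_merge(left_rows, right_rows, left_on, right_on):
    """Left-join two row lists on differently named key columns."""
    index = {}
    for row in right_rows:
        index.setdefault(row[right_on], row)  # first match wins
    merged = []
    for row in left_rows:
        match = index.get(row[left_on], {})
        # Skip the right-hand key and any column the left row already has.
        extra = {k: v for k, v in match.items() if k != right_on and k not in row}
        merged.append({**row, **extra})
    return merged


# Hypothetical inputs; real files would be opened with newline="".
left = io.StringIO("Requirement\tStructured comment name\nM\tsamp_name\n")
right = io.StringIO("mixs_requirement_value\tmixs_name\n>x\ty\nM\tmandatory\n")
merged = left_merge(read_tsv(left), read_tsv(right),
                    left_on="Requirement", right_on="mixs_requirement_value")
print(merged)
```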