microbiomedata / mixs

“Minimum Information about any (X) Sequence” (MIxS) specification
https://microbiomedata.github.io/mixs/
Creative Commons Zero v1.0 Universal

clear up string_serialization and structured_pattern #31

Open turbomam opened 2 years ago

turbomam commented 2 years ago

Early on, I converted MIxS Value syntaxes to LinkML string_serializations. @cmungall also parses the Value syntaxes to detect potential enumerations. Some hybrid Value syntaxes get converted into enums with mangled permissible values, which is especially problematic when serializing to RDF.

string_serializations have two possible applications:

  1. combining the contents of two separate fields into one, based on the pattern. I believe @sujaypatil96 has implemented that in linkml-convert.
  2. parsing the contents of a field. I don't believe that has been implemented anywhere, although it would be handy for making the conversion of DataHarmonizer output into schema-compliant JSON more declarative.
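
A minimal sketch of both applications in plain Python, not LinkML's or linkml-convert's actual implementation. The template `{depth} {unit}`, the slot names, and the helpers `combine` and `parse` are all hypothetical, chosen only for illustration:

```python
import re

# Hypothetical template in the style of a string_serialization.
TEMPLATE = "{depth} {unit}"

def combine(template: str, fields: dict) -> str:
    """Application 1: build one string from several separate fields."""
    return template.format(**fields)

def parse(template: str, text: str):
    """Application 2: split a string back into fields using the template.

    Each {name} token becomes a lazy named capture group; everything else
    in the template is matched literally. Returns None on no match.
    """
    # re.split with one capture group alternates literal text and token names.
    parts = re.split(r"\{([A-Za-z_]\w*)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)        # literal text
        else:
            pattern += f"(?P<{part}>.+?)"     # token name
    match = re.fullmatch(pattern, text)
    return match.groupdict() if match else None

combined = combine(TEMPLATE, {"depth": "2.5", "unit": "m"})
parsed = parse(TEMPLATE, "2.5 m")
```

The parsing direction is the part that could make a DataHarmonizer-to-JSON conversion more declarative: the same template that documents the field would also drive the splitting.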

structured_patterns can be used to assemble complex regular expressions from reusable components, like the {}-wrapped tokens from MIxS Value syntaxes. The complex regular expressions can then be used to validate input into DataHarmonizer.
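
As a rough illustration of the idea, here is a sketch of expanding {}-wrapped tokens into a full regular expression. It is written in plain Python and does not call LinkML; the `SETTINGS` table, its two regex fragments, and the `interpolate` and `compile_syntax` helpers are assumptions made up for this example, not definitions from MIxS or LinkML:

```python
import re

# Hypothetical reusable components, keyed by token name.
SETTINGS = {
    "float": r"[-+]?\d+(?:\.\d+)?",
    "unit": r"[A-Za-z%/]+",
}

def interpolate(syntax: str, settings: dict) -> str:
    """Replace each {token} in the syntax with its regex fragment.

    Tokens must start with a letter or underscore, so regex quantifiers
    such as {2} are left alone. Unknown tokens raise KeyError.
    """
    def expand(match: re.Match) -> str:
        return settings[match.group(1)]
    return re.sub(r"\{([A-Za-z_]\w*)\}", expand, syntax)

def compile_syntax(syntax: str, settings: dict = SETTINGS) -> re.Pattern:
    """Compile the expanded syntax into a pattern for whole-value validation."""
    return re.compile(interpolate(syntax, settings))

expanded = interpolate("{float} {unit}", SETTINGS)
validator = compile_syntax("{float} {unit}")
is_valid = validator.fullmatch("2.5 m") is not None
is_invalid = validator.fullmatch("deep m") is None
```

Because each fragment is defined once, a change to the `float` component would propagate to every Value syntax that uses the `{float}` token.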

I think I should switch the instantiation of MIxS Value syntaxes from string_serializations to structured_patterns.

turbomam commented 2 years ago

see also #28