microbiomedata / submission-schema

https://microbiomedata.github.io/submission-schema/
MIT License

Switch packed multivalue string delimiter from `|` to `!`? #23

Open turbomam opened 1 year ago

turbomam commented 1 year ago

The `|` characters are getting escaped, or even double-escaped.

See https://github.com/microbiomedata/submission-schema/blob/issue-18-water-patterns/examples/output/SampleData-water-data.regen.yaml

Switching delimiters would require changing the sample data and the pattern assignments.
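To make the trade-off concrete, here is a minimal Python sketch (not code from this repo) that builds a packed-multivalue pattern for a given delimiter. The item pattern is a hypothetical one modeled on the `{text};{number} {unit}` values discussed in this thread. It shows why `|` is awkward: it is the regex alternation operator and must be written as `\|`, whereas `!` has no special meaning and `re.escape` leaves it unchanged (Python 3.7+).

```python
import re

# Hypothetical item pattern, modeled on the "{text};{number} {unit}" values
# in this thread (e.g. "rain;2.3 inches"). Not taken from the schema.
ITEM = r"[^;\t\r\x0A]+;[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)? [^;\t\r\x0A]+"


def packed_pattern(delimiter: str) -> str:
    """Build a regex for one or more ITEMs joined by `delimiter`."""
    # re.escape backslash-escapes only regex metacharacters (Python 3.7+),
    # so "|" (alternation) becomes "\|" while "!" is left unchanged.
    return rf"^(?:{ITEM}{re.escape(delimiter)})*{ITEM}$"


pipe_re = re.compile(packed_pattern("|"))
bang_re = re.compile(packed_pattern("!"))

print(re.escape("|"))  # \|
print(re.escape("!"))  # !
print(bool(pipe_re.fullmatch("wind speed;9 knots|rain;2.3 inches")))  # True
print(bool(bang_re.fullmatch("wind speed;9 knots!rain;2.3 inches")))  # True
```

In this sketch the text and unit character classes do not exclude the delimiter, so the regex engine finds the split by backtracking; excluding the delimiter from those classes would make the pattern stricter.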

turbomam commented 1 year ago

Compare these columns from `examples/output/SampleData-water-data.tsv`:

- `analysis_type`: `[metagenomics|metatranscriptomics]`
- `alkalinity`: `50 milligram per liter`
- `atmospheric_data`: `wind speed;9 knots\|rain;2.3 inches`

`^([^;\t\r\x0A]+;[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)? [^;\t\r\x0A]+\|)*([^;\t\r\x0A]+;[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)? [^;\t\r\x0A]+)$`
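A minimal Python sketch, assuming the pattern is meant to read as `PATTERN` below (zero or more `{text};{number} {unit}` items, each followed by a literal `\|`, then one final item), checks both the unescaped `atmospheric_data` value and the escaped form that appears in the TSV. `check` and `ITEM` are hypothetical helpers for illustration only.

```python
import re

# Assumed intended pattern: zero or more "{text};{number} {unit}" items, each
# followed by a literal "\|", then one final item.
PATTERN = re.compile(
    r"^([^;\t\r\x0A]+;[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)? [^;\t\r\x0A]+\|)*"
    r"([^;\t\r\x0A]+;[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)? [^;\t\r\x0A]+)$"
)

# One item, with capture groups for text, number, and unit.
ITEM = re.compile(
    r"([^;\t\r\x0A]+);([-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?) ([^;\t\r\x0A]+)"
)


def check(value):
    """Return (matches PATTERN, list of (text, number, unit) per '|'-part)."""
    ok = PATTERN.match(value) is not None
    parts = []
    for part in value.split("|"):
        m = ITEM.fullmatch(part)
        parts.append(m.groups() if m else None)
    return ok, parts


print(check("wind speed;9 knots|rain;2.3 inches"))
print(check(r"wind speed;9 knots\|rain;2.3 inches"))  # as written in the TSV
```

Both values pass, but in the escaped one the first unit comes out as `knots\`, because the unit's character class allows a backslash. The pattern alone would therefore not flag the extra escaping.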

mslarae13 commented 6 months ago

I think this issue belongs in the schema repo, not submission-schema.

If I am following this, we want a consistent delimiter for truly multivalued fields.

I like `;`.

The hard part is the example you provided for `atmospheric_data`.

This example has two issues. First, it shouldn't be multivalued. Second, even if it were allowed to be multivalued, as some slots are, the combination of `;` and `|` is confusing.

The problem there is that `;` isn't the multivalue/list delimiter: in `wind speed;9 knots`, it separates the parts of a single value.

My vote: make a blanket rule to use a comma (`,`) to separate the pieces of a single entry, and a semicolon (`;`) to separate the values of slots that allow multiple values (lists).
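One reading of this proposal, as a minimal Python sketch: commas separate the pieces of a single value (here a name and a measurement, which is an assumption about how `atmospheric_data` might be restructured) and semicolons separate the values of a multivalued slot. `pack` and `unpack` are hypothetical helpers, not part of any NMDC or LinkML API.

```python
from typing import List, Sequence

PIECE_SEP = ","  # separates the pieces of one value (proposed rule)
VALUE_SEP = ";"  # separates the values of a multivalued slot (proposed rule)


def pack(values: Sequence[Sequence[str]]) -> str:
    """Join each value's pieces with ',' and the values with ';'."""
    for value in values:
        for piece in value:
            # Without an escaping scheme, a delimiter inside a piece
            # would make the packed string ambiguous, so reject it.
            if PIECE_SEP in piece or VALUE_SEP in piece:
                raise ValueError(f"piece contains a delimiter: {piece!r}")
    return VALUE_SEP.join(PIECE_SEP.join(value) for value in values)


def unpack(packed: str) -> List[List[str]]:
    """Inverse of pack(): split on ';' first, then on ','."""
    return [value.split(PIECE_SEP) for value in packed.split(VALUE_SEP)]


packed = pack([["wind speed", "9 knots"], ["rain", "2.3 inches"]])
print(packed)          # wind speed,9 knots;rain,2.3 inches
print(unpack(packed))  # [['wind speed', '9 knots'], ['rain', '2.3 inches']]
```

The sketch rejects pieces that already contain `,` or `;` rather than escaping them, since escaped delimiters are what caused the problems reported in this issue. The cost is that free-text pieces can never contain either character.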

We need to confirm this with MIxS, which raises the question: do we enforce it and mark these slots as "mixs modified", or wait for the GSC to make the change?

@cmungall @sierra-moxon I'd like your thoughts.

mslarae13 commented 6 months ago

See also: https://github.com/microbiomedata/issues/issues/78