microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

examples in slot `slope gradient` don't validate #1353

Open mslarae13 opened 10 months ago

mslarae13 commented 10 months ago

See here: [Screenshot 2023-11-14 at 3:54:17 PM]

Slot slope_gradient lists `10%` as a valid example, but it won't validate without a space between the number and the %, i.e. `10 %`.

turbomam commented 10 months ago

Good catch. We are inheriting this from an old version of MIxS. I will check v6.2 to see whether the examples that violate our expectations are still there.

If they aren't, then I need to get my act together and update the import process to use v6.2, which will require lots of review from others.

Alternatively, we could do either or both of these:

mslarae13 commented 10 months ago

Pending your inspection of 'if examples that violate our expectations are still there' ...

If MIxS still has invalid examples, I recommend we update the validator to not REQUIRE a space between the number (#) and the %, so `10%` WILL validate, and we put in an issue to MIxS asking them to update their validator.

If MIxS has removed the invalid examples, we should pull that in and still suggest to MIxS that they allow no space between the number and the %, but not change it on our end. Wait for MIxS to update.
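The "don't require a space" change proposed above could look something like the following minimal Python sketch. The current submission-schema pattern (`^[-+]?[0-9]*\.?[0-9]+ +\S.*$`, quoted later in this thread) requires one or more spaces; the relaxed pattern here is hypothetical and has not been adopted by NMDC or MIxS. Simply swapping ` +` for ` *` would also let a bare number such as `10` pass, because the regex could backtrack and treat the last digit as the unit, so this sketch requires the unit to start with a character that is not whitespace, a digit, or a period.

```python
import re

# Pattern currently used by submission-schema for slope_gradient:
# a number, then one or more spaces (" +"), then a non-space unit.
CURRENT = re.compile(r"^[-+]?[0-9]*\.?[0-9]+ +\S.*$")

# Hypothetical relaxed pattern: zero or more spaces (" *"), and the unit must
# begin with a character that is not whitespace, a digit, or a period, so a
# bare number like "10" cannot be split into "1" + unit "0".
RELAXED = re.compile(r"^[-+]?[0-9]*\.?[0-9]+ *[^\s\d.].*$")


def validates(pattern: re.Pattern, value: str) -> bool:
    """Return True if the whole value matches the (anchored) pattern."""
    return pattern.match(value) is not None


for value in ["10 %", "10%", "2.5%", "10"]:
    print(value, "current:", validates(CURRENT, value), "relaxed:", validates(RELAXED, value))
```

With this sketch, `10 %` passes both patterns, `10%` and `2.5%` pass only the relaxed one, and a unit-less `10` is rejected by both.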

@turbomam thoughts?

turbomam commented 10 months ago

I forgot to look last week. I'll do it now.

turbomam commented 10 months ago

I wonder where those examples came from. I don't see them in MIxS 6.0-6.2.

Transposed rows from MIxS 6.1 "mixs_v6.xlsx"

| Environmental package | soil | agriculture |
| --- | --- | --- |
| Structured comment name | slope_gradient | slope_gradient |
| Package item | slope gradient | slope gradient |
| Definition | Commonly called 'slope'. The angle between ground surface and a horizontal line (in percent). This is the direction that overland water would flow. This measure is usually taken with a hand level meter or clinometer | Commonly called 'slope'. The angle between ground surface and a horizontal line (in percent). This is the direction that overland water would flow. This measure is usually taken with a hand level meter or clinometer |
| Expected value | measurement value | measurement value |
| Value syntax | {float} {unit} | {float} {unit} |
| Example | | |
| Requirement | X | C |
| Preferred unit | percentage | percentage |
| Occurrence | 1 | 1 |
| MIXS ID | MIXS:0000646 | MIXS:0000646 |

MIxS v6.2.0 documentation page

LinkML source for the slot, v6.2.0:

```yaml
name: slope_gradient
annotations:
  Preferred_unit:
    tag: Preferred_unit
    value: percentage
description: Commonly called 'slope'. The angle between ground surface and a horizontal
  line (in percent). This is the direction that overland water would flow. This measure
  is usually taken with a hand level meter or clinometer
title: slope gradient
from_schema: https://w3id.org/mixs
keywords:
- slope
slot_uri: MIXS:0000646
alias: slope_gradient
domain_of:
- Agriculture
- Soil
range: string
pattern: ^[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?( *- *[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)?
  *([^\s-]{1,2}|[^\s-]+.+[^\s-]+)$
structured_pattern:
  syntax: ^{scientific_float}( *- *{scientific_float})? *{text}$
  interpolated: true
  partial_match: true
```
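With `interpolated: true`, LinkML expands the `{...}` placeholders in `structured_pattern.syntax` using named values from the schema's `settings`, which is how the long `pattern` above is produced. The Python sketch below mimics that expansion with plain string substitution. The two setting values are inferred by comparing `syntax` with the expanded `pattern` (they are not copied from mixs.yaml), and `interpolate` is a hypothetical helper, not a LinkML function. Note that the two-line `pattern:` is a folded YAML plain scalar, so its line break becomes a single space; that space is the ` *` before the unit.

```python
import re

# Inferred (not copied from mixs.yaml) by comparing structured_pattern.syntax
# with the expanded pattern in the LinkML source above.
SETTINGS = {
    "scientific_float": r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?",
    "text": r"([^\s-]{1,2}|[^\s-]+.+[^\s-]+)",
}

SYNTAX = r"^{scientific_float}( *- *{scientific_float})? *{text}$"

# The published v6.2.0 pattern, with the YAML line fold (newline -> one space)
# applied by hand.
MIXS_PATTERN = (
    r"^[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?"
    r"( *- *[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)?"
    r" *([^\s-]{1,2}|[^\s-]+.+[^\s-]+)$"
)


def interpolate(syntax: str, settings: dict) -> str:
    """Replace each {name} placeholder with its setting (hypothetical helper).

    A function replacement is used so backslashes in the settings are
    inserted literally rather than treated as re.sub escape sequences.
    """
    return re.sub(r"\{(\w+)\}", lambda m: settings[m.group(1)], syntax)


expanded = interpolate(SYNTAX, SETTINGS)
assert expanded == MIXS_PATTERN  # the inferred settings reproduce the pattern

mixs = re.compile(expanded)
for value in ["10%", "10 %", "5 - 10 %", "10"]:
    print(value, bool(mixs.match(value)))
```

Running these checks suggests the MIxS v6.2.0 pattern itself accepts `10%` as well as `10 %`. It also accepts a bare `10`, because the `{text}` alternative `[^\s-]{1,2}` can match the final digit as if it were the unit.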
turbomam commented 10 months ago

Regexr sandbox for the current submission-schema examples vs the MIxS 6.2.0 validation pattern

turbomam commented 10 months ago

If you look at the modifications_long.tsv section in the submission-schema search results for slope_gradient, you'll see how we have asserted our own examples and a TODO:

Slope is a percent. How does the validation work? Check to correct examples

turbomam commented 10 months ago

submission-schema has a validation pattern of `^[-+]?[0-9]*\.?[0-9]+ +\S.*$` for slope_gradient, but it isn't coming from MIxS 6.2.0 or from NMDC's frozen MIxS v6.0 either. See https://microbiomedata.github.io/mixs/slope_gradient/
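The practical difference between the two patterns can be checked with Python's `re` module. This sketch compiles the submission-schema pattern quoted above and the MIxS v6.2.0 `pattern` from the LinkML source earlier in the thread (joined across its YAML line break), then tests a few values chosen here for illustration; they are not taken from either schema's examples, and `compare` is a hypothetical helper.

```python
import re

# submission-schema pattern for slope_gradient: at least one space is required.
SUBMISSION = re.compile(r"^[-+]?[0-9]*\.?[0-9]+ +\S.*$")

# MIxS v6.2.0 pattern for slope_gradient, with the YAML line fold applied.
MIXS_6_2_0 = re.compile(
    r"^[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?"
    r"( *- *[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)?"
    r" *([^\s-]{1,2}|[^\s-]+.+[^\s-]+)$"
)


def compare(values):
    """Map each value to (submission-schema result, MIxS 6.2.0 result)."""
    return {v: (bool(SUBMISSION.match(v)), bool(MIXS_6_2_0.match(v))) for v in values}


for value, (sub, mixs) in compare(["10%", "10 %", "1e2 %"]).items():
    print(f"{value}: submission-schema={sub}, MIxS 6.2.0={mixs}")
```

In this sketch the two patterns agree on `10 %` but disagree on `10%` (no space) and on scientific notation such as `1e2 %`, which only the MIxS pattern's `(?:[eE][-+]?[0-9]+)?` group allows.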

As far as I am concerned, we can assert whatever examples we want in the Biosample slot usage in nmdc.yaml and whatever validation pattern we want in https://github.com/microbiomedata/submission-schema/blob/main/sheets_and_friends/tsv_in/modifications_long.tsv

@pkalita-lbl and @mslarae13 this is the kind of thing we will have to keep an eye on when we switch to importing MIxS terms directly from https://github.com/GenomicsStandardsConsortium/mixs/blob/v6.2.0/src/mixs/schema/mixs.yaml. See

turbomam commented 10 months ago

@mslarae13 I have emphasized a minimal number of abstracted structured_patterns in MIxS v6.2.0. If you want to suggest changing GSC's pattern for slope_gradient, let's look for the other slots that use the same pattern and be consistent.
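One way to find the other slots that share slope_gradient's pattern is to group slots by `structured_pattern.syntax`. This is a minimal sketch that assumes mixs.yaml has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`) with slot definitions under the top-level `slots` key, as in LinkML schemas. The function `slots_by_syntax` and the miniature schema, including the `other_measurement` and `free_text_slot` slots, are hypothetical stand-ins, not real MIxS content.

```python
from collections import defaultdict


def slots_by_syntax(schema: dict) -> dict:
    """Group slot names by their structured_pattern syntax.

    `schema` is a LinkML schema already loaded into a dict; slots without a
    structured_pattern are skipped.
    """
    groups = defaultdict(list)
    for name, slot in (schema.get("slots") or {}).items():
        syntax = ((slot or {}).get("structured_pattern") or {}).get("syntax")
        if syntax is not None:
            groups[syntax].append(name)
    return dict(groups)


# Hypothetical miniature schema standing in for the parsed mixs.yaml.
MEASUREMENT_SYNTAX = "^{scientific_float}( *- *{scientific_float})? *{text}$"
schema = {
    "slots": {
        "slope_gradient": {
            "structured_pattern": {"syntax": MEASUREMENT_SYNTAX, "interpolated": True}
        },
        "other_measurement": {
            "structured_pattern": {"syntax": MEASUREMENT_SYNTAX, "interpolated": True}
        },
        "free_text_slot": {"range": "string"},
    }
}

target = schema["slots"]["slope_gradient"]["structured_pattern"]["syntax"]
print(slots_by_syntax(schema)[target])
```

Grouping on the uninterpolated `syntax` rather than the expanded `pattern` keeps slots that share the same abstract structure together even if the settings values change later.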

mslarae13 commented 10 months ago

Thanks @turbomam, I forgot we added those! Regardless, I think it's the same solution. It's weird to allow "10 %" and not "10%".