inbo / whip

✅ Human and machine-readable syntax to express specifications for data
MIT License

Would a "required" spec be useful? #12

Open tucotuco opened 7 years ago

tucotuco commented 7 years ago

This would be a dataset-level specification: a list of fields that must be found in the input.

Related: what would an implementation be expected to do with a specification that cannot be validated because the field is not in the input?

peterdesmet commented 7 years ago

We had a required at one point, but you can replicate it with:

sex:

Or for multiple fields:

sex:
lifeStage:
recordedBy:

Listing a field in the specifications will trigger a test for that field. If that field cannot be found, then the implementation will throw an error.

However, empty: False is implied by default for any field (the field cannot be empty). So, if you would rather just know whether a field is available, without making any statements about its content, you should use:

sex:
  empty: True

Also note that the specifications dictate what will be tested in the data file, not the other way around. So if a data file contains fields that are not listed in the specification, this will not throw any warning or error.
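
For illustration, the three rules above (a listed field must be present, empty: False is the default, unlisted fields are ignored) could be sketched in Python roughly as follows. This is not whip's actual implementation: the function name check_record, the message strings, and the choice to collect messages rather than throw are all assumptions made for the example. The specification is shown as an already-parsed dict, where a bare "sex:" line becomes None.

```python
def check_record(spec, record):
    """Return a list of error messages for one record (dict of field -> value).

    Hypothetical helper, not part of whip: it only models field presence
    and the implied `empty: False` default described in this thread.
    """
    errors = []
    for field, rules in spec.items():
        rules = rules or {}  # a bare "sex:" in YAML parses to None
        if field not in record:
            # Listed in the specification but absent from the input.
            errors.append(f"{field}: field not found in input")
            continue
        allow_empty = rules.get("empty", False)  # empty: False is implied
        if record[field] in ("", None) and not allow_empty:
            errors.append(f"{field}: value is empty")
    # Fields in `record` that are not in `spec` are never inspected.
    return errors


# Parsed form of:
#   sex:
#   recordedBy:
#     empty: True
spec = {"sex": None, "recordedBy": {"empty": True}}

# "habitat" is not in the specification, so it is ignored.
record = {"sex": "", "recordedBy": "", "habitat": "forest"}
problems = check_record(spec, record)  # ["sex: value is empty"]
```

Here recordedBy passes because empty: True only asks that the field exists, while sex fails because the default forbids empty values.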

peterdesmet commented 7 years ago

Oh, the required we had was also of the form term1: required, term2: required, not required: term1, term2 as you suggest. Anyway, I like the syntax from my previous comment, because it does not break the term: specification pattern.

Also, it allows you to start a specification quite gently:

scientificName:
  empty: True

recordedBy:
  empty: True

(both terms above should be in the file)

And you can extend it easily to a more robust specification:

scientificName:
  allowed: [...]

recordedBy:
  delimitedvalues:
    delimiter: " | "
    regex: ...