nih-cfde / cfde-deriva

Collaboration point for miscellaneous CFDE-deriva scripts

Frictionless validation too strict for "uri" columns #137

Closed: karlcz closed this issue 3 years ago

karlcz commented 3 years ago

We use the column type "string" with format "uri" for the persistent_id and dcc_url columns. However, frictionless-py appears to apply an overly strict validation rule for this format that effectively accepts only HTTP(S) or FTP URLs. As a result, it rejects many other valid URIs, such as doi:, tag:, or minid: identifiers.
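To illustrate the gap, here is a minimal Python sketch contrasting the two behaviours. `strict_url_check` is a hypothetical approximation of the rule described above (an http, https, or ftp scheme plus a host). It is not frictionless-py's actual code. `rfc3986_scheme_check`, also hypothetical, only tests for a syntactically valid scheme prefix as defined in RFC 3986, section 3.1, and does not validate the rest of the URI. The sample identifiers are illustrative values.

```python
import re
from urllib.parse import urlsplit

# Hypothetical approximation of the strict rule described in this issue:
# only http, https, and ftp schemes with a network location are accepted.
STRICT_SCHEMES = {"http", "https", "ftp"}


def strict_url_check(value: str) -> bool:
    parts = urlsplit(value)
    return parts.scheme.lower() in STRICT_SCHEMES and bool(parts.netloc)


# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# This checks only the scheme prefix, not the full URI grammar.
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def rfc3986_scheme_check(value: str) -> bool:
    return SCHEME_RE.match(value) is not None


if __name__ == "__main__":
    samples = [
        "https://example.org/data/1",
        "ftp://ftp.example.org/file.txt",
        "doi:10.1000/182",
        "tag:example.com,2005:item1",
        "minid:b9vx04",
        "not a uri",
    ]
    for value in samples:
        print(f"{value!r:35} strict={strict_url_check(value)!s:5} "
              f"scheme={rfc3986_scheme_check(value)}")
```

With this sample list, only the first two values pass the strict check. All but the last pass the scheme check.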

We need to either disable (or replace) this validation rule, or change our column definitions in the ingest/portal model to avoid this problematic format.

An upstream issue has been submitted to ask whether the frictionless-py maintainers consider this a bug in the package's validator.

karlcz commented 3 years ago

We've decided to remove the "format": "uri" constraints from our frictionless models. We suspect that we'll need different, CFDE-specific validation rules in the future, even if they do fix the bug and actually apply spec-compliant URI validation. I will open a separate issue for future CFDE work on this front.
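Here is a minimal sketch of that change. It assumes the models are Frictionless Data Package descriptors loaded as Python dicts. `strip_uri_formats` is a hypothetical helper that walks each resource's inline Table Schema and drops any `"format": "uri"` entry. Those string fields then fall back to the spec's `default` format. The resource and field names in the example descriptor are placeholders, not the actual CFDE model.

```python
import copy


def strip_uri_formats(descriptor: dict) -> dict:
    """Return a copy of a Data Package descriptor with every
    "format": "uri" removed from inline table schema fields."""
    result = copy.deepcopy(descriptor)  # leave the caller's dict untouched
    for resource in result.get("resources", []):
        schema = resource.get("schema")
        # A schema may also be a path/URL reference; skip those here.
        if not isinstance(schema, dict):
            continue
        for field in schema.get("fields", []):
            if field.get("format") == "uri":
                del field["format"]
    return result


if __name__ == "__main__":
    # Hypothetical descriptor fragment for illustration only.
    package = {
        "name": "example-package",
        "resources": [
            {
                "name": "file",
                "schema": {
                    "fields": [
                        {"name": "local_id", "type": "string"},
                        {"name": "persistent_id", "type": "string", "format": "uri"},
                    ]
                },
            },
            {
                "name": "dcc",
                "schema": {
                    "fields": [
                        {"name": "dcc_url", "type": "string", "format": "uri"},
                    ]
                },
            },
        ],
    }
    print(strip_uri_formats(package))
```

Because the helper works on a deep copy, the original descriptor is left unchanged. This makes it easy to diff the old and new models before committing.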