nih-cfde / cfde-deriva

Collaboration point for miscellaneous CFDE-deriva scripts

Space character in non-string columns leads to ops-error #397

Closed: karlcz closed this issue 1 year ago

karlcz commented 1 year ago

For some column types, the frictionless validation does not seem to reject a value that consists only of whitespace. Such a value is not mapped to null by the missingValues clause of the schema, yet it does not form a valid literal of the column type either.

This behavior has been observed for these types:

Because sqlite also doesn't care much about column types, these values make it all the way through the portal-prep ETL and eventually lead to a 400 Bad Request error when attempting to send rows to the catalog. These unexpected ermrest errors are mapped to the generic ops-error case in cfde-deriva. Such bad values should instead produce a meaningful validation error for the end user.
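A minimal sketch of the sqlite behavior described above, using an in-memory database. The table name `demo` and the helper `stored_type` are hypothetical and are not taken from cfde-deriva; the point is only that a whitespace string is accepted into an INTEGER column and stored as text.

```python
import sqlite3


def stored_type(value):
    """Return (typeof, value) as sqlite stores `value` in an INTEGER column."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical one-column table, only for illustration.
        conn.execute("CREATE TABLE demo (n INTEGER)")
        conn.execute("INSERT INTO demo (n) VALUES (?)", (value,))
        return conn.execute("SELECT typeof(n), n FROM demo").fetchone()
    finally:
        conn.close()


# A well-formed integer literal is converted by the column's type affinity.
print(stored_type("42"))
# A lone space is not a valid integer, yet the insert succeeds and keeps the text.
print(stored_type(" "))
```

Since no error is raised at insert time, a check of this kind would have to happen explicitly, either before loading or by querying `typeof()` afterwards.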

karlcz commented 1 year ago

Upon further investigation, this may be due to an inconsistency between our frictionless schema files, which declare the TSV dialect with "skipInitialSpace": true, and our own TSV reading steps in cfde-deriva, where we did not request the same dialect from the reader in the Python csv module.
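The mismatch can be sketched with the standard library alone. The sample data and the helper `read_rows` are made up for illustration and are not the actual cfde-deriva reading code; `skipinitialspace` is the csv module's formatting parameter that corresponds to the dialect's "skipInitialSpace" setting.

```python
import csv
import io

# Hypothetical TSV content: the middle field of the data row is a single space.
TSV = "id\tcount\tname\nA\t \tx\n"


def read_rows(text, **fmtparams):
    """Parse tab-separated text with the csv module and return all rows."""
    stream = io.StringIO(text, newline="")
    return list(csv.reader(stream, delimiter="\t", **fmtparams))


# Default settings: the space survives as a one-character field.
plain = read_rows(TSV)
# Matching the declared dialect: the leading space is skipped, leaving "".
matched = read_rows(TSV, skipinitialspace=True)

print(plain[1])
print(matched[1])
```

With the parameter set, the field arrives as an empty string, which a missingValues clause that lists the empty string would presumably map to null, so both readers would then agree on the value.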

karlcz commented 1 year ago

A fix for this was committed to the main branch.