Closed: karlcz closed this issue 1 year ago.
Upon further investigation, this may be due to an inconsistency between our frictionless schema files, which declare the TSV dialect with `"skipInitialSpace": true`, and our own TSV reading steps in cfde-deriva, where we did not request the same dialect from the Python `csv` reader module (whose equivalent option is `skipinitialspace=True`).
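A minimal sketch of why the dialect mismatch matters, using only the standard `csv` module. The TSV snippet and the `read_tsv` helper are hypothetical illustrations, not code from cfde-deriva. With `skipinitialspace=True`, spaces immediately after the delimiter are dropped, so a single-space cell becomes the empty string, which matches the Table Schema default `missingValues` of `[""]` and is read as null. Without it, the cell stays `" "`.

```python
import csv
import io

# Hypothetical TSV content: row 1 has a single space in the "count" column,
# row 2 has leading spaces before a number.
data = "id\tcount\tname\n1\t \talpha\n2\t  7\tbeta\n"


def read_tsv(text, skip_initial_space):
    """Read TSV text with the csv module, optionally skipping leading spaces."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter="\t",
        skipinitialspace=skip_initial_space,
    )
    return list(reader)


print(read_tsv(data, False)[1])  # ['1', ' ', 'alpha']
print(read_tsv(data, True)[1])   # ['1', '', 'alpha']
print(read_tsv(data, True)[2])   # ['2', '7', 'beta']
```

Note that `skipinitialspace` only removes *leading* spaces after a delimiter; a value with trailing spaces (for example `"7 "`) is passed through unchanged.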
A fix for this was committed to the main branch.
For some column types, frictionless validation does not seem to reject a whitespace-only value. Such a value is not mapped to null by the schema's missingValues clause, yet it is also not a valid literal of the column type.
This behavior has been observed for these types:
Because SQLite does not enforce declared column types (it uses type affinity), these values pass all the way through the portal-prep ETL and eventually cause a 400 Bad Request error when the rows are sent to the catalog. cfde-deriva maps these unexpected ERMrest errors to its generic ops-error case, but they should instead produce a meaningful validation error for the end user.