Closed nevrome closed 4 days ago
Yes indeed. Not entirely trivial to get around the parser and do something more custom on the column-level. But worth some investigation.
This was a recent shower-thinking topic for me. I now believe we should introduce very specific types for each individual .janno column. Probably this would foam up some boilerplate, but if would allow for really neat, clean input validation.
We already have this for a number of columns, where it enables useful and precise error messages:
And as all of this validation happens in the constructor, I'd say we'd stay true to the good old "parse, don't validate".
This solution also avoids hacking cassava.
Hmm, true, and it does indicate the right column. But does it also indicate a specific row (i.e. a specific individual?). Perhaps that's a compromise we have to live with.
The row number is already reported. See #307 for a proposal introducing the column-wise types as suggested above.
I think it's clear that we will merge in #307 (or at least something like it). So I close this issue now.
At the moment error messages are generated by type constructors. This can work well for very specific types (e.g. with the
Genetic_Sex
column), but is less helpful for generic numeric or string columns.For example the wrong
NA
valueN/A
in a numeric column currently yields:A better version of this message should include the name of the offending column, the Poseidon_ID (if it is not the problem in the first place) and omit the purely technical
takeWhile1
.Let's see how far we get with cassava here.