poseidon-framework / poseidon-hs

A toolset to work with modular genotype databases in the Poseidon format
https://poseidon-framework.github.io/#/trident
MIT License
7 stars 2 forks source link

Improve the error messages in case of .janno parsing failure #306

Closed nevrome closed 4 days ago

nevrome commented 2 months ago

At the moment error messages are generated by type constructors. This can work well for very specific types (e.g. with the Genetic_Sex column), but is less helpful for generic numeric or string columns.

For example the wrong NA value N/A in a numeric column currently yields:

[Error]   Can't read sample in Poseidon.janno.tsv in line 202: parse error (Failed reading: conversion error: expected Double, got "N/A" (Failed reading: takeWhile1))"

A better version of this message should include the name of the offending column, the Poseidon_ID (if it is not the problem in the first place) and omit the purely technical takeWhile1.

Let's see how far we get with cassava here.

stschiff commented 2 months ago

Yes indeed. Not entirely trivial to get around the parser and do something more custom on the column-level. But worth some investigation.

nevrome commented 2 months ago

This was a recent shower-thinking topic for me. I now believe we should introduce very specific types for each individual .janno column. Probably this would foam up some boilerplate, but if would allow for really neat, clean input validation.

We already have this for a number of columns, where it enables useful and precise error messages:

https://github.com/poseidon-framework/poseidon-hs/blob/c401b90a257ca0722077ed2b6c2a82e77fdd55d6/src/Poseidon/Janno.hs#L363-L371

And as all of this validation happens in the constructor, I'd say we'd stay true to the good old "parse, don't validate".

This solution also avoids hacking cassava.

stschiff commented 1 month ago

Hmm, true, and it does indicate the right column. But does it also indicate a specific row (i.e. a specific individual?). Perhaps that's a compromise we have to live with.

nevrome commented 1 month ago

The row number is already reported. See #307 for a proposal introducing the column-wise types as suggested above.

nevrome commented 4 days ago

I think it's clear that we will merge in #307 (or at least something like it). So I close this issue now.