Closed nevrome closed 4 days ago
Attention: Patch coverage is 53.89610%
with 213 lines
in your changes missing coverage. Please review.
Project coverage is 59.56%. Comparing base (
5306ba0
) to head (5406a3b
). Report is 35 commits behind head on master.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
This must feel like a nightmare to review, but I think it looks worse than it is. I only worked in the Janno
and the new ColumnTypes(Utils)
modules. The other adjustments are just fallout from that.
Open questions on my end are:
Janno
, but it also clearly reduces the amount of code. And we already used template haskell, so this is not a new dependency.Poseidon_ID
, because it would be a pain to realize this change across the code base. The one for Group_Name
is already questionable, but I wanted the new, reliable UTF-8 validation for the group names.OK, thanks. I think I'm not too sceptical from the outset. Let me have a look.
I think I would like to put reviewing this on short hold until we've made a decision about #309. Whether we want to remove the AESON instances is also worth a short discussion. We can do so, and at this point the whole template-haskell bit might become a bit obsolete, because there aren't that many type classes then anymore to automatically fill. But happy to discuss, we can also just do it the way you've done now.
I will be on vacation next week, and will pick this up again on Oct 21st.
And I just wanted to say that from a user-perspective this PR is a huge improvement. I have already made use of it when I prepared a recent Janno file and used this version to point me to errors!
Yes - I found myself also immediately switching to this branch for all practical work with the data. It makes spotting errors easier.
Whether we want to remove the AESON instances is also worth a short discussion.
What do you mean? We need the FromJSON
and ToJSON
instances for the server-client interaction here:
https://github.com/poseidon-framework/poseidon-hs/blob/5306ba0a07bbb1bfae41225a834219a2687b65a3/src/Poseidon/ServerClient.hs#L77-L112
This code does not compile if we remove a single instance definition for any of the janno types. Or is there an easy way to refactor this into a Aeson-free solution?
Hmm, interesting. I think the constructor ApiReturnJanno
must have been added pre-emptively. It is not actually used anywhere. Ever since I have added the option to transfer any additional Janno-columns as untyped Strings together with the Individual-Info, we dropped the idea of adding a dedicated Janno-Return API I think. But good catch!
I have just checked: If I comment out the lines with these constructors in it, the code compiles just fine. So these were really added with no client actually using it.
I don't know whether we should go that step and remove all these Aeson-instances. They are definitely not used right now, and our JSON transfer just uses the Cassava Csv.ToField
instances to serialise fields.
I guess we can remove them and then also remove this unused feature from the ServerClient.
When working through all these PRs in the community archive I realized that the .janno parsing certainly needs more informative error messages. In this PR I would like to propose a better and more transparent solution.
Here's a summary of what I did:
Janno...
types for every janno column (exceptPoseidon_ID
) in a new moduleColumnTypes
.Show
,ToField
,FromField
(cassava),ToJSON
,FromJSON
(Aeson) and two custom typeclassesHasColName
andMakeable
.Makeable
has a functionmake :: MonadFail m => T.Text -> m a
that can be used for very precises input validation.ColumnTypesUtils
together with a bit template haskell code (makeInstances
) to reduce the amount of boilerplate for simple text columns.Text
for the string types and as an intermediate format, so that we can reliably check for non UTF-8 characters withT.decodeUtf8'
.Bytestring
to UTF-8 encodedText
andfail
upon exceptions. Then transformText
to the desired type with amake...
constructor function. This function does additional validation andfail
s if the checks can not be satisfied.I did not introduce additional validation here, but this new setup makes it very easy to do so by modifying the
make
function of a given type.make
only lives inMonadFail
, so additional checks that should only yield a warning must be done incheckJannoRowConsistency
(as in the past).I did not adjust the type for the .ssf file (
SeqSourceRow
) yet. It's not so urgent, imho. But my ideas was that its columns could also be added inColumnTypes
eventually.Now what do we get out of this change?
To demonstrate this I took the
2012_MeyerScience.janno
file and broke some of it's columns:Relation_Note
column of line 2.;
inCoverage_on_Target_SNPs
of line 3.x
to the Latitude column of line 7.Here's what trident returns without this patch:
And now with the changes here:
Most importantly the error messages now include the relevant column name. They are also more concrete and more easy to understand.