poseidon-framework / poseidon-hs

A toolset to work with modular genotype databases in the Poseidon format
https://poseidon-framework.github.io/#/trident
MIT License
7 stars 2 forks source link

Make `[JannoRow]` an instance of `Monoid` #229

Closed nevrome closed 1 year ago

nevrome commented 1 year ago

A janno file is currently defined as [JannoRow]. To combine two janno files we have a dedicated function

combineTwoJannos :: [JannoRow] -> [JannoRow] -> [JannoRow]

This is necessary to keep proper track of arbitrary, additional columns. We also have a dedicated function to concatenate janno files, which makes use of combineTwoJannos.

concatJannos :: [[JannoRow]] -> [JannoRow]
concatJannos = foldl' combineTwoJannos []

Is is easy to forget to use these functions, though, because Data.List.concate will of course also typecheck. I almost ran into this issue yesterday.

Can we prevent this by making [JannoRow] an instance of Monoid? We should probably give it a proper name in the process, maybe JannoFile.

I think we only have to implement <>/mappend for Semigroup (which is essentially combineTwoJannos) and mempty which should be []. We could also overwrite the default mconcat (foldr mappend mempty) with the potentially more efficient foldl' mappend mempty, which is what we currently have in concatJannos.

I feel like we already discussed sth. like this in the past, @stschiff. I hope I didn't forget a crucial drawback of this idea.