poseidon-framework / poseidon-hs

A toolset to work with modular genotype databases in the Poseidon format
https://poseidon-framework.github.io/#/trident
MIT License

more adjustments for jannocoalesce #288

Closed nevrome closed 8 months ago

nevrome commented 8 months ago

I did essentially three things in this PR:

  1. Added an option to exclude columns. I did so with a custom data type, some logic to handle it and a new command line input setup. The user can now select either --includeColumns or --excludeColumns. If they choose neither, then AllJannoColumns applies.

    ```haskell
    data CoalesceJannoColumnSpec =
          AllJannoColumns
        | IncludeJannoColumns [BSC.ByteString]
        | ExcludeJannoColumns [BSC.ByteString]
    ```

  2. Added some helpful logging about the number of fields changed and the number of target-source mismatches. This is particularly useful for large packages. Inspired by our recent discussion, I solved this with IORef rather than a monad transformer. This now feels a bit verbose, though - not sure.

  3. Found and fixed a bug in the handling of additional variables in the source .janno file that are not explicitly specified. They would always get copied, even if --fillColumns (now --includeColumns) was set. I added some tests to cover this behavior.

    My solution was to rewrite the core logic of mergeRow. Instead of calculating a conditional union, it now first creates a version of the target with exactly the desired values and then loops through them, replacing the right ones from the source. This solution seems to work and passes all tests. Maybe it's less efficient in some cases.
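
For readers who want a concrete picture of the approach described above, here is a minimal sketch, not the actual code from this PR. It assumes a row is a plain `Data.Map` from column name to value, with `String` keys instead of `BSC.ByteString`, and it treats empty strings and `n/a` as missing. The names `Row`, `ColumnSpec`, `selected`, `isMissing` and the signature of `mergeRow` are hypothetical simplifications. It only fills missing target values, which is also an assumption. The `IORef` counter stands in for the "number of fields changed" logging.

```haskell
import           Data.IORef      (IORef, modifyIORef')
import qualified Data.Map.Strict as M

-- Hypothetical simplification: a row maps column names to values.
type Row = M.Map String String

-- Same shape as CoalesceJannoColumnSpec, but with String keys.
data ColumnSpec
    = AllColumns
    | IncludeColumns [String]
    | ExcludeColumns [String]

-- May this column be filled from the source?
selected :: ColumnSpec -> String -> Bool
selected AllColumns          _ = True
selected (IncludeColumns cs) c = c `elem` cs
selected (ExcludeColumns cs) c = c `notElem` cs

-- Assumption: empty strings and "n/a" count as missing values.
isMissing :: String -> Bool
isMissing v = v `elem` ["", "n/a"]

-- Step 1: build a skeleton of the target that has exactly the desired
--         columns (all target columns plus selected, non-missing source
--         columns as empty placeholders).
-- Step 2: walk over the skeleton and replace eligible missing values
--         with the source value, counting every replacement.
mergeRow :: IORef Int -> ColumnSpec -> Row -> Row -> IO Row
mergeRow counter spec source target =
    M.traverseWithKey fill skeleton
  where
    wanted   = M.filterWithKey (\k v -> selected spec k && not (isMissing v)) source
    skeleton = M.union target (M.map (const "") wanted) -- left-biased: target wins
    fill key old
        | selected spec key
        , isMissing old
        , Just new <- M.lookup key source
        , not (isMissing new) = do
            modifyIORef' counter (+ 1)
            pure new
        | otherwise = pure old
```

In this sketch, a source column that is not selected never enters the skeleton, so it cannot leak into the result. That is the behavior the bug fix in point 3 is about. Threading an `IORef Int` through `mergeRow` keeps the counting explicit, at the cost of the extra argument and the `IO` return type.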

codecov[bot] commented 8 months ago

Codecov Report

Attention: Patch coverage is 75.55556%, with 11 lines in your changes missing coverage. Please review.

Project coverage is 68.22%. Comparing base (a4b30ba) to head (fb22b43).

| Files | Patch % | Lines |
|---|---|---|
| src/Poseidon/CLI/Jannocoalesce.hs | 75.55% | 2 Missing and 9 partials :warning: |

Additional details and impacted files

```diff
@@                Coverage Diff                @@
##           jannocoalesce     #288      +/-   ##
=================================================
+ Coverage          68.16%   68.22%   +0.05%
=================================================
  Files                 26       26
  Lines               3446     3468      +22
  Branches             385      390       +5
=================================================
+ Hits                2349     2366      +17
  Misses               712      712
- Partials             385      390       +5
```


stschiff commented 8 months ago

Great, thanks, I'll take a look ASAP

nevrome commented 8 months ago

Thanks - merged! I will now add a release-changelog on the jannocoalesce branch and then merge into master to publish there.