poseidon-framework / poseidon-hs

A toolset to work with modular genotype databases in the Poseidon format
https://poseidon-framework.github.io/#/trident
MIT License
7 stars 2 forks source link

Allow to distinguish packages by version in forgeScript #264

Closed nevrome closed 1 year ago

nevrome commented 1 year ago

If this gets merged it will close #260.

This feature requires changes to the convoluted entity data types. I decided to use the opportunity to refactor this code to facilitate this and future changes. So as a side effect this PR will also close #252 if merged.

nevrome commented 1 year ago

As of now this PR is work-in-progress. Some core changes have been applied, but the new logic is not complete yet. The golden tests fail. No new tests for the new functionality have been added yet.

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 77.07% and project coverage change: +0.51% :tada:

Comparison is base (0570251) 70.78% compared to head (db6c476) 71.30%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #264 +/- ## ========================================== + Coverage 70.78% 71.30% +0.51% ========================================== Files 23 26 +3 Lines 3207 3244 +37 Branches 350 359 +9 ========================================== + Hits 2270 2313 +43 + Misses 587 572 -15 - Partials 350 359 +9 ``` | [Files Changed](https://app.codecov.io/gh/poseidon-framework/poseidon-hs/pull/264?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework) | Coverage Δ | | |---|---|---| | [src/Poseidon/CLI/Rectify.hs](https://app.codecov.io/gh/poseidon-framework/poseidon-hs/pull/264?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework#diff-c3JjL1Bvc2VpZG9uL0NMSS9SZWN0aWZ5Lmhz) | `60.93% <0.00%> (ø)` | | | [src/Poseidon/Chronicle.hs](https://app.codecov.io/gh/poseidon-framework/poseidon-hs/pull/264?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework#diff-c3JjL1Bvc2VpZG9uL0Nocm9uaWNsZS5ocw==) | `83.47% <ø> (ø)` | | | [src/Poseidon/CLI/Fetch.hs](https://app.codecov.io/gh/poseidon-framework/poseidon-hs/pull/264?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework#diff-c3JjL1Bvc2VpZG9uL0NMSS9GZXRjaC5ocw==) | `49.58% <47.36%> (-0.81%)` | :arrow_down: | | [src/Poseidon/ServerClient.hs](https://app.codecov.io/gh/poseidon-framework/poseidon-hs/pull/264?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework#diff-c3JjL1Bvc2VpZG9uL1NlcnZlckNsaWVudC5ocw==) | `65.38% <65.38%> (ø)` | | | [src/Poseidon/EntityTypes.hs](https://app.codecov.io/gh/poseidon-framework/poseidon-hs/pull/264?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework#diff-c3JjL1Bvc2VpZG9uL0VudGl0eVR5cGVzLmhz) | `72.09% <72.09%> (ø)` | | | [src/Poseidon/Contributor.hs](https://app.codecov.io/gh/poseidon-framework/poseidon-hs/pull/264?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework#diff-c3JjL1Bvc2VpZG9uL0NvbnRyaWJ1dG9yLmhz) | `74.50% <74.50%> (ø)` | | | [src/Poseidon/Version.hs](https://app.codecov.io/gh/poseidon-framework/poseidon-hs/pull/264?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework#diff-c3JjL1Bvc2VpZG9uL1ZlcnNpb24uaHM=) | `93.33% <93.33%> (ø)` | | | [src/Poseidon/EntitiesList.hs](https://app.codecov.io/gh/poseidon-framework/poseidon-hs/pull/264?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework#diff-c3JjL1Bvc2VpZG9uL0VudGl0aWVzTGlzdC5ocw==) | `92.66% <94.11%> (+3.06%)` | :arrow_up: | | [src/Poseidon/CLI/Forge.hs](https://app.codecov.io/gh/poseidon-framework/poseidon-hs/pull/264?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework#diff-c3JjL1Bvc2VpZG9uL0NMSS9Gb3JnZS5ocw==) | `68.27% <100.00%> (+0.22%)` | :arrow_up: | | [src/Poseidon/CLI/List.hs](https://app.codecov.io/gh/poseidon-framework/poseidon-hs/pull/264?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework#diff-c3JjL1Bvc2VpZG9uL0NMSS9MaXN0Lmhz) | `73.84% <100.00%> (ø)` | | | ... and [3 more](https://app.codecov.io/gh/poseidon-framework/poseidon-hs/pull/264?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework) | |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

nevrome commented 1 year ago

I think this could work now. The logic could already be complete, but it is hard to reason about it. I haven't written any new tests, but the old ones are running through unchanged. That is a good sign, because e.g. forge is fairly well covered. I will write new tests for the new interface next week.

Maybe writing tests is also the most feasible review strategy here, @stschiff.

I also realized that package names can now not have - any more to allow a package+version syntax that uses them as a separator (2010_RasmussenNature-2.1.1). This is unfortunate, because - is not an unreasonable symbol for a directory name. I wonder if we should change that as long as we still can, for example to ~.

stschiff commented 1 year ago

OK, great progress, thanks. I suggest to keep the minus. And I think it is not necessary to forbid - in package titles. The parser should be able to distinguish a minus in the title from the separator. It will make the parser more complicated, but conceptually this is well defined. I will think about it and look at the code here.

I agree that adding a few more tests could be a good way to review this, but I will also go through the code.

nevrome commented 1 year ago

I now added the fetch logic, which I had missed entirely :facepalm:. The changes made fetch a bit simpler, in fact, which is a good thing. I'm not happy with the behaviour of it, though. Here's an example:

trident fetch -d . -f "*2014_LazaridisNature*" downloads the lastest version of the package, which is 4.0.0. trident fetch -d . -f "*2014_LazaridisNature-3.1.1*" downloads 3.1.1. trident fetch -d . -f "*2014_LazaridisNature-3.1.1*,*2014_LazaridisNature-4.0.0*" downloads both versions. That is all fine.

But trident fetch -d . -f "*2014_LazaridisNature-3.1.1*,*2014_LazaridisNature*" downloads only 3.1.1 and (even worse) trident fetch -d . -f "*2014_LazaridisNature*,*2014_LazaridisNature-3.1.1*" only 4.0.0. This is pretty counterintuitive.

The reason for that is (I think) a nub in readEntityInputs. For packages this falls back to the Eq instance of PacNameAndVersion, which I defined as follows:

instance Eq PacNameAndVersion where
    (==) (PacNameAndVersion n1 Nothing) (PacNameAndVersion n2 Nothing) = n1 == n2
    (==) (PacNameAndVersion n1 Nothing) (PacNameAndVersion n2 (Just _)) = n1 == n2
    (==) (PacNameAndVersion n1 (Just _)) (PacNameAndVersion n2 Nothing) = n1 == n2
    (==) (PacNameAndVersion n1 (Just v1)) (PacNameAndVersion n2 (Just v2)) = n1 == n2 && v1 == v2

This is sensible for everything else (so far), but here it falls apart. My sadness is immeasurable.

nevrome commented 1 year ago

We could probably overwrite this with a custom Eq instance for PoseidonEntity. Or avoid the custom instance for PacNameAndVersion. But I don't want to refactor everything that depends on it. I'll think about it.

nevrome commented 1 year ago

Ok - so I think I have a working solution now, @stschiff. I added tests and everything I could think of seems to work.

But this really needs a second set of eyes. The code got simpler with my changes, but it's still horribly confusing. The decision making sequence from determineRelevantPackages, to resolveEntityIndices and finally and most terrifyingly indInfoConformsToEntitySpec is so involved that I frequently ran out of brain memory: I couldn't remember the consequences of decisions at the beginning of the process for decisions at the end. Maybe all of this is more complicated than it needs to be, but I'm too deep in it to see how (right now).

I'm sure you'll (also) get the urge to rewrite the whole thing from scratch, but maybe we can rather follow the path of slow, evolutionary change I chose so far.

nevrome commented 1 year ago

I just had an intrusive thought: Instead of checking for each individual if they are allowed into the new package based on the information in the forgeScript, we could go through the forgeScript step by step and add and remove individuals according to what we find there. This could be a massively more simple logic, with the additional advantage of making the output respect the order of the forgeScript. I remember vaguely that we discussed this idea in the past. Why did we decide against it? Didn't you call this the cookie cutter principle?

nevrome commented 1 year ago

This PR was fully absorbed and superseded by #268.