poseidon-framework / poseidon-hs

A toolset to work with modular genotype databases in the Poseidon format
https://poseidon-framework.github.io/#/trident
MIT License

V 1.4.0.3: Optimization of resolveEntityIndices #280

Closed stschiff closed 1 year ago

stschiff commented 1 year ago

As already touched on in the discussion around the code changes introduced in 1.4.0.0, we had a severe performance bug in resolveEntityIndices, which affected fetch and forge and crept into xerxes as well.

I have now solved this by computing the isLatest vector only once, when individualInfos are created from packages (getJointIndividualInfo). For that purpose I have introduced a new type synonym for the tuple of indInfos and Bools, defined in EntityTypes:

type IndividualInfoCollection = ([IndividualInfo], [Bool]) 

Key downstream functions determineRelevantPackages, resolveEntityIndices, resolveUniqueEntityIndices, determineNonExistentEntities and checkIfAllEntitiesExist all now use this new tuple, which saves them from calling isLatestInCollection themselves.
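To illustrate the idea, here is a minimal sketch of computing the flags once and handing the tuple to downstream code. It is not the actual poseidon-hs code: the record below is a simplified stand-in for IndividualInfo with hypothetical field names, and latestVersions, mkCollection and latestIndicesOfPackage are made-up helpers. Only the type synonym is taken from the description above.

```haskell
import Data.List (foldl')

-- Simplified stand-in for the real record; field names are hypothetical.
data IndividualInfo = IndividualInfo
    { indName    :: String
    , indPacName :: String
    , indPacVer  :: [Int]   -- version components, e.g. [1,4,0]
    } deriving (Show, Eq)

-- The type synonym quoted above: individuals plus one isLatest flag each.
type IndividualInfoCollection = ([IndividualInfo], [Bool])

-- Highest version seen per package name (association list, base only).
latestVersions :: [IndividualInfo] -> [(String, [Int])]
latestVersions = foldl' step []
  where
    step acc i = case lookup (indPacName i) acc of
        Nothing -> (indPacName i, indPacVer i) : acc
        Just v  -> (indPacName i, max v (indPacVer i))
                       : filter ((/= indPacName i) . fst) acc

-- Build the collection; the isLatest flags are computed exactly once here.
mkCollection :: [IndividualInfo] -> IndividualInfoCollection
mkCollection inds = (inds, map isLatest inds)
  where
    latest     = latestVersions inds
    isLatest i = lookup (indPacName i) latest == Just (indPacVer i)

-- Hypothetical downstream consumer: it reads the precomputed flags
-- instead of recomputing "is this the latest version?" per individual.
latestIndicesOfPackage :: String -> IndividualInfoCollection -> [Int]
latestIndicesOfPackage pac (inds, flags) =
    [ idx | (idx, i, True) <- zip3 [0 ..] inds flags, indPacName i == pac ]

-- Small example input: package Pac1 is present in two versions.
example :: [IndividualInfo]
example =
    [ IndividualInfo "A" "Pac1" [1,0,0]
    , IndividualInfo "A" "Pac1" [2,0,0]
    , IndividualInfo "B" "Pac2" [1,0,0]
    ]
```

With this shape, every consumer that receives the tuple can filter on the flags directly, so the cost of determining the latest versions is paid once per collection rather than once per call.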

As far as I can see, this solved the performance bug completely.

This command line is diagnostic:

trident fetch -d comp_book_fstats_data/ -f "*2010_RasmussenNature*"

Before this change, this took ages before actually starting to download. Now it starts almost immediately.

You can test from this branch via

stack run trident -- fetch -d comp_book_fstats_data/ -f "*2010_RasmussenNature*"

I've bumped the version number and added a Changelog entry. I will now update xerxes as well and see whether anything in the new API needs further changes.

codecov[bot] commented 1 year ago

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (f9e5281) 70.87% compared to head (fcfb576) 68.63%. Report is 1 commits behind head on master.

:exclamation: Current head fcfb576 differs from pull request most recent head 20acc01. Consider uploading reports for the commit 20acc01 to get more accurate results

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #280      +/-   ##
==========================================
- Coverage   70.87%   68.63%   -2.25%
==========================================
  Files          25       25
  Lines        3217     3341     +124
  Branches      359      382      +23
==========================================
+ Hits         2280     2293      +13
- Misses        578      666      +88
- Partials      359      382      +23
```

| [Files](https://app.codecov.io/gh/poseidon-framework/poseidon-hs/pull/280?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework) | Coverage Δ | |
|---|---|---|
| [src/Poseidon/CLI/Fetch.hs](https://app.codecov.io/gh/poseidon-framework/poseidon-hs/pull/280?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework#diff-c3JjL1Bvc2VpZG9uL0NMSS9GZXRjaC5ocw==) | `48.71% <100.00%> (ø)` | |
| [src/Poseidon/CLI/Forge.hs](https://app.codecov.io/gh/poseidon-framework/poseidon-hs/pull/280?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework#diff-c3JjL1Bvc2VpZG9uL0NMSS9Gb3JnZS5ocw==) | `71.32% <100.00%> (ø)` | |
| [src/Poseidon/CLI/Survey.hs](https://app.codecov.io/gh/poseidon-framework/poseidon-hs/pull/280?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework#diff-c3JjL1Bvc2VpZG9uL0NMSS9TdXJ2ZXkuaHM=) | `80.95% <ø> (ø)` | |
| [src/Poseidon/EntityTypes.hs](https://app.codecov.io/gh/poseidon-framework/poseidon-hs/pull/280?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework#diff-c3JjL1Bvc2VpZG9uL0VudGl0eVR5cGVzLmhz) | `77.70% <100.00%> (-3.07%)` | :arrow_down: |
| [src/Poseidon/Package.hs](https://app.codecov.io/gh/poseidon-framework/poseidon-hs/pull/280?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework#diff-c3JjL1Bvc2VpZG9uL1BhY2thZ2UuaHM=) | `77.37% <100.00%> (-0.30%)` | :arrow_down: |
| [src/Poseidon/ServerClient.hs](https://app.codecov.io/gh/poseidon-framework/poseidon-hs/pull/280?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework#diff-c3JjL1Bvc2VpZG9uL1NlcnZlckNsaWVudC5ocw==) | `70.37% <100.00%> (-2.97%)` | :arrow_down: |
| [src/Poseidon/CLI/Validate.hs](https://app.codecov.io/gh/poseidon-framework/poseidon-hs/pull/280?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework#diff-c3JjL1Bvc2VpZG9uL0NMSS9WYWxpZGF0ZS5ocw==) | `43.28% <0.00%> (ø)` | |

... and [7 files with indirect coverage changes](https://app.codecov.io/gh/poseidon-framework/poseidon-hs/pull/280/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework)


stschiff commented 1 year ago

OK, I've now updated to LTS-21.17. The build and tests work without --pedantic, but one warning breaks pedantic compilation:

Survey.hs:98:66: warning: [-Wtype-equality-requires-operators]
    The use of ‘~’ without TypeOperators
    will become an error in a future GHC release.
    Suggested fix: Perhaps you intended to use TypeOperators
   |                         
98 |         getRatiosForEachField :: (Generics.SOP.Generic a, Code a ~ '[ xs ], All PresenceCountable xs) => [a] -> [Ratio Int]
   |                                                                  ^
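
One way to address this warning, following GHC's own suggestion, would be to enable TypeOperators in the affected module. The snippet below is a standalone toy example, not the actual Survey.hs code: sumAsInt is a hypothetical function whose only purpose is to use a `~` equality constraint in a signature. GADTs is enabled alongside so that the snippet is also accepted by older compilers, which require it (or TypeFamilies) for equality constraints.

```haskell
{-# LANGUAGE GADTs         #-}
{-# LANGUAGE TypeOperators #-}

-- Toy function with an equality constraint in its signature.
-- With TypeOperators enabled, using '~' here should no longer trigger
-- -Wtype-equality-requires-operators.
sumAsInt :: (a ~ Int) => [a] -> Int
sumAsInt = sum
```

Whether the pragma goes at the top of the module or into the default-extensions of the cabal file is a project-level choice; the per-module pragma keeps the change local to the file that needs it.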