poseidon-framework / poseidon-analysis-hs

A tool to analyse genotype data (optionally in the poseidon data format)
MIT License

Sampling in chunks for admixpops #17

Closed nevrome closed 1 year ago

nevrome commented 2 years ago

Compiles and runs, but beyond that it's entirely untested.

codecov-commenter commented 2 years ago

Codecov Report

Patch coverage has no change and project coverage change: +0.14 :tada:

Comparison is base (d1026e9) 1.09% compared to head (5d6367e) 1.24%.

:exclamation: Current head 5d6367e differs from pull request most recent head e5b282b. Consider uploading reports for the commit e5b282b to get more accurate results

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##            main     #17      +/-   ##
========================================
+ Coverage   1.09%   1.24%   +0.14%
========================================
  Files          3       3
  Lines        457     402      -55
========================================
  Hits           5       5
+ Misses       452     397      -55
```

[see 3 files with indirect coverage changes](https://app.codecov.io/gh/poseidon-framework/poseidon-analysis-hs/pull/17/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=poseidon-framework)


nevrome commented 2 years ago

[Plots: mds_mds, BantuSA_mds]

Sampling in chunks seems to work - and it behaves interestingly. Please take a look, @stschiff.

stschiff commented 2 years ago

Great, will take a close look!

stschiff commented 1 year ago

Re your first plot: I assume the "c" labels denote the chunked ones. If so, the plot shows that both the chunked and the sap-based fake-admixed individuals line up linearly between the two groups, as expected, but are separated from each other on PC2? I don't think that is too concerning, as the chunking likely creates mild batch effects relative to the sap-based ones. Also, if you plotted PC1 and PC2 on similar scales, you would see that the batch effect is relatively mild.

Re your second plot: I don't understand what is shown. Could you briefly comment?
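For readers following the first-plot discussion, here is a minimal sketch of how per-SNP sampling and chunked sampling could differ when building a fake-admixed individual. It is an illustration in Python under stated assumptions, not the Haskell code in this PR: the function names, the representation of genotypes as lists of integers, and the rule of drawing one source individual per SNP or per chunk are all hypothetical.

```python
import random

def sample_per_snp(sources, weights, rng):
    """Hypothetical per-SNP scheme: at every SNP independently, draw a source
    population according to the weights, then a random individual from it,
    and copy that individual's genotype at this SNP."""
    n_snps = len(sources[0][0])
    result = []
    for i in range(n_snps):
        pop = rng.choices(sources, weights=weights, k=1)[0]
        ind = rng.choice(pop)
        result.append(ind[i])
    return result

def sample_in_chunks(sources, weights, chunk_size, rng):
    """Hypothetical chunked scheme: draw a source population and individual
    once per block of consecutive SNPs and copy the whole block."""
    n_snps = len(sources[0][0])
    result = []
    for start in range(0, n_snps, chunk_size):
        pop = rng.choices(sources, weights=weights, k=1)[0]
        ind = rng.choice(pop)
        result.extend(ind[start:start + chunk_size])
    return result

if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data (assumption): population A is all 0, population B is all 2.
    pop_a = [[0] * 12 for _ in range(3)]
    pop_b = [[2] * 12 for _ in range(3)]
    print(sample_per_snp([pop_a, pop_b], [0.5, 0.5], rng))
    print(sample_in_chunks([pop_a, pop_b], [0.5, 0.5], 4, rng))
```

With the toy data, the chunked output is constant within each block of four SNPs, while the per-SNP output can switch source at any position. A structural difference of this kind is one plausible way for the two sets of artificial individuals to differ systematically, though the sketch does not by itself explain the separation on PC2.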

nevrome commented 1 year ago

OK - I renamed some types to make the code a bit more readable. I also ran my own tests again to check whether the code still runs after all this time and whether it still produces plausible, if experimental and not well understood, results. Since it does, I will merge now and make a new release, which will also enable xerxes to handle Poseidon v2.7.1 packages.

Eventually I would like to go back to admixpops and consolidate the current set of features - ideally with a blog post presenting the test code over in https://github.com/nevrome/paagen.