tskit-dev / tstrait

Quantitative trait simulation for ARGs
https://tskit.dev/software/tstrait
MIT License
7 stars 6 forks source link

CODE: sim_trait #111

Closed daikitag closed 11 months ago

daikitag commented 11 months ago

Update the sim_trait function.

codecov[bot] commented 11 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (0e49d80) 100.00% compared to head (61ec820) 100.00%.

:exclamation: Current head 61ec820 differs from pull request most recent head 7d1929a. Consider uploading reports for the commit 7d1929a to get more accurate results

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #111 +/- ## ========================================= Coverage 100.00% 100.00% ========================================= Files 8 8 Lines 304 349 +45 Branches 32 38 +6 ========================================= + Hits 304 349 +45 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

daikitag commented 11 months ago

I was very much overestimating the computational time that it takes to examine the allele frequency. This shows the result for a tree sequence with 1 million samples, 1Mb, 1000 causal sites.

Allele frequency: image

Non allele frequency (alpha=0): image

daikitag commented 11 months ago

It seems that obtaining the genotype was the bottle neck of the sim_phenotype algorithm, and not obtaining the allele frequency

image

daikitag commented 11 months ago

Since the computational time is very fast, I'm thinking about adding allele frequency column to sim_trait function, as it is widly used in simulation-based studies, and obtaining it from tskit is not very clear. Do you have any suggestions on this @jeromekelleher ?

daikitag commented 11 months ago

Added allele frequency output here (https://github.com/daikitag/tstrait/pull/1). I find it to be useful to have the allele frequency column, as it will make the explanation of the frequency dependence model in the documentation easy to understand, and allele frequency is typically part of the standard packages to simulate quantitative traits.

daikitag commented 11 months ago

Draft of the documentation of sim_trait is added here: https://github.com/daikitag/tstrait/pull/2

jeromekelleher commented 11 months ago

Since the computational time is very fast, I'm thinking about adding allele frequency column to sim_trait function, as it is widly used in simulation-based studies, and obtaining it from tskit is not very clear. Do you have any suggestions on this @jeromekelleher ?

What's it used for other than frequency dependence via alpha @daikitag? I feel like we can add this to the data frame later if there's a feature request, but it's not something we have anyone actually asking for?

daikitag commented 11 months ago

Since the computational time is very fast, I'm thinking about adding allele frequency column to sim_trait function, as it is widly used in simulation-based studies, and obtaining it from tskit is not very clear. Do you have any suggestions on this @jeromekelleher ?

What's it used for other than frequency dependence via alpha @daikitag? I feel like we can add this to the data frame later if there's a feature request, but it's not something we have anyone actually asking for?

I've seen some papers that simulate genetic values by using all causal sites and then see the influence of causal mutations depending on allele frequencies, but I agree with you that we can add the feature later. I would like to add the plot of (allele frequency vs effect size) to show the frequency dependence model in the documentation, but do you have any suggestions on how I can compute the allele frequency of causal mutations there?

jeromekelleher commented 11 months ago

OK, let's keep the allele_frequency if you already have a use for it.

daikitag commented 11 months ago

Thanks @jeromekelleher . Would it be possible for you to merge this pull request? I would like to make a new PR for allele frequency and I would also like to use this function in unit tests for genetic_value.

daikitag commented 11 months ago

https://github.com/tskit-dev/tstrait/issues/113 updated here

daikitag commented 11 months ago

@jeromekelleher I created a new pull request for the documentation of the new sim_trait function here https://github.com/tskit-dev/tstrait/pull/115. I think I'm finished with making all the modifications to the sim_trait function, and would it be possible for you to reivew it whenever you have some time?