tskit-dev / tskit

Population-scale genomics
MIT License

time windows in statistics #2948

Open petrelharp opened 4 months ago

petrelharp commented 4 months ago

Here @tforest and I are starting in on adding time windows to statistics. We're starting with what was sketched out in #683, and will explain things in more detail here when we're farther along (ignore this for now).

codecov[bot] commented 4 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 89.62%. Comparing base (beafeba) to head (d3e17a9).

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main    #2948   +/-   ##
=======================================
  Coverage   89.62%   89.62%
=======================================
  Files          29       29
  Lines       30170    30170
  Branches     5867     5867
=======================================
  Hits        27041    27041
  Misses       1790     1790
  Partials     1339     1339
```

| [Flag](https://app.codecov.io/gh/tskit-dev/tskit/pull/2948/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tskit-dev) | Coverage Δ | |
|---|---|---|
| [c-tests](https://app.codecov.io/gh/tskit-dev/tskit/pull/2948/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tskit-dev) | `86.20% <ø> (ø)` | |
| [lwt-tests](https://app.codecov.io/gh/tskit-dev/tskit/pull/2948/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tskit-dev) | `80.78% <ø> (ø)` | |
| [python-c-tests](https://app.codecov.io/gh/tskit-dev/tskit/pull/2948/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tskit-dev) | `88.72% <ø> (ø)` | |
| [python-tests](https://app.codecov.io/gh/tskit-dev/tskit/pull/2948/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tskit-dev) | `99.01% <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tskit-dev#carryforward-flags-in-the-pull-request-comment) to find out more.

petrelharp commented 4 months ago

Note: it is not clear how to do this for site statistics, since the site stat is of the form $$\sum_a f(w_a)$$ where the sum is over alleles, and $w_a$ is the weight of all samples with allele $a$; however, it is mutations that have times, not alleles.

The proposal will probably be to compute a site stat that sums over mutations, not alleles, but we'll start with branch stats only for now.
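
To make the issue concrete, here is a minimal sketch of the allele-based form $\sum_a f(w_a)$ for a single site. It is not tskit's implementation: the function name `site_stat_over_alleles`, the toy data, and the choice of `diversity_f` as the summary function are all assumptions made for illustration, and details such as how the ancestral allele is treated are left out.

```python
from collections import defaultdict


def site_stat_over_alleles(alleles, weights, f):
    """Return sum_a f(w_a), where w_a is the total weight of samples with allele a.

    Hypothetical helper for illustration only; not part of tskit.
    """
    w = defaultdict(float)
    for allele, weight in zip(alleles, weights):
        w[allele] += weight
    return sum(f(w_a) for w_a in w.values())


# Toy site: four samples, one of which carries "T".
alleles = ["A", "A", "T", "A"]
weights = [1.0, 1.0, 1.0, 1.0]
n = len(alleles)


def diversity_f(x):
    # Illustrative summary function: with unit weights, summing this over
    # alleles gives the proportion of sample pairs that differ at the site.
    return x * (n - x) / (n * (n - 1))


value = site_stat_over_alleles(alleles, weights, diversity_f)
```

Nothing in this computation refers to a mutation or its time: only the allelic state of each sample enters. That is why a time window has no obvious meaning here unless the sum is taken over mutations instead.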

petrelharp commented 4 months ago

Next step:

Also maybe:

andrewkern commented 4 months ago

a small nudge here that i mentioned to @petrelharp in passing: it would be great to have an expectation from theory as to what time-stratified quantities like the SFS should be under the (standard, neutral) coalescent

tforest commented 2 months ago

Some thoughts after working on time windows.

With these edits, the output of, say, the AFS is for the moment still a 2D array when either windows or time_windows is used on its own. However, when windows and time_windows are used together, the output is a 3D array with shape [num_windows][num_time_windows][sample_size]. When windows or time_windows is None, the associated dimension is dropped. Since there are now two types of windows, it becomes ambiguous that the existing "windows" parameter refers specifically to genomic windows. We have not renamed it for now, though, as that would break existing behavior.
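
The dimension-dropping rule described in this comment can be sketched as a small shape calculation. This is only an illustration of the rule as stated here: `expected_output_shape` is a hypothetical helper, not a tskit function, and the trailing dimension is labelled `sample_size` simply to follow the wording of the comment.

```python
def expected_output_shape(num_windows, num_time_windows, sample_size):
    """Shape of the AFS output under the rule described in the comment.

    Hypothetical helper for illustration only. Pass None for num_windows or
    num_time_windows to mean that the corresponding argument was None, in
    which case that dimension is dropped.
    """
    shape = []
    if num_windows is not None:
        shape.append(num_windows)
    if num_time_windows is not None:
        shape.append(num_time_windows)
    shape.append(sample_size)
    return tuple(shape)


both = expected_output_shape(3, 2, 10)           # genomic and time windows
genomic_only = expected_output_shape(3, None, 10)
time_only = expected_output_shape(None, 2, 10)
```

Note that `genomic_only` and `time_only` are both 2D, so the shape alone does not tell a caller which kind of window the leading dimension refers to.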

Some ideas:

petrelharp commented 2 months ago

A note on the potential confusion between windows and time_windows - often one endpoint of the time_windows will be Inf, so if we make sure we produce an informative error if the windows aren't finite, we'll help people avoid the mistake.
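
A minimal sketch of such a check, assuming it would run on the genomic `windows` argument before any computation. The helper name `check_genomic_windows` and the wording of the error message are hypothetical and not taken from tskit.

```python
import math


def check_genomic_windows(windows):
    """Reject non-finite genomic window breakpoints with an informative error.

    Hypothetical validation helper for illustration only; not part of tskit.
    """
    values = [float(x) for x in windows]
    bad = [x for x in values if not math.isfinite(x)]
    if bad:
        raise ValueError(
            f"Genomic windows must be finite, but got {bad}; "
            "did you mean to pass these as time_windows?"
        )
    return values


checked = check_genomic_windows([0, 50, 100])
```

Passing something like `[0, 100, math.inf]` would raise a `ValueError` that points the user towards `time_windows`.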

benjeffery commented 1 week ago

I've added this work to the next release milestone. Hoping to get a release out in a week or two; if that is too ambitious for this, let me know.

petrelharp commented 1 week ago

Probably too ambitious, but we might have something in by then.