tskit-dev / msprime

Simulate genealogical trees and genomic sequence data using population genetic models
GNU General Public License v3.0

Support internal samples in simulation through pedigree #1855

Open jeromekelleher opened 2 years ago

jeromekelleher commented 2 years ago

A simplification made in #1846 was to disallow samples that are internal nodes in the pedigree (i.e., samples cannot have children). Supporting them would not be difficult to implement; it just needs some thinking through of exactly what the semantics are.

apragsdale commented 2 years ago

Ran into the error pointing me to this issue. I am interested in this as well.

hyanwong commented 2 years ago

I am also interested, as I would like some ground-truth examples of simulations with non-contemporaneous samples and known pedigrees to test tsinfer (see https://github.com/tskit-dev/tsinfer/discussions/602).

apragsdale commented 2 years ago

I'm thinking about this a bit more. Based on some recent discussion in the msprime Slack channel, I realized that the DTWF model also doesn't allow ancient samples to be direct ancestors of more recent samples, so it's not possible to sample direct (grand)parent-offspring relationships under DTWF. It would be great to be able to do this within the pedigree (e.g. simulate a bunch of trios within a population).

Would there need to be serious reworking of the backend, or is it mostly a matter of testing to make sure labeling ancient individuals as samples within the pedigree is doing what we expect? From #1846, it was hard to tell what the sticking point is here.

jeromekelleher commented 2 years ago

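There's a note on it here: https://github.com/tskit-dev/msprime/blob/7fbaab3b031f7208cd82b79763e924260f412723/lib/msprime.c#L2184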

In principle it's simple - we just have to replace the existing segment chain for the sample a with a new chain (0, L, a) (so that we have ancestry everywhere for our sample) and insert edges for every segment x in the old chain, like

for (x = existing_segments_for_a_head; x != NULL; x = x->next) {
    if (x->node != a) {
        add_edge(x->left, x->right, a, x->node);
    }
}

(excuse the mixed up pseudocode!)
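
For anyone picking this up, the same idea can be sketched in plain Python. This is only an illustration under stated assumptions: `Segment`, `make_internal_sample` and the `(left, right, parent, child)` edge tuples are hypothetical stand-ins invented for this example, not the actual classes or functions in algorithms.py or the C library.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """One link in a chain of ancestral segments (hypothetical structure)."""
    left: float
    right: float
    node: int
    next: Optional["Segment"] = None


def make_internal_sample(
    head: Optional[Segment],
    sample_node: int,
    seq_length: float,
    edges: List[Tuple[float, float, int, int]],
) -> Segment:
    """Turn the lineage whose chain starts at `head` into sample `sample_node`.

    Records an edge (left, right, parent, child) for every old segment that
    refers to a different node, then returns a new single-segment chain
    covering (0, seq_length) so the sample has ancestry everywhere.
    """
    x = head
    while x is not None:
        if x.node != sample_node:
            # The sample becomes the parent of whatever node the segment carried.
            edges.append((x.left, x.right, sample_node, x.node))
        x = x.next
    return Segment(0, seq_length, sample_node)


# Example: individual 5 is sampled while carrying ancestry for nodes 0 and 1.
old_tail = Segment(60, 100, node=1)
old_head = Segment(10, 40, node=0, next=old_tail)
edges: List[Tuple[float, float, int, int]] = []
new_head = make_internal_sample(old_head, sample_node=5, seq_length=100, edges=edges)
```

The sketch leaves out everything a real implementation would also have to handle, such as bookkeeping for overlap counts and freeing the old segments.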

Hmm, that seems pretty straightforward actually - fancy having a go at it in the Python algorithms.py version, @apragsdale?

apragsdale commented 2 years ago

Yes, I’d be happy to!

abureau commented 2 years ago

This functionality to sample subjects internal to the pedigree would be very useful to research I am conducting on probabilities of rare variant sharing by related individuals. I hope this is implemented soon in msprime.

jeromekelleher commented 2 years ago

Thanks for letting us know you're interested in this feature @abureau. There are no immediate plans to work on it unfortunately, unless anyone wants to help out by doing some coding/testing.

abureau commented 2 years ago

I would be happy to test the implementation, but I do not know msprime well enough to code a new feature.

jeromekelleher commented 2 years ago

Thanks @abureau - we'll definitely call on your help if/when it gets implemented. Unfortunately I can't dedicate any of my own time here, but I'm happy to help if someone else wants to do a bit of model coding.

diegovelizo commented 2 months ago

I just ran into the same error message pointing me to this issue. It would be really useful if this feature were implemented.

jeromekelleher commented 2 months ago

I think the status is the same @diegovelizo - nobody has implemented it yet, unfortunately. I think we need someone who is motivated to code this up in the Python prototyper (algorithms.py) and test it a bit. The actual C implementation would be quite easy then.

abureau commented 2 months ago

I secured funding from NSERC that I can use to pay a developer of msprime interested in implementing sampling of subjects internal to the pedigree. That person needs to be based in Canada so I can pay him or her from my NSERC funds. Anyone in Canada interested, please contact me.

petrelharp commented 2 months ago

Wow, congratulations! Great news!

jeromekelleher commented 1 month ago

That's great news @abureau! Would you like to have a chat soon to discuss?

abureau commented 1 month ago

Thanks for offering to chat about this implementation @jeromekelleher . That will be useful once I have identified the programmer who will do the job. I will contact you then.

jeromekelleher commented 1 month ago

Sounds good to me @abureau, good luck!