tskit-dev / msprime

Simulate genealogical trees and genomic sequence data using population genetic models
GNU General Public License v3.0

Simulating sex chromosomes #2044

Open LukeAndersonTrocme opened 2 years ago

LukeAndersonTrocme commented 2 years ago

I've already simulated 22 autosomes for ~1.5M samples using the FixedPedigree model, and I'm wondering how much work would be required to simulate the sex chromosomes. We are planning to release these data on Zenodo. In a worst-case scenario we can update the data dump to include the sex chromosomes later, but it would be nice to include all chromosomes right from the start.

I had a very brief chat with @hyanwong and Léa Guyon about this topic at ProbGen22, and from what I recall it sounds like a fairly straightforward process that mostly involves carefully pruning the input pedigree to restrict the ancestry simulation to biologically plausible inheritance paths.

Here's what I think is needed, but please let me know if/where there are issues I'm overlooking:

Y chromosome

Mitochondria

X chromosome

I think the Y and the mitochondria are relatively straightforward to implement. The X chromosome may need some extra considerations, but AFAIK these only affect the sample nodes.

I guess my question is: does anyone have a sense of whether the FixedPedigree ancestry simulation will handle these partially pruned pedigrees?

Any comments or feedback are appreciated!

Cheers

jeromekelleher commented 2 years ago

This is a great idea @LukeAndersonTrocme!

The only issue I can see with running the FixedPedigree sims on these pruned pedigrees is that we'll hit dead-ends more quickly, but that's the reality too.

I think it would take a bit of work to implement the "cutting" and fully validate the sims, so I'd vote for keeping this feature as something that's on the TODO list.

LukeAndersonTrocme commented 2 years ago

Interesting point about the dead ends; that makes sense. And yes, agreed to keep this as a TODO. Either way, I might try to prune the pedigree myself and run the sims to see what happens. I'll post updates in this thread when I get around to it.

benjeffery commented 2 years ago

It seems this is an msprime issue - shall I move it there?

Darokrithia commented 2 years ago

Commenting because I would like to know as well. How would you go about pruning male-to-male connections during the simulation? I could be wrong, but my assumption is that you'd have to do this to prevent erroneous recombination in males.

Also how do you track sex in msprime? I couldn't find anything in a cursory look through the documentation.

jeromekelleher commented 2 years ago

> Also how do you track sex in msprime? I couldn't find anything in a cursory look through the documentation.

There are no specific features for this at the moment, but you can add arbitrary metadata to the individuals in your input pedigree so that you can identify the males and females later.

LukeAndersonTrocme commented 2 years ago

Thanks for reviving this thread @Darokrithia.

I'm not sure how sex is tracked during the simulations. As far as I know it is included in the metadata, but it is not actually used in the simulation process for autosomes.

My comments about pruning were referring to modifying the pedigree before running the simulations. Just to flesh out a rough idea using a recursive tree-climbing approach (this would likely only be done once, so it may not need to be super efficient):

This pruned pedigree should then only contain records of paternal inheritance.
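The recursive climb described above could look roughly like the minimal sketch below. It assumes a hypothetical pedigree layout (a dict mapping each individual ID to `(father, mother, sex)`, with `None` for a missing parent); `prune_uniparental` and the toy pedigree are illustrative, not part of msprime, and the result would still need converting into msprime's input pedigree format. It walks fathers from male probands for the Y and mothers from all probands for the mitochondria, using an explicit stack instead of Python recursion because deep pedigrees can exceed the default recursion limit.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical pedigree layout (not msprime's format):
# individual id -> (father id, mother id, sex), with None for a missing parent.
Pedigree = Dict[int, Tuple[Optional[int], Optional[int], str]]


def prune_uniparental(pedigree: Pedigree, probands: Iterable[int], via: str) -> Pedigree:
    """Keep only individuals reachable from `probands` through one parent.

    via="father" follows the Y chromosome (pass male probands only);
    via="mother" follows the mitochondria. Each kept individual retains
    just that single parent link; the other parent is set to None.
    """
    if via not in ("father", "mother"):
        raise ValueError("via must be 'father' or 'mother'")
    kept: Pedigree = {}
    stack = list(probands)  # explicit stack: same traversal as recursion, no depth limit
    while stack:
        ind = stack.pop()
        if ind in kept:
            continue  # already reached through another descendant
        father, mother, sex = pedigree[ind]
        if via == "father":
            parent = father
            kept[ind] = (father, None, sex)
        else:
            parent = mother
            kept[ind] = (None, mother, sex)
        if parent is not None:
            stack.append(parent)
    return kept


# Toy pedigree: 7 is a male proband, 8 is a female proband.
PEDIGREE: Pedigree = {
    1: (None, None, "M"),
    2: (None, None, "F"),
    3: (1, 2, "M"),
    4: (1, 2, "F"),
    5: (None, None, "F"),
    6: (None, None, "M"),
    7: (3, 5, "M"),
    8: (6, 4, "F"),
}
probands = [7, 8]
male_probands = [i for i in probands if PEDIGREE[i][2] == "M"]

y_pedigree = prune_uniparental(PEDIGREE, male_probands, via="father")
mt_pedigree = prune_uniparental(PEDIGREE, probands, via="mother")
```

Here `y_pedigree` keeps only the paternal line 7 → 3 → 1, and every kept individual has at most one parent link, matching the "only paternal inheritance" idea.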

If this sounds reasonable, the next step is plugging this pruned pedigree into the msprime simulation. Would it be as simple as using a haploid inheritance model?

Where things get interesting is the dosage of the X chromosome: male probands only need one copy of this chromosome, but female probands need two.
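One way to encode both points for the X (cutting every father-to-son link and sampling one X for males but two for females) is sketched below. It uses the same hypothetical `(father, mother, sex)` dict layout as a pruned pedigree would; `prune_x` and `x_copies` are illustrative names, not msprime functions, and how a single-copy male X would then be represented in the simulation is left open.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical pedigree layout (not msprime's format):
# individual id -> (father id, mother id, sex), with None for a missing parent.
Pedigree = Dict[int, Tuple[Optional[int], Optional[int], str]]


def prune_x(pedigree: Pedigree, probands: Iterable[int]) -> Pedigree:
    """Keep only the parent links along which an X chromosome can travel.

    A male's single X comes from his mother, so his father link is cut
    (no father-to-son X transmission). A female keeps both parent links.
    """
    kept: Pedigree = {}
    stack = list(probands)
    while stack:
        ind = stack.pop()
        if ind in kept:
            continue
        father, mother, sex = pedigree[ind]
        if sex == "M":
            father = None  # a father passes his Y, not his X, to a son
        kept[ind] = (father, mother, sex)
        stack.extend(p for p in (father, mother) if p is not None)
    return kept


def x_copies(sex: str) -> int:
    """X haplotypes to sample per proband: one for males, two for females."""
    return 1 if sex == "M" else 2


# Toy pedigree: 7 is a male proband, 8 is a female proband.
PEDIGREE: Pedigree = {
    1: (None, None, "M"),
    2: (None, None, "F"),
    3: (1, 2, "M"),
    4: (1, 2, "F"),
    5: (None, None, "F"),
    6: (None, None, "M"),
    7: (3, 5, "M"),
    8: (6, 4, "F"),
}
probands = [7, 8]
x_pedigree = prune_x(PEDIGREE, probands)
n_x_samples = sum(x_copies(PEDIGREE[i][2]) for i in probands)
```

In this toy example, individual 3 (the father of male proband 7) drops out, while male founder 1 is kept because he is the father of female 4; the two probands contribute three X haplotypes in total.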

LukeAndersonTrocme commented 2 years ago

[Image: hand-drawn diagram of sex-chromosome inheritance paths]

Maybe obvious to most, but I found that drawing out the inheritance paths helped clarify things for me.

(Edit: red = potential inheritance paths for male probands, blue = potential inheritance paths for female probands, solid = deterministic, dashed = stochastic, yellow = impossible inheritance of the X.)
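The legend can be read as a small lookup from (chromosome, parent sex, child sex) to the kind of transmission along that edge. The sketch below is one reading of it, extended to the Y and mitochondrial cases; `transmission` is a hypothetical helper, and treating mitochondrial transmission as deterministic is an assumption that ignores heteroplasmy.

```python
def transmission(chrom: str, parent_sex: str, child_sex: str) -> str:
    """Classify a parent -> child edge for a sex-linked locus.

    "deterministic": the parent carries a single copy, so the transmitted
    haplotype is fixed. "stochastic": the parent carries two copies, and
    which one (possibly recombined) is passed on is random. "impossible":
    the locus cannot travel along this edge.
    """
    if chrom == "Y":
        # Only fathers carry a Y, and they pass it only to sons.
        return "deterministic" if parent_sex == "M" and child_sex == "M" else "impossible"
    if chrom == "mt":
        # Maternal only; assumes no heteroplasmy.
        return "deterministic" if parent_sex == "F" else "impossible"
    if chrom == "X":
        if parent_sex == "F":
            return "stochastic"  # mother has two X copies
        # Father has one X: always passed to daughters, never to sons.
        return "deterministic" if child_sex == "F" else "impossible"
    raise ValueError(f"unknown chromosome {chrom!r}")
```

For example, `transmission("X", "M", "M")` returns `"impossible"`, which corresponds to the father-to-son edge highlighted in yellow.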

Darokrithia commented 2 years ago

@LukeAndersonTrocme to clarify, does this require a pedigree to be generated (and then pruned) before simulation?

LukeAndersonTrocme commented 2 years ago

Yes, exactly. Sorry, I should have clarified this from the start. This is in the context of fixed-pedigree ancestry simulations.

> On Aug 10, 2022, at 11:51 PM, Daniel Tabin wrote:
> @LukeAndersonTrocme to clarify, does this require a pedigree to be generated (and then pruned) before simulation?