popsim-consortium / analysis2

Analysis for the second consortium paper.

Sweeps simulation Snakefile #79

Closed: mufernando closed this 1 year ago

mufernando commented 1 year ago

WIP

nspope commented 1 year ago

This is working with:

snakemake -c1 --snakefile workflows/sweep_simulate.snake

But, I'm not sure how to integrate this into the main Snakefile without screwing stuff up -- as this workflow uses a different configfile from the DFE stuff. Do we define separate configfiles for each module in Snakefile? @andrewkern do you have a suggestion?

Also, I changed things so we are simulating (possibly overlapping) windows instead of the entire chromosome. For now, this'll let us expand out the simulation grid without a huge computational burden. Ultimately the plan is to simulate entire chromosomes but "trim" to the windows -- I'll add a toggle to the config for this behavior. (though this might only be possible in practice with a smaller chromosome)
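As a rough illustration of the windowing idea, here is a minimal sketch of how a grid of (possibly overlapping) windows could be laid out along a chromosome. The function name `simulation_windows` and its parameters are hypothetical and not taken from the workflow in this PR; the numbers in the usage line are toy values.

```python
def simulation_windows(chrom_length, window_size, step):
    """Return (left, right) intervals tiling [0, chrom_length).

    Hypothetical helper, not part of the PR. Windows overlap whenever
    step < window_size; the last window is clipped to the chromosome end.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    windows = []
    left = 0
    while left < chrom_length:
        right = min(left + window_size, chrom_length)
        windows.append((left, right))
        if right == chrom_length:
            # The chromosome end has been reached; stop here.
            break
        left += step
    return windows


# Toy example: length 10, windows of 4, step of 2 (so neighbours overlap by 2).
toy_windows = simulation_windows(10, 4, 2)
```

Each pair could then drive one simulation job; the right bound corresponds to the `--right` option that stdpopsim accepts, as used in the benchmark further down this thread.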

andrewkern commented 1 year ago

yeah you could definitely define a separate config file here and in this snakefile point to it

andrewkern commented 1 year ago

frankly the config stuff could use a bit of an overhaul -- last I looked at it there was a lot of hard coded redundant paths

nspope commented 1 year ago

Cool, thanks -- I'll avoid adding more stuff to Snakefile for the time being, then.

mufernando commented 1 year ago

I simulated chromosome 22 with the Gamma_K17 DFE on exons and no scaling. It took 1h18min of user CPU time on sesame (about 1h22min wall-clock). Simulating the first 10Mb of chrom 22 takes about 13min of user CPU time (about 14min wall-clock).

time stdpopsim -vv -e slim --slim-scaling-factor 1 HomSap -d OutOfAfrica_3G09 YRI:10 --dfe Gamma_K17 --dfe-annotation ensembl_havana_104_CDS -c chr22 -o foo.ts &>log.txt
stdpopsim -vv -e slim --slim-scaling-factor 1 HomSap -d OutOfAfrica_3G09       4689.52s user 220.01s system 100% cpu 1:21:42.95 total
time stdpopsim -vv -e slim --slim-scaling-factor 1 HomSap -d OutOfAfrica_3G09 YRI:10 --dfe Gamma_K17 --dfe-annotation ensembl_havana_104_CDS -c chr22 --right 10000000 -o foo.ts &>log2.txt
stdpopsim -vv -e slim --slim-scaling-factor 1 HomSap -d OutOfAfrica_3G09       775.53s user 62.33s system 100% cpu 13:50.29 total

andrewkern commented 1 year ago

wow, very nice. using the cluster we could definitely get 100s of reps for full chr22.

nspope commented 1 year ago

@andrewkern yeah, but if we're also varying sweep location across a fine grid that'll be prohibitively costly, I think. @mufernando just benchmarked further and it takes ~12 minutes to simulate 20% of chr22 with no scaling, and 1.5min with a scaling factor of 4. So I think windowing + scaling is the way to go, as long as we don't do it too aggressively.
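To make the cost argument concrete, here is a back-of-the-envelope sketch. The per-simulation timings (12 min unscaled, 1.5 min with a scaling factor of 4) are the ones quoted above; the grid size and replicate count are made-up placeholders, not numbers anyone in this thread proposed.

```python
def total_cpu_hours(minutes_per_sim, n_locations, n_replicates):
    """Total CPU-hours for a grid of sweep locations times replicates."""
    return minutes_per_sim * n_locations * n_replicates / 60.0


# Timings quoted in this thread for ~20% of chr22.
UNSCALED_MIN = 12.0  # no scaling
SCALED_MIN = 1.5     # scaling factor of 4

# Hypothetical grid, chosen only for illustration.
N_LOCATIONS = 50
N_REPLICATES = 100

speedup = UNSCALED_MIN / SCALED_MIN
unscaled_hours = total_cpu_hours(UNSCALED_MIN, N_LOCATIONS, N_REPLICATES)
scaled_hours = total_cpu_hours(SCALED_MIN, N_LOCATIONS, N_REPLICATES)

print(f"speedup from scaling: {speedup:.0f}x")
print(f"unscaled grid: {unscaled_hours:.0f} CPU-hours")
print(f"scaled grid:   {scaled_hours:.0f} CPU-hours")
```

Under these assumed grid sizes, scaling brings the total from 1000 to 125 CPU-hours; the 8x ratio holds for any grid, since it depends only on the two quoted timings.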

andrewkern commented 1 year ago

i agree for a fine grid, but i think it's perhaps worth our time to take one or a small subset of locations and ask, "does full chrom simulation matter much?"

essentially we'd be asking about chrom-wide effects of BGS

nspope commented 1 year ago

yeah I agree! e.g., lay down a sweep in the middle of chr22, and simulate the entire chrom and progressively smaller windows centered around the sweep. If there are boundary effects they'd be visible in patterns of diversity at the window edges, I guess? And a global effect would be reflected by average diversity in the window?
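A minimal sketch of that experiment design, assuming per-bin diversity has already been computed for each simulated window (for instance with tskit's windowed `diversity` statistic). Both helper names, `nested_windows` and `edge_vs_interior`, are hypothetical and the numbers are toy values.

```python
def nested_windows(chrom_length, center, half_widths):
    """Windows of decreasing size centered on `center`, clipped to the chromosome.

    Hypothetical helper: the largest half-width comes first, so a half-width
    covering the whole chromosome gives the full-chromosome simulation.
    """
    windows = []
    for half in sorted(half_widths, reverse=True):
        left = max(0, center - half)
        right = min(chrom_length, center + half)
        windows.append((left, right))
    return windows


def edge_vs_interior(diversity, n_edge):
    """Mean diversity in the n_edge bins at each end vs. the remaining bins.

    `diversity` is a list of per-bin values along one simulated window.
    A large gap between the two means would hint at a boundary effect.
    """
    if n_edge <= 0 or 2 * n_edge >= len(diversity):
        raise ValueError("n_edge must leave at least one interior bin")
    edge = diversity[:n_edge] + diversity[-n_edge:]
    interior = diversity[n_edge:-n_edge]
    return sum(edge) / len(edge), sum(interior) / len(interior)


# Toy example: sweep at position 50 on a chromosome of length 100.
toy_nested = nested_windows(100, 50, [10, 50, 25])
```

Comparing the interior mean across the nested windows would speak to the global effect, while the edge-versus-interior gap within each window would speak to boundary effects.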

andrewkern commented 1 year ago

btw I started drafting the diploshic module for this part of the analysis here

nspope commented 1 year ago

Looks great, thanks Murillo! I'm assuming we'll be saving sims as vcf to be fed to the sweep detection methods, so I think we should also add a routine to dump vcf (for the focal window only) alongside the tree sequences.

andrewkern commented 1 year ago

@mufernando do we need results/simulated_data/sweeps/boundary_effect_bgs.png in the PR? seems like just an image to post on an issue?

andrewkern commented 1 year ago

also this is probably ready to come out of draft mode?

mufernando commented 1 year ago

@andrewkern I thought that figure would make it to the supp material at some point?

andrewkern commented 1 year ago

okay sounds good

nspope commented 1 year ago

@mufernando do you mind if I push some commits with helper functions for sweepfinder? Or should I hold off for now

mufernando commented 1 year ago

I think we should merge this and start separate PRs

andrewkern commented 1 year ago

@mufernando do you want to make any more edits to this at this point?

mufernando commented 1 year ago

I think this is ready to be merged. @andrewkern got it to work independently from me. Now we need to tune the sweep parameters (the time of the sweep).