popsim-consortium / analysis2

Analysis for the second consortium paper.

Selection paper outline #8

Open andrewkern opened 3 years ago

andrewkern commented 3 years ago

Hey all, I'm opening up an issue for us to start bashing away at an outline for the second paper. A particular goal is to have a solid list of the analyses we want to do and then, later, to delegate those analyses to particular individuals/groups.

We have a google doc going for the outline, but it might be preferable to just use this issue, so I've copied that text below.


Selection & PopSim Paper 2

Timeline for selection papers: late summer / early fall. Companion papers on 1) sweeps & 2) rescaling, on a similar timeline.

Outline of main analyses for main paper:

  • Comparison of different DFE methods like FitDadi, polyDFE, GRAPES (Ryan G’s group & Izabel can work on this). How is demography dealt with? Sample size?
  • Sweeps! (will be its own companion paper that Andy is leading, but some key results in the main paper).
      • Implement sweep models from literature.
      • Make a model in StdPopSim “recurrent_sweeps”. Can put this model with different demographics & rec rates, etc.
      • Look at summary stats & power to detect sweeps in human genomes under different demographic models.
      • Look at power of ML methods.
      • Confounders. Multiple sweeps. Sweeps & BGS.
      • How do DFE methods perform when sweeps are included?
  • Selection confounding demographic inference (can recycle a lot of pipelines from paper 1, just running them on models with selection).

What we need to do:

  • Decide what models to do: DFE, Sweep (https://github.com/popsim-consortium/analysis2)
  • Implement models
  • QC
  • Analyses

######################################################

Brainstorming of ideas for PopSim Selection paper from the call on 6/15 (not all will be in paper):

  • Comparison of different DFE methods (Ryan G’s group can work on this). How is demography dealt with? Sample size?
  • Scaling (maybe merits its own paper delving into theory of scaling... might be too ambitious for PopSim paper). Ideally, PopSim paper will point to this companion paper. PopSim paper will have to mention scaling in some way. PopSim paper could connect it with applications... use guidelines from theory paper to do stuff for a particular organism.
  • Do current models of DFEs/annotations in humans predict summaries of genetic variation (spatial pattern of pi, SFS, LD)? (Strength: leverage demographic models from before... annotations, DFE... all the fancy stuff together. Great way to showcase the whole resource! Guidance for how well the field is doing in terms of model adequacy.)
  • What if synonymous (or “neutral sites”) are actually under selection? Does that confound things?
  • Sweeps! (may be its own paper, but could put some key results in the main paper).
      • Implement sweep models from literature.
      • Make a model in StdPopSim “recurrent_sweeps”. Can put this model with different demographics & rec rates, etc.
      • Look at summary stats & power to detect sweeps in human genomes under different demographic models.
      • Look at power of ML methods.
      • Confounders. Multiple sweeps. Sweeps & BGS.
      • How do DFE methods perform when sweeps are included?
  • Selection confounding demographic inference.
  • In paper, say how stdpopsim can be used to test “your new method” for detecting selection. No one perfect statistic: it depends on biology, data, etc.
  • Try to show a non-human example in the paper.

izabelcavassim commented 2 years ago

We have made some decisions in terms of the manuscript's scope (ping me, correct me if I am wrong) based on the discussion we had today (02/22/22) during our biweekly meeting:

PART I

For the demography inference with flavors of selection (background selection)

These analyses are halfway implemented in our current analysis2 repository, specifically in the n_t.snake workflow.

We also want the multi-population analyses

PART II

For the DFE inference excluding the positive portion

I think implementations are almost finished, see the analysis2 repository for details, thanks to @andrewkern @petrelharp @mufernando, and others...

PART III

Understanding/simulating beneficial mutations as in a sweep using the positive portion of a DFE

This is still a work in progress, but two things could be evaluated here for inference methods:

As @andrewkern @petrelharp have pointed out, there are multiple features to be added in terms of positive selection. These could either be discussed in this manuscript and implemented in the next paper, or be implemented for this paper, which would not be trivial. I would personally vote to simplify and leave it as future work just so we don't lose momentum.

chriscrsmith commented 1 year ago

Update based on @izabelcavassim's previous post:

PART I

Single-population demographic inference methods:

Multi-population demographic inference methods:

Mostly complete. Need production sims and final plots.

PART II

DFE inference methods:

Mostly complete. Need production sims and final plots.

PART III

Sweeps:

There is work left to do for this aim.

SPECIES:

Human

Arabidopsis

PLOTS:

Plots in each of the above areas could be kept relatively simple and extra information reported in tables or supp mat; or they could get pretty big including panels for different methods, species, and DFEs... TBD

RyanGutenkunst commented 1 year ago

It would be nice to do a non-gamma DFE for one of our simulations. Maybe a lognormal, even reaching back to Boyko 2008?
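To make the gamma vs. lognormal comparison concrete, here is a rough sketch that draws deleterious selection coefficients from a gamma DFE and from a lognormal DFE matched to the same mean and variance. The mean and shape values are placeholders chosen for illustration only, not the parameters of any stdpopsim DFE or of Boyko 2008, and the function names are hypothetical.

```python
import numpy as np


def lognormal_params_from_moments(mean, var):
    """Return (mu, sigma) of a lognormal with the given mean and variance."""
    sigma2 = np.log(1.0 + var / mean**2)
    mu = np.log(mean) - 0.5 * sigma2
    return mu, np.sqrt(sigma2)


def sample_dfes(mean_s=0.01, gamma_shape=0.2, n=100_000, seed=1):
    """Draw n deleterious selection coefficients from each DFE.

    mean_s and gamma_shape are illustrative placeholders (assumptions).
    Coefficients are returned as negative numbers (deleterious).
    """
    rng = np.random.default_rng(seed)
    scale = mean_s / gamma_shape          # gamma mean = shape * scale
    var = gamma_shape * scale**2          # gamma variance
    gamma_s = rng.gamma(gamma_shape, scale, size=n)
    mu, sigma = lognormal_params_from_moments(mean_s, var)
    lognormal_s = rng.lognormal(mu, sigma, size=n)
    return -gamma_s, -lognormal_s


def fraction_below(s, cutoff):
    """Fraction of mutations with |s| smaller than cutoff."""
    return float(np.mean(np.abs(s) < cutoff))


gamma_s, lognormal_s = sample_dfes()
for cutoff in (1e-5, 1e-4, 1e-3, 1e-2):
    print(cutoff, fraction_below(gamma_s, cutoff), fraction_below(lognormal_s, cutoff))
```

Even with matched mean and variance, the two distributions can put quite different mass on nearly neutral mutations, which is the kind of difference a gamma-assuming inference method might struggle with.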

nspope commented 1 year ago

Sweeps:

  • Want to quantify the effect of BGS on sweep detection.
  • Compare different sweep methods?
  • How is dadi inference of negative fitness effects influenced by positive portion?
  • Analyze divergent selection (between pops)?
  • (@izabelcavassim had suggested to simplify and leave some of these as future work)

There is work left to do for this aim.

What we're set up to do is "compare different sweep detection methods" in terms of FPR/TPR in windows across a chromosome. In particular, there's a working pipeline that uses sweepfinder2 to detect sweeps in windows across a chromosome (under simulated neutral/BGS/BGS+sweep scenarios). There's a start at a similar pipeline for diploshic, but it isn't finished.
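For illustration, the window-level comparison could be scored roughly like this. This is a minimal sketch and not the code in the existing workflows: it assumes each window already has a single score from a method (for example a per-window likelihood ratio or classifier probability) and a true/false label for whether a sweep was simulated in it. All names here are hypothetical.

```python
import numpy as np


def threshold_from_null(null_scores, fpr_target=0.05):
    """Pick a cutoff so that about fpr_target of null windows exceed it.

    null_scores would come from the neutral or BGS-only simulations.
    """
    return float(np.quantile(np.asarray(null_scores, dtype=float), 1.0 - fpr_target))


def window_rates(scores, is_sweep, threshold):
    """Return (TPR, FPR) over windows, calling a sweep when score >= threshold."""
    scores = np.asarray(scores, dtype=float)
    is_sweep = np.asarray(is_sweep, dtype=bool)
    called = scores >= threshold
    n_pos = int(is_sweep.sum())
    n_neg = int((~is_sweep).sum())
    tpr = float(np.sum(called & is_sweep)) / n_pos if n_pos else float("nan")
    fpr = float(np.sum(called & ~is_sweep)) / n_neg if n_neg else float("nan")
    return tpr, fpr


# Toy example: five windows, two of which contain a simulated sweep.
scores = [0.1, 5.0, 0.2, 7.0, 3.0]
is_sweep = [False, True, False, True, False]
tpr, fpr = window_rates(scores, is_sweep, threshold=2.5)
print(tpr, fpr)
```

Calibrating the threshold on the neutral or BGS-only simulations and then scoring the BGS+sweep simulations with the same threshold would put the methods on a common footing.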

So, assuming that what'll go in the paper is a sweepfinder vs diploshic comparison, what remains to be done is:

(I think? Tagging @mufernando and @andrewkern as they're the ones who've put these workflows together.)

This should serve as an illustration of what stdpopsim can do wrt sweeps, so maybe we don't need anything else? The other bullet points, while interesting, seem like a lot of work without clear questions in mind.

petrelharp commented 1 year ago

Notes from the meeting: proposal is for the sweeps section, discuss:

petrelharp commented 1 year ago

We also discussed, following @RyanGutenkunst's comment above, adding a non-gamma DFE for humans, and running the DFE inference pipeline on it: https://github.com/popsim-consortium/stdpopsim/issues/1470

andrewkern commented 6 months ago

I started stubbing out a manuscript in a new repo here: https://github.com/popsim-consortium/analysis2_manuscript

I'm planning on starting the writing today.