popsim-consortium / adding-species-manuscript

manuscript materials for the adding species paper
3 stars 14 forks source link

plr edits #70

Closed petrelharp closed 2 years ago

petrelharp commented 2 years ago

I made a pass through! Looks great - just minor edits. I'm happy to send this around.

petrelharp commented 2 years ago

Here's a latexdiff. diff.pdf

lauterbur commented 2 years ago

I skimmed these this morning and they looked good, I will take a closer look within the next couple of hours.

On Wed, Sep 28, 2022 at 11:21 AM igronau @.***> wrote:

@.**** commented on this pull request.

Most edits look great. I left a few comments. I suggest that @lauterbur https://github.com/lauterbur also takes a pass to resolve some of these remaining issues. They are all minor and appear to indicate oscillation of the editing process, which is a good sign that we're ready

In adding_species_manuscript.tex https://github.com/popsim-consortium/adding-species-manuscript/pull/70#discussion_r982682834 :

@@ -315,7 +311,7 @@ \section*{The utility of \stdpopsim for genome-wide simulations} and if necessary with input of additional members of the community. This quality control process quite often finds subtle bugs \citep[e.g., as in][]{Ragsdale2020} or highlights parts of the model that are ambiguously defined by the literature sources. -This increases the reliability of the resulting simulations in any downstream analysis. +This increases the reproducibility of the resulting simulations in any downstream analysis.

I think both versions are fine, but you get reproducible simulations even if the code did not undergo any QC. IMO, the QC is mostly meant to improve the reliability of the simulations in the sense that the user can rely on the fact that they provide an honest implementation of the information in the cited sources.

In adding_species_manuscript.tex https://github.com/popsim-consortium/adding-species-manuscript/pull/70#discussion_r982688433 :

(using the \texttt{SLiM} engine) by specifying genome annotations and distributions of fitness effects, as specified below. -We note that the ability to simulate chromosomes with realistic models of -selection is still under development and will be finalized in the next release of \stdpopsim. +The ability to simulate chromosomes with realistic models of +selection is still under development and will be finalized in the next release of \stdpopsim, +and will be described in more detail in a future paper.

you have two subsequent 'and's here. Should we switch the second one to a 'which' ("which will be described...")?

In adding_species_manuscript.tex https://github.com/popsim-consortium/adding-species-manuscript/pull/70#discussion_r982701291 :

@@ -761,7 +763,7 @@ \subsection*{\texorpdfstring{\emph{Anopheles gambiae} (mosquito)}{Anopheles gamb accompanied by the relevant citation information, and the model underwent the standard quality control process. %Ilan: any issues worth mentioning that were raised in the QC? -The model may be refined in the future by adding more demographic models, +The species may be refined in the future by adding more demographic models,

I don't assume that anyone is going to think we're considering modifying the actual mosquito species, but "refining a species" sounds a bit odd ;-).

In adding_species_manuscript.tex https://github.com/popsim-consortium/adding-species-manuscript/pull/70#discussion_r982713966 :

@@ -846,18 +849,20 @@ \section*{Conclusion} We also discuss special considerations for collecting relevant information from the literature, and what to do if some of that information is not available. Because this process is quite error-prone, -we encourage everyone implementing simulations to have their -parameter choices and implementation reviewed by at least one other -researcher. -These steps can be followed if you are implementing a simulation independently for your own use, -or, as we encourage, when you add your simulation code to \stdpopsim, -to ensure its robustness and make it available for future standardized research. +we encourage wider adoption of ``code review'', +i.e., for researchers implementing simulations to have their +parameter choices and implementation reviewed by at least one other researcher. +%% PLR: It's not totally clear which "these steps" are, and they are actually required for stdpopsim?

"These steps" refers to our guidelines, and not just the QC part. This statement is important, but could be clarified by changing "These steps" to "The guidelines we provide in this paper". Some of these guidelines are strict requirements of stdpopsim and some are not. I think it's okay not to make this distinction here.

In adding_species_manuscript.tex https://github.com/popsim-consortium/adding-species-manuscript/pull/70#discussion_r982667645 :

@@ -167,15 +167,11 @@ \section*{Abstract} The initial version of \stdpopsim focused on establishing this framework using six well-characterized model species. Here, we report on major updates made in the new release of \stdpopsim (version 0.2), -which includes a significant expansion of the species catalog and substantial additions to simulation capabilities that make it possible to simulate a broader range of species. +which includes a significant expansion of the species catalog and substantial additions to simulation capabilities.

This is mostly a reference to bacterial recombination. Indeed, not all simulation capabilities that were added have to do with expanding the set of species. I'm flagging @lauterbur https://github.com/lauterbur here, because I think it was text that she suggested.

In adding_species_manuscript.tex https://github.com/popsim-consortium/adding-species-manuscript/pull/70#discussion_r982727759 :

Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog by more than three-fold and broadened coverage across the tree of life. -Our experience throughout this process was that people are keen to put in the time and effort to include their study species, -but that simple, clear guidance is vital. -Consequently, our intention with this paper is also to provide a learning -modality to meet that need,

I agree with changing "modality" to "guidelines", but I like preceding this with a statement that ties these guidelines to a need that came up during the expansion of the catalog. This gets a bit lost in the shorter version. I couldn't think of a good compromise. @lauterbur https://github.com/lauterbur? Do you have a suggestion here?

In adding_species_manuscript.tex https://github.com/popsim-consortium/adding-species-manuscript/pull/70#discussion_r982679285 :

when linked selection is important in the patterns of genomic diversity being studied \citep{Cutter2013}. Furthermore, the demographic history of a species, encompassing population sizes and distributions, divergences, and gene flow, can dramatically affect patterns of genomic variation \citep{Teshima2006}. Thus species-specific estimates of these and other ecological and evolutionary parameters (e.g., those governing the process of natural selection) -are fundamentally important when developing simulations. +are important to realistic simulations.

Agree, but "important to realistic simulations" reads a bit funny to me. How about "important when generating realistic simulations"?

In adding_species_manuscript.tex https://github.com/popsim-consortium/adding-species-manuscript/pull/70#discussion_r982684962 :

a new species model and adding it to the \stdpopsim code base. -Roughly 20 scientists participated in the hackathon, +Roughly 20 scientists participated in the hackathon (and are included as authors on this paper),

That's one for @lauterbur https://github.com/lauterbur

In adding_species_manuscript.tex https://github.com/popsim-consortium/adding-species-manuscript/pull/70#discussion_r982696441 :

@@ -681,13 +683,13 @@ \subsection*{Filling out the missing pieces} %Ilan: tried to give a few concrete examples here If, for example, the main purpose of the simulation is to examine the distribution of lengths of shared haplotypes between individuals, or study patterns of background selection, -then it makes sense to simulate such pseudo chromosomes. -Note, however, that specific genetic correlations between different contigs lumped together can be highly inaccurate. -So, if the main purpose of the simulation is to examine local patterns of genetic variation in loci of interest, then it would be more appropriate to simulate the relevant contigs separately (even if they are short), or to randomly sample several mappings of contigs to pseudo-chromosomes. -Finally, we note that for some purposes it may be sufficient to simulate a large number of unlinked sites \citep{Gutenkunst2009,Excoffier2013},

The main reason to add this statement is to give people a last-resort option if they have popgen data but no genome assembly, like in RAD-SEQ studies. They might still be able to infer demographic histories and simulate data. I'm fine with leaving this out, so that this paragraph is focused on the more interesting cases of partial builds.

In adding_species_manuscript.tex https://github.com/popsim-consortium/adding-species-manuscript/pull/70#discussion_r982699686 :

@@ -732,8 +734,8 @@ \subsection*{\texorpdfstring{\emph{Anopheles gambiae} (mosquito)}{Anopheles gamb \citet{zheng1996integrated} (Figure \ref{fig:anogam}A). As direct estimates of \textbf{mutation rate} (e.g., via mutation accumulation) do not currently exist for \emph{Anopheles gambiae}, we used the genome-wide average mutation rate of $\mu=3.5 \times 10^{-9}$ mutations per generation per site, -estimated by \cite{Keightley2009} for the closely related species, \textit{Drosophila melanogaster}, -and used for analysis of \textit{A.~gambiae} data in \citet{Miles2017}. +estimated by \cite{Keightley2009} for the fellow Dipteran \textit{Drosophila melanogaster},

'closely related' is always very vague. We added this here just to signal that we chose D. mel because of its close relation to A. gam. "fellow Dipteran" is a better choice.

— Reply to this email directly, view it on GitHub https://github.com/popsim-consortium/adding-species-manuscript/pull/70#pullrequestreview-1124077976, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABTKUNLCVSDDUMXESBZFNW3WASEDZANCNFSM6AAAAAAQXMV7QY . You are receiving this because you were mentioned.Message ID: <popsim-consortium/adding-species-manuscript/pull/70/review/1124077976@ github.com>

petrelharp commented 2 years ago

OK, I've dealt with the comments, and I think this is ready to merge?