Simulation code modifications

lazappi commented 2 years ago

Hi

I have just done a journal club on your paper and we all agreed it was really interesting and useful work. I was interested in how you did the simulations. From the paper it seems like you used {splatter}, but after looking at the code I was surprised to find that you actually use a modified version, which I found here: https://github.com/jordansquair/splatter_batch.

Could you please describe what modifications you made and why you found them useful? If it seems like they might have more general uses we might also discuss if they could be contributed back to the main package.

Thanks!

skinnider commented 2 years ago

Hi @lazappi, we actually created the modified version of Splatter for our previous paper: https://www.nature.com/articles/s41587-020-0605-1. The motivation was to explore a particular type of batch-effect confounding that didn't seem possible to simulate in Splatter as it was written at the time (this was back in 2020). We specifically wanted to evaluate how batch effects affected the performance of our Augur package for cell type prioritization, but we were having trouble simulating a batch effect that would actually affect cell type prioritization. We ultimately figured out this was because of the interaction between batch effects and DE. The rationale in the Methods was:

Finally, because separability within cell types can arise not only from the cell-intrinsic response to perturbation but also from a number of technical factors, we evaluated the impact of batch effects on cell type prioritization (Extended Data Fig. 10). In simulated populations of 200 cells from 2 experimental conditions, sequenced in 2 batches, we simultaneously varied both the proportion of DE genes and the location parameter for the batch effect factor log-normal distribution (‘batch.facLoc’), fixing the location parameter of the DE factor log-normal distribution (‘de.facLoc’) at 0.5, as above. Under the default model in Splatter, technical batch effects are orthogonal to both the magnitude of perturbation-dependent DE, and the likelihood that a given cell is observed in either the stimulated or unstimulated condition. As the separability between conditions is effectively unchanged in this scenario (‘scenario 1’), we extended the Splatter package to incorporate confounding between batch and DE (‘scenario 2’), and between batch and experimental condition (‘scenarios 3–5’). Confounding between batch and DE is achieved by adjusting the order of operations in Splatter such that DE is simulated before the application of a batch effect, with the result that the batch effect amplifies the perturbation in one of the two batches. Confounding between batch and condition is achieved by adjusting the proportion of cells from each experimental condition within each batch, such that one batch is more likely to contain cells from the stimulated population. The fork of the Splatter repository implementing confounded batch effects is available from https://github.com/jordansquair/splatter_batch.
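To make the confounding between batch and condition concrete, here is a minimal Python sketch. It is not code from the splatter_batch fork: the function name `confounded_design` and the 0.9/0.1 fractions are hypothetical, chosen only for illustration. It shows just the idea described above, that the share of stimulated cells is set separately within each batch.

```python
from collections import Counter

def confounded_design(n_cells, stim_fraction_by_batch):
    """Return one (batch, condition) label per cell.

    Hypothetical helper: cells are split evenly across batches, and within
    each batch the given fraction is labelled 'stim', the rest 'ctrl'.
    """
    n_batches = len(stim_fraction_by_batch)
    per_batch = n_cells // n_batches
    design = []
    for batch, frac in enumerate(stim_fraction_by_batch, start=1):
        n_stim = round(per_batch * frac)
        design += [(f"batch{batch}", "stim")] * n_stim
        design += [(f"batch{batch}", "ctrl")] * (per_batch - n_stim)
    return design

# Balanced design: knowing the batch says nothing about the condition.
balanced = Counter(confounded_design(200, [0.5, 0.5]))

# Confounded design (illustrative fractions, not taken from the paper):
# batch 1 is mostly stimulated, batch 2 is mostly unstimulated.
confounded = Counter(confounded_design(200, [0.9, 0.1]))

for label in sorted(confounded):
    print(label, balanced[label], confounded[label])
```

In the balanced case every batch/condition cell of the table holds 50 cells; in the confounded case the stimulated cells are concentrated in batch 1, so any batch-specific shift in expression also differs systematically between conditions.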

Here is some further discussion of this experiment, taken from our response to reviewers, which may help clarify things:

In our initial analysis, we were surprised to discover that Augur cell type prioritizations are surprisingly robust to a number of different batch effect scenarios. This result compelled us to devise a new simulation framework to challenge Augur in various types of scenarios. For this purpose, we extended the Splatter package to simulate scenarios in which cell type prioritization is degraded by a batch effect. Specifically, under the default batch effect simulation mode in Splatter (“scenario #1”), technical batch is independent of both the magnitude of perturbation-dependent differential expression, and the likelihood that a given cell is observed in either the stimulated or unstimulated condition. Under this scenario, cell type prioritization is entirely unaffected by the presence of a batch effect, since the ‘separability’ between conditions is effectively unchanged.

We therefore first adjusted the simulation framework to induce a stronger perturbation response in one of the two batches (that is, batch and differential expression are confounded; “scenario #2”). Arguably, this scenario reflects what we generally have in mind when we think of batch effects: for instance, the case in which a higher dose of the treatment was accidentally given to the second experimental replicate. Again, however, the ‘separability’ between conditions remains unchanged, and consequently, cell type prioritization is unaffected.

This finding pushed us to further explore whether batch effects could confound cell type prioritization. We therefore devised a method to introduce confounding between batch and condition, such that certain batches were more or less likely to contain cells from the unstimulated population. We introduced variable amounts of confounding between batch and condition (“scenarios #3-5”) and, remarkably, found that cell type prioritization was robust to substantial confounding (“scenarios #3 and #4”). However, when the batch was severely confounded with condition (i.e., cells in one batch are 80% more likely to be from the stimulated population; “scenario #5”), the AUC began to scale with both perturbation intensity and the magnitude of the batch effect.
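A toy calculation can illustrate why confounding between batch and condition raises the AUC. This is only a sketch under strong assumptions: a single gene, an additive batch shift, and evenly spaced values standing in for noise. It uses neither Augur (which trains a classifier on many genes) nor Splatter, and all function names and numbers are hypothetical.

```python
def spread(n):
    """n evenly spaced values in [-1, 1]; a deterministic stand-in for noise."""
    if n == 0:
        return []
    if n == 1:
        return [0.0]
    return [-1 + 2 * i / (n - 1) for i in range(n)]

def expression(n_batch1, n_batch2, de_shift, batch_shift):
    """Toy expression of one gene for one condition, pooled over both batches.

    Cells in batch 2 receive an extra additive batch_shift.
    """
    return ([x + de_shift for x in spread(n_batch1)]
            + [x + de_shift + batch_shift for x in spread(n_batch2)])

def auc(pos, neg):
    """Chance that a random 'pos' value exceeds a random 'neg' value (ties count half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def separability(stim_counts, ctrl_counts, de_shift, batch_shift):
    """AUC for telling stimulated from unstimulated cells using this one gene."""
    stim = expression(*stim_counts, de_shift, batch_shift)
    ctrl = expression(*ctrl_counts, 0.0, batch_shift)
    return auc(stim, ctrl)

# No true perturbation (de_shift = 0) in either design; counts are (batch 1, batch 2).
balanced_auc = separability((50, 50), (50, 50), de_shift=0.0, batch_shift=2.0)
confounded_auc = separability((10, 90), (90, 10), de_shift=0.0, batch_shift=2.0)

print(balanced_auc)    # 0.5: the batch shift hits both conditions equally
print(confounded_auc)  # well above 0.5, although no gene is truly DE
```

With balanced batches the batch shift cancels out between conditions, whereas under heavy confounding the batch shift alone makes the conditions separable, which is consistent with the behaviour described for scenario #5.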

Tagging @jordansquair, who actually wrote the Splatter fork, in case there is anything I'm missing.

lazappi commented 2 years ago

Hi @skinnider

Thanks for the detailed explanation (and sorry for the slow reply!). I think I can see the use case for this, and it's maybe something I should incorporate into future versions of splatter (if I can find time to work on it properly again 😸).
