wflynny opened this issue 6 years ago
Dear @wflynny, sorry for the late response. And sorry that I don't feel able to comment on imputation techniques; it's simply something I don't have much experience with.
One comprehensive benchmark is this one by Zhang et al. (not so up-to-date anymore, though). It'd be nice to establish a "live" benchmark repository and compare all methods in a transparent, comprehensive, and up-to-date way.
@gokceneraslan Yes, I agree a transparent benchmark repo would be very valuable.
I'd also like to see a detailed breakdown of the limitations of each method, or of imputation in general. It seems problematic to me to use imputed data for all downstream analyses, for example sub-clustering or DGE analysis, but I can't find a discussion of those limitations anywhere. I'm a little wary of imputation methods becoming part of a standard toolkit without a sufficient discussion of their limitations somewhere in the documentation.
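To make the concern concrete, here's a minimal sketch of the kind of pipeline I mean (my own illustration, not taken from any tutorial; it assumes scanpy's external MAGIC wrapper and the optional `magic-impute` and `leidenalg` packages, and uses the bundled pbmc3k data just as an example): smooth the expression matrix with an imputation method, then run clustering and marker-gene tests on the smoothed values, which now share information across cells.

```python
import scanpy as sc

# any AnnData object would do; pbmc3k is just a convenient example dataset
adata = sc.datasets.pbmc3k()
sc.pp.filter_genes(adata, min_cells=3)
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)

# the pattern in question: impute/smooth first ...
sc.external.pp.magic(adata)  # requires the optional `magic-impute` package

# ... then cluster and test for differential expression on the smoothed matrix,
# whose values already share information across neighboring cells
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.leiden(adata)          # requires the optional `leidenalg` package
sc.tl.rank_genes_groups(adata, 'leiden')
```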
Dear @wflynny. You're completely right; I added a note to the documentation that the whole topic is under debate (here).
Generally, Scanpy aims to enable access to different tools via the same data object and consistent interfaces, so that users can conveniently try out different tools. The threshold for including an interface in Scanpy is low and only requires that a preprint/paper and a solid GitHub repository exist.
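To illustrate what that consistent interface looks like in practice, here is a minimal sketch of trying two imputation/denoising tools on copies of the same AnnData object via `scanpy.external` (assuming the optional `dca` and `magic-impute` packages are installed; exact signatures may differ between scanpy versions, and the pbmc3k dataset is only a stand-in):

```python
import scanpy as sc

adata = sc.datasets.pbmc3k()           # any AnnData object works the same way
sc.pp.filter_genes(adata, min_cells=3)

# DCA-style denoising operates on raw counts
adata_dca = adata.copy()
sc.external.pp.dca(adata_dca)          # requires the optional `dca` package

# MAGIC is typically run on normalized, log-transformed data
adata_magic = adata.copy()
sc.pp.normalize_total(adata_magic)
sc.pp.log1p(adata_magic)
sc.external.pp.magic(adata_magic)      # requires the optional `magic-impute` package
```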
I think this is an important conversation to have not just for imputation, but also for other analysis methods like visualization and batch effect correction. Every algorithm comes with assumptions and biases, and it is possible to misinterpret or misuse almost any machine learning algorithm.
For example, t-SNE, often used for visualization, is also used as a dimensionality reduction step for clustering. However, most clustering algorithms assume that global distances in a dataset are meaningful. t-SNE breaks this assumption, as evidenced by the inconsistency of t-SNE embeddings of the same data and the inability of t-SNE to capture some global trends in a dataset (especially with continuous data, which has contributed to the popularity of graph-based visualizations).
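As a concrete, if toy, illustration of the global-distance point, here is a small sketch (my own, using scikit-learn's digits dataset rather than single-cell data): the rank correlation between pairwise distances in the embedding and in the original space is typically noticeably lower for t-SNE than for a linear embedding like PCA, and two t-SNE runs with different seeds need not agree with each other.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = load_digits().data

# two t-SNE runs on the same data, differing only in the random seed
tsne_a = TSNE(n_components=2, random_state=0).fit_transform(X)
tsne_b = TSNE(n_components=2, random_state=1).fit_transform(X)
pca_2d = PCA(n_components=2).fit_transform(X)

# Spearman correlation of pairwise distances: embedding vs. original space,
# and one t-SNE run vs. the other
d_orig = pdist(X)
print("t-SNE run A vs run B:", spearmanr(pdist(tsne_a), pdist(tsne_b)).correlation)
print("t-SNE vs original:   ", spearmanr(pdist(tsne_a), d_orig).correlation)
print("PCA vs original:     ", spearmanr(pdist(pca_2d), d_orig).correlation)
```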
On top of this, each clustering algorithm assumes that the data is in fact distributed in discrete clusters, which is often not the case in single-cell data.
I agree that it's important to warn users about the limitations of imputation methods, and make them aware that their decision on which algorithm to run can affect their output. However, it seems to me that this conversation could be much broader in scope. We don't currently have a system for unified benchmarking and standardization of single cell analysis methods, so all approaches should be used with some caution.
@dburkhardt Sorry for the late response to this.
I agree that the space of single cell 'omics analysis tools is essentially the wild west, where every tool should be viewed critically. However, I'm wary of abandoning a critical discussion of imputation methods in this space because other portions of the typical workflow have issues as well. Further, I think there are important distinctions to be made between different classes of methodology that are (mis)used in this problem space.
I. Methods that are fundamentally flawed by their assumptions or algorithm. These should obviously be avoided.
II. Methods that are fundamentally sound but are not sufficiently validated, e.g. the validation doesn't exist in this problem space, isn't sufficiently comprehensive/relevant, performs poorly against other fundamentally sound methodologies, or has such restrictive assumptions that it isn't broadly useful/applicable.
III. Methods that are fundamentally sound in assumption/algorithm and can be used by a competent practitioner, but still have the potential to be abused by applying them to data that violate those assumptions.
I'd consider t-SNE and a great deal of the clustering algorithms to be in class III for the reasons you gave; they're valid, functional tools but can be applied in assumption-violating or quasi-valid ways. I'm pretty sure that scImpute, for example, belongs in class I because its description of dropout and its simulated test cases are inappropriate. I'd put MAGIC and several other currently available imputation methods in class II, as they have strong foundations but, in my opinion, currently insufficient validation.
I'm not trying to pick on MAGIC or any specific imputation method. Instead, I'd like to have an open discussion about the benefits, limitations, and relative performance of the various imputation methods available, with the goal of leading to something like what @gokceneraslan suggested.
Well, and since you brought it up, batch correction and multimodal integration methods are in definite need of the same open discussion, which I'd be happy to have, and I think they should have the same disclaimer regarding their limitations in the documentation.
@falexwolf, @flying-sheep
From the discussion on #45, I think some more discussion should be had as to which imputation methods are to be included in scanpy. Validation of, and comparisons between, the currently available imputation methods are both severely lacking; I only know of [1][2][3][4][5], none of which include comprehensive benchmarks, and the updated MAGIC (#187) article in Cell doesn't include relevant comparisons between current methods.
I'd be very interested in hearing/having an open discussion about the motivation, benefits, and limitations of the various imputation methods available.
[1]: Zhang and Zhang, 2017. https://www.biorxiv.org/content/early/2017/12/31/241190
[2]: Lopez et al., 2018. https://www.biorxiv.org/content/early/2018/03/30/292037
[3]: Li and Li, 2018. https://www.nature.com/articles/s41467-018-03405-7
[4]: Eraslan et al., 2018. https://www.biorxiv.org/content/early/2018/04/13/300681
[5]: Huang et al., 2018. https://www.biorxiv.org/content/early/2018/03/08/138677