Open chris-rands opened 4 years ago
Could you tell us about how you've found it useful, or point us towards some literature on it being used? I think this would be easy enough to implement, but when I tried a naive implementation the results weren't that compelling. It's very possible I should've played around with the parameters more. @flying-sheep, do you have thoughts on this?
Here's a simple implementation:
def ica(adata, n_components, inplace=True, **kwargs):
from sklearn.decomposition import FastICA
ica_transformer = FastICA(n_components=n_components, **kwargs)
x_ica = ica_transformer.fit_transform(adata.X)
if inplace:
adata.obsm["X_ica"] = x_ica
adata.varm["ICs"] = ica_transformer.components_.T
else:
return ica_transformer
Hi ! To answer about how it is useful, we are using ICA in our lab to a dataset of more than 100k cells, with a lot of complexity, and the main advantage of ICA against PCA is that it helps us detecting small populations of cells. As these small populations are not accounting for a lot of variance within the dataset, using a treshold on PCs, we discarded the PCs that would allow the separate them. Another advantage is that we do not make use of a selection of "highly variable genes" anymore, and use all genes expressed in more than 100 cells for the whole analysis... Doing the same and applying PCA gave us quite poor results..
We made use of Seurat implementation.. and I tried fastICA from sklearn once but I couldn't obtain similar results... I have not looked thoroughly into seurat's code tough...
Hope it helps ! Best
Thank you both. In terms of literature, see this paper (PMID: 30096299), in particular Figure S2
I looked through the R fastica
package, and I think one of the main differences was the default tolerance. I've changed that and results seem a bit better. I don't think I have a great reference point to evaluate it though. Any chance you could provide a vignette of ICA being used with single cell data, so we can see if the python version can recapitulate the results?
@chris-rands Any update on this? Does the code from @ivirshup gives you any meaningful results?
BTW there is a nice "Faster ICA" algorithm now, called Picard. It's implemented as a python package here:
https://pierreablin.github.io/picard/generated/picard.picard.html#picard.picard
It might be worthwhile to try this too.
@gokceneraslan, got around to giving picard
a shot. For the pbmc3k dataset, it's a little slower than sklearns FastICA, though sklearn gives non-convergence warnings more often. Have you tried it before?
Thanks for the feedback and sorry for the delay. The code snippet using sklearn.decomposition.FastICA
gave me quite similar results to PCA for my data in terms of the downstream UMAP visualisations (when I simply embedded 50 components in the neighbourhood graph). One difference is that the ICA was slower to compute. I am not confident to create a vignette, as I'm unclear what the 'correct' results should look like. I have not tried the picard
implementation yet.
Hi, I tried the snippet, with fastICA and picard, and with a number of cells higher than 30,000, the whitening step cannot be completed. This seems be due to some Lapack limitations.
ValueError: Too large work array required -- computation cannot be performed with standard 32-bit LAPACK.
I don't know how to get around this....
Best,
Chloé
Sorry for the late response, completley forgot to post my response here.
@Fougere87, did that whitening issue occur with picard as well? I saw that with sklearn. I think we could get around that by whitening ourselves with ARPACK.
Picard and sklearn look pretty similar to me in a quick comparison. Below are top 16/30 components (ranked by Geary's C, autocorrelation on the connectivity graph) cell loadings on the pbmc3k dataset. The umap and connectivity matrix here were computed on top of a PCA – which I should maybe do differently. However I think the results are similar enough that it's probably not of consequence.
Are there plans to include independent component analysis (ICA) in Scanpy? This would be useful for me. I know that it is already implemented in Seurat. Thank you