Closed mcsimenc closed 4 years ago
Hi Matt,
Thanks for the feedback. This is indeed a bug in the current version and it has been fixed in our Github repo fix bugs (feature name: all_genes -> all) . It will be updated in our next release.
For now, can your simply paste the following function:
def dimension_reduction(adata,n_neighbors=50, nb_pct = None,n_components = 3,n_jobs = 1,
feature='var_genes',method = 'se',eigen_solver=None):
"""Perform dimension reduction.
Parameters
----------
adata: AnnData
Annotated data matrix.
n_neighbors: `int`, optional (default: 50)
The number of neighbor cells used for manifold learning (only valid when 'mlle','se', or 'umap' is specified).
nb_pct: `float`, optional (default: None)
The percentage of neighbor cells (when sepcified, it will overwrite n_neighbors).
n_components: `int`, optional (default: 3)
Number of components to keep.
n_jobs: `int`, optional (default: 1)
The number of parallel jobs to run.
feature: `str`, optional (default: 'var_genes')
Choose from {{'var_genes','top_pcs','all'}}
Feature used for dimension reduction.
'var_genes': most variable genes
'top_pcs': top principal components
'all': all available features (genes)
method: `str`, optional (default: 'se')
Choose from {{'se','mlle','umap','pca'}}
Method used for dimension reduction.
'se': Spectral embedding algorithm
'mlle': Modified locally linear embedding algorithm
'umap': Uniform Manifold Approximation and Projection
'pca': Principal component analysis
eigen_solver: `str`, optional (default: None)
For 'mlle', choose from {{'arpack', 'dense'}}
For 'se', choose from {{'arpack', 'lobpcg', or 'amg'}}
The eigenvalue decomposition strategy to use
Returns
-------
updates `adata` with the following fields.
X_dr : `numpy.ndarray` (`adata.obsm['X_dr']`)
A #observations × n_components data matrix after dimension reduction.
X_mlle : `numpy.ndarray` (`adata.obsm['X_mlle']`)
Store #observations × n_components data matrix after mlle.
X_se : `numpy.ndarray` (`adata.obsm['X_se']`)
Store #observations × n_components data matrix after spectral embedding.
X_umap : `numpy.ndarray` (`adata.obsm['X_umap']`)
Store #observations × n_components data matrix after umap.
X_pca : `numpy.ndarray` (`adata.obsm['X_pca']`)
Store #observations × n_components data matrix after pca.
trans_mlle : `sklearn.manifold.locally_linear.LocallyLinearEmbedding` (`adata.uns['trans_mlle']`)
Store mlle object
trans_se : `sklearn.manifold.spectral_embedding_.SpectralEmbedding` (`adata.uns['trans_se']`)
Store se object
trans_umap : `umap.UMAP` (`adata.uns['trans_umap']`)
Store umap object
trans_pca : `sklearn.decomposition.PCA` (`adata.uns['trans_pca']`)
Store pca object
"""
if(feature not in ['var_genes','top_pcs','all']):
raise ValueError("unrecognized feature '%s'" % feature)
if(method not in ['mlle','se','umap','pca']):
raise ValueError("unrecognized method '%s'" % method)
if(feature == 'var_genes'):
input_data = adata.obsm['var_genes']
if(feature == 'top_pcs'):
input_data = adata.obsm['top_pcs']
if(feature == 'all'):
input_data = adata.X
print('feature ' + feature + ' is being used ...')
print(str(n_jobs)+' cpus are being used ...')
if(nb_pct!=None):
n_neighbors = int(np.around(input_data.shape[0]*nb_pct))
if(method == 'mlle'):
np.random.seed(2)
if(eigen_solver==None):
if(input_data.shape[0]<=2000):
reducer = LocallyLinearEmbedding(n_neighbors=n_neighbors,
n_components=n_components,
n_jobs = n_jobs,
method = 'modified',eigen_solver = 'dense',random_state=10,
neighbors_algorithm = 'kd_tree')
else:
reducer = LocallyLinearEmbedding(n_neighbors=n_neighbors,
n_components=n_components,
n_jobs = n_jobs,
method = 'modified',eigen_solver = 'arpack',random_state=10,
neighbors_algorithm = 'kd_tree')
else:
reducer = LocallyLinearEmbedding(n_neighbors=n_neighbors,
n_components=n_components,
n_jobs = n_jobs,
method = 'modified',eigen_solver = eigen_solver,random_state=10,
neighbors_algorithm = 'kd_tree')
trans = reducer.fit(input_data)
adata.uns['trans_mlle'] = trans
adata.obsm['X_mlle'] = trans.embedding_
adata.obsm['X_dr'] = trans.embedding_
if(method == 'se'):
np.random.seed(2)
reducer = SpectralEmbedding(n_neighbors=n_neighbors,
n_components=n_components,
n_jobs = n_jobs,
eigen_solver = eigen_solver,random_state=10)
trans = reducer.fit(input_data)
adata.uns['trans_se'] = trans
adata.obsm['X_se'] = trans.embedding_
adata.obsm['X_dr'] = trans.embedding_
if(method == 'umap'):
reducer = umap.UMAP(n_neighbors=n_neighbors,n_components=n_components,random_state=42)
trans = reducer.fit(input_data)
adata.uns['trans_umap'] = trans
adata.obsm['X_umap'] = trans.embedding_
adata.obsm['X_dr'] = trans.embedding_
if(method == 'pca'):
reducer = sklearnPCA(n_components=n_components,svd_solver='arpack',random_state=42)
trans = reducer.fit(input_data)
adata.uns['trans_pca'] = trans
adata.obsm['X_pca'] = trans.transform(input_data)
adata.obsm['X_dr'] = adata.obsm['X_pca']
if('params' not in adata.uns_keys()):
adata.uns['params'] = dict()
adata.uns['params']['dimension_reduction'] = {'feature':feature,'method':method,'n_components':n_components}
return None
and run dimension_reduction(adata, method = "umap", feature = "all", n_components = 2, n_neighbors = 15, n_jobs = 4)
? (instead of st.dimension_reduction(adata, method = "umap", feature = "all", n_components = 2, n_neighbors = 15, n_jobs = 4)
)
Great, the new function works for me
I'm analyzing an expression matrix containing I exported from Seurat after doing QC and normalization there. It has only the 3000 most variable genes. I ran
st.select_top_principal_components(adata, n_pc = 18, first_pc = True)
followed by
st.dimension_reduction(adata, method = "umap", feature = "all", n_components = 2, n_neighbors = 15, n_jobs = 4)
the latter of which gives the following error:
I tried using
se
,pca
,mlle
as the method and also received an error. Any ideas?Thanks!
Matt