pinellolab / STREAM

STREAM: Single-cell Trajectories Reconstruction, Exploration And Mapping of single-cell data
http://stream.pinellolab.org
GNU Affero General Public License v3.0
173 stars 48 forks source link

dimension_reduction() gives UnboundLocalError #94

Closed mcsimenc closed 4 years ago

mcsimenc commented 4 years ago

I'm analyzing an expression matrix containing I exported from Seurat after doing QC and normalization there. It has only the 3000 most variable genes. I ran

st.select_top_principal_components(adata, n_pc = 18, first_pc = True)

followed by

st.dimension_reduction(adata, method = "umap", feature = "all", n_components = 2, n_neighbors = 15, n_jobs = 4)

the latter of which gives the following error:

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-30-07dc3061eb1c> in <module>
----> 1 st.dimension_reduction(adata, method = "umap", feature = "all", n_components = 2, n_neighbors = 15, n_jobs = 4)

~/.conda/envs/stream/lib/python3.7/site-packages/stream/core.py in dimension_reduction(adata, n_neighbors, nb_pct, n_components, n_jobs, feature, method, eigen_solver)
   1183     if(method == 'umap'):
   1184         reducer = umap.UMAP(n_neighbors=n_neighbors,n_components=n_components,random_state=42)
-> 1185         trans = reducer.fit(input_data)
   1186         adata.uns['trans_umap'] = trans
   1187         adata.obsm['X_umap'] = trans.embedding_

UnboundLocalError: local variable 'input_data' referenced before assignment

I tried using se, pca, mlle as the method and also received an error. Any ideas?

Thanks!

Matt

huidongchen commented 4 years ago

Hi Matt,

Thanks for the feedback. This is indeed a bug in the current version and it has been fixed in our Github repo fix bugs (feature name: all_genes -> all) . It will be updated in our next release.

For now, can your simply paste the following function:

def dimension_reduction(adata,n_neighbors=50, nb_pct = None,n_components = 3,n_jobs = 1,
                        feature='var_genes',method = 'se',eigen_solver=None):

    """Perform dimension reduction.

    Parameters
    ----------
    adata: AnnData
        Annotated data matrix.
    n_neighbors: `int`, optional (default: 50)
        The number of neighbor cells used for manifold learning (only valid when 'mlle','se', or 'umap' is specified).
    nb_pct: `float`, optional (default: None)
        The percentage of neighbor cells (when sepcified, it will overwrite n_neighbors).
    n_components: `int`, optional (default: 3)
        Number of components to keep.
    n_jobs: `int`, optional (default: 1)
        The number of parallel jobs to run.
    feature: `str`, optional (default: 'var_genes')
        Choose from {{'var_genes','top_pcs','all'}}
        Feature used for dimension reduction.
        'var_genes': most variable genes
        'top_pcs': top principal components
        'all': all available features (genes)
    method: `str`, optional (default: 'se')
        Choose from {{'se','mlle','umap','pca'}}
        Method used for dimension reduction.
        'se': Spectral embedding algorithm
        'mlle': Modified locally linear embedding algorithm
        'umap': Uniform Manifold Approximation and Projection
        'pca': Principal component analysis
    eigen_solver: `str`, optional (default: None)
        For 'mlle', choose from {{'arpack', 'dense'}}
        For 'se', choose from {{'arpack', 'lobpcg', or 'amg'}}
        The eigenvalue decomposition strategy to use

    Returns
    -------
    updates `adata` with the following fields.

    X_dr : `numpy.ndarray` (`adata.obsm['X_dr']`)
        A #observations × n_components data matrix after dimension reduction.
    X_mlle : `numpy.ndarray` (`adata.obsm['X_mlle']`)
        Store #observations × n_components data matrix after mlle.
    X_se : `numpy.ndarray` (`adata.obsm['X_se']`)
        Store #observations × n_components data matrix after spectral embedding.    
    X_umap : `numpy.ndarray` (`adata.obsm['X_umap']`)
        Store #observations × n_components data matrix after umap.
    X_pca : `numpy.ndarray` (`adata.obsm['X_pca']`)
        Store #observations × n_components data matrix after pca.
    trans_mlle : `sklearn.manifold.locally_linear.LocallyLinearEmbedding` (`adata.uns['trans_mlle']`)
        Store mlle object
    trans_se : `sklearn.manifold.spectral_embedding_.SpectralEmbedding` (`adata.uns['trans_se']`)
        Store se object
    trans_umap : `umap.UMAP` (`adata.uns['trans_umap']`)
        Store umap object
    trans_pca : `sklearn.decomposition.PCA` (`adata.uns['trans_pca']`)
        Store pca object 
    """

    if(feature not in ['var_genes','top_pcs','all']):
        raise ValueError("unrecognized feature '%s'" % feature)
    if(method not in ['mlle','se','umap','pca']):
        raise ValueError("unrecognized method '%s'" % method)
    if(feature == 'var_genes'):
        input_data = adata.obsm['var_genes']
    if(feature == 'top_pcs'):
        input_data = adata.obsm['top_pcs']
    if(feature == 'all'):
        input_data = adata.X
    print('feature ' + feature + ' is being used ...')
    print(str(n_jobs)+' cpus are being used ...')
    if(nb_pct!=None):
        n_neighbors = int(np.around(input_data.shape[0]*nb_pct))

    if(method == 'mlle'):
        np.random.seed(2)
        if(eigen_solver==None):
            if(input_data.shape[0]<=2000):
                reducer = LocallyLinearEmbedding(n_neighbors=n_neighbors, 
                                                     n_components=n_components,
                                                     n_jobs = n_jobs,
                                                     method = 'modified',eigen_solver = 'dense',random_state=10,
                                                     neighbors_algorithm = 'kd_tree')
            else:
                reducer = LocallyLinearEmbedding(n_neighbors=n_neighbors, 
                                                     n_components=n_components,
                                                     n_jobs = n_jobs,
                                                     method = 'modified',eigen_solver = 'arpack',random_state=10,
                                                     neighbors_algorithm = 'kd_tree')

        else:
            reducer = LocallyLinearEmbedding(n_neighbors=n_neighbors, 
                                                 n_components=n_components,
                                                 n_jobs = n_jobs,
                                                 method = 'modified',eigen_solver = eigen_solver,random_state=10,
                                                 neighbors_algorithm = 'kd_tree')        
        trans = reducer.fit(input_data)
        adata.uns['trans_mlle'] = trans
        adata.obsm['X_mlle'] = trans.embedding_
        adata.obsm['X_dr'] = trans.embedding_
    if(method == 'se'):
        np.random.seed(2)
        reducer = SpectralEmbedding(n_neighbors=n_neighbors, 
                                         n_components=n_components,
                                         n_jobs = n_jobs,
                                         eigen_solver = eigen_solver,random_state=10)
        trans = reducer.fit(input_data)
        adata.uns['trans_se'] = trans
        adata.obsm['X_se'] = trans.embedding_
        adata.obsm['X_dr'] = trans.embedding_
    if(method == 'umap'):
        reducer = umap.UMAP(n_neighbors=n_neighbors,n_components=n_components,random_state=42)
        trans = reducer.fit(input_data)
        adata.uns['trans_umap'] = trans
        adata.obsm['X_umap'] = trans.embedding_
        adata.obsm['X_dr'] = trans.embedding_
    if(method == 'pca'):
        reducer = sklearnPCA(n_components=n_components,svd_solver='arpack',random_state=42)
        trans = reducer.fit(input_data)
        adata.uns['trans_pca'] = trans
        adata.obsm['X_pca'] = trans.transform(input_data) 
        adata.obsm['X_dr'] = adata.obsm['X_pca']
    if('params' not in adata.uns_keys()):
        adata.uns['params'] = dict()
    adata.uns['params']['dimension_reduction'] = {'feature':feature,'method':method,'n_components':n_components}
    return None

and run dimension_reduction(adata, method = "umap", feature = "all", n_components = 2, n_neighbors = 15, n_jobs = 4) ? (instead of st.dimension_reduction(adata, method = "umap", feature = "all", n_components = 2, n_neighbors = 15, n_jobs = 4))

mcsimenc commented 4 years ago

Great, the new function works for me