map_new_data for scATAC-Seq data

liaojinyue commented 4 years ago

Hi, I used STREAM for scATAC-Seq analysis and the results are good for the individual datasets. I then tried to use the map_new_data function, but it produced the following error, although it works for the demo scRNA-Seq dataset. Does this function support scATAC-Seq analysis?

Best,

Jason

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-107-f62892c07c52> in <module>
----> 1 st.map_new_data(adata,adata_emt2,feature =all, method="mlle",use_radius=False)

~/anaconda3/envs/st.atac.env/lib/python3.7/site-packages/stream/core.py in map_new_data(adata, adata_new, feature, method, use_radius)
   4846             adata_new.obsm['X_mlle_mapping'] = new_X_mlle
   4847         else:
-> 4848             adata_new.obsm['X_mlle_mapping'] = trans.transform(input_data)
   4849         adata_new.obsm['X_dr'] = adata_new.obsm['X_mlle_mapping'].copy()
   4850     if(method == 'umap'):

UnboundLocalError: local variable 'input_data' referenced before assignment

huidongchen commented 4 years ago

Hi Jason,

Sorry about the delay. map_new_data should work for scATAC-seq as well.

Because ATAC-seq analysis has no gene selection step, the default parameters will not work. You need to specify feature='top_pcs' in st.map_new_data() instead. This maps the new cells using the same top PCs that were used for the reference data.
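To illustrate how an UnboundLocalError like the one above can arise, here is a minimal sketch. It assumes the function assigns input_data only inside branches for feature values it recognizes; `pick_input`, the `supported` tuple, and the 'var_genes' name are hypothetical stand-ins for illustration, not STREAM's actual code.

```python
def pick_input(data, feature, supported=("var_genes",)):
    """Hypothetical stand-in for the feature-selection step (not STREAM's code)."""
    if feature in supported:
        input_data = data[feature]
    # No else branch: an unrecognized feature leaves input_data unassigned,
    # so the return below raises UnboundLocalError.
    return input_data


data = {"var_genes": [1, 2], "top_pcs": [3, 4]}

# A build that knows about 'top_pcs' returns the matching input.
print(pick_input(data, "top_pcs", supported=("var_genes", "top_pcs")))

# A build that does not know about 'top_pcs' fails at the return.
try:
    pick_input(data, "top_pcs")
except UnboundLocalError as err:
    print(type(err).__name__, err)
```

Under that assumption, the error says nothing about the data itself: it only means the value passed as feature did not match any branch the installed version handles.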

liaojinyue commented 4 years ago

Hi, thanks for your reply. I tried st.map_new_data(adata,adata_emt,feature='top_pcs',method='umap') and got the same error.

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-13-2f6e91482eac> in <module>
----> 1 st.map_new_data(adata,adata_emt,feature='top_pcs',method='umap')

~/anaconda3/envs/st.atac.env/lib/python3.7/site-packages/stream/core.py in map_new_data(adata, adata_new, feature, method, use_radius)
   4850     if(method == 'umap'):
   4851         trans = adata.uns['trans_umap']
-> 4852         adata_new.obsm['X_umap_mapping'] = trans.transform(input_data)
   4853         adata_new.obsm['X_dr'] = adata_new.obsm['X_umap_mapping'].copy()
   4854     if(method == 'pca'):

UnboundLocalError: local variable 'input_data' referenced before assignment

My objects are:

adata

AnnData object with n_obs × n_vars = 2771 × 8192 
    obs: 'label', 'label_color', 'node', 'branch_id', 'branch_id_alias', 'branch_lam', 'branch_dist', 'S0_pseudotime', 'S3_pseudotime', 'S2_pseudotime', 'S1_pseudotime'
    uns: 'workdir', 'experiment', 'label_color', 'pca_variance_ratio', 'trans_se', 'epg', 'flat_tree', 'seed_epg', 'seed_flat_tree', 'ori_epg', 'epg_obj', 'ori_epg_obj', 'subwaymap_S0', 'trans_pca', 'trans_umap', 'trans_mlle'
    obsm: 'pca', 'top_pcs', 'X_se', 'X_dr', 'X_spring', 'X_subwaymap_S0', 'X_pca', 'X_umap', 'X_mlle'
    layers: 'zscores', 'zscores_scaled'

adata_emt

AnnData object with n_obs × n_vars = 2555 × 8192 
    obs: 'label', 'label_color', 'node', 'branch_id', 'branch_id_alias', 'branch_lam', 'branch_dist', 'S0_pseudotime', 'S1_pseudotime'
    uns: 'workdir', 'experiment', 'label_color', 'pca_variance_ratio', 'trans_se', 'epg', 'flat_tree', 'seed_epg', 'seed_flat_tree', 'ori_epg', 'epg_obj', 'ori_epg_obj', 'subwaymap_S0'
    obsm: 'pca', 'top_pcs', 'X_se', 'X_dr', 'X_spring', 'X_subwaymap_S0'
    layers: 'zscores', 'zscores_scaled'

huidongchen commented 4 years ago

Which version are you using? Support for feature='top_pcs' was added only recently, in v0.4.0. If you are using v0.3.9, you will need to upgrade.
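A quick way to compare an installed version string against the v0.4.0 threshold mentioned here is sketched below. The helper names `parse_version` and `supports_top_pcs` are made up for this example, and it assumes plain dotted numeric versions such as 0.3.9. The installed version string itself can be read from the output of `conda list`.

```python
def parse_version(text):
    """Turn 'v0.4.0' or '0.3.9' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def supports_top_pcs(installed, minimum="0.4.0"):
    """True if the installed version is at least the minimum (v0.4.0 per this thread)."""
    return parse_version(installed) >= parse_version(minimum)


print(supports_top_pcs("0.3.9"))  # False
print(supports_top_pcs("0.4.1"))  # True
```

Comparing tuples of integers avoids the pitfall of comparing the raw strings, where '0.10.0' would sort before '0.4.0'.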

huidongchen commented 4 years ago

My bad, I just realized that I only updated the GitHub repo and haven't updated the Bioconda package yet. For now, the fastest workaround is to sync with the STREAM GitHub repo.

You can git clone the repo and run python setup.py install locally.

I will try to update the Bioconda package in the next few days.

liaojinyue commented 4 years ago

Hi, I tried to install locally, but I'm not sure how to import the newer version in a Jupyter notebook. I think I'll wait for the updated Bioconda package.

Jason
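For anyone in the same situation, one way to see which copy of a package a notebook kernel actually imports is to print the interpreter path and the module's file location from inside the notebook. This is a generic Python sketch, not something specific to STREAM. `where_is` is a hypothetical helper, and `json` is used only as a runnable stand-in for the package name.

```python
import importlib.util
import sys


def where_is(module_name):
    """Return the file the current interpreter would import, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


print(sys.executable)      # the Python interpreter behind this kernel
print(where_is("json"))    # stdlib module used as a stand-in
print(where_is("stream"))  # None if the package is not installed in this environment
```

If the printed path points into a different environment than the one where the local install was run, the kernel is using another interpreter, and the package needs to be installed into that environment instead.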

huidongchen commented 4 years ago

Hi Jason,

The new version 0.4.1 is available now. It supports mapping with top_pcs. Let me know if it works for you.

pinellolab / STREAM

map_new_data for scATAC-Seq data #53