theislab / scvelo

RNA Velocity generalized through dynamical modeling
https://scvelo.org
BSD 3-Clause "New" or "Revised" License
414 stars 102 forks source link

scVelo doesn't select the n_top_genes when doing scv.pp.filter_and_normalize() #775

Closed hyjforesight closed 2 years ago

hyjforesight commented 2 years ago

... Hello scVelo, My dataset has 5000 genes, and I set n_top_genes=2000 to do scv.pp.filter_and_normalize(). But it only filter out the genes with min_shared_counts and doesnt't select the 2000 top genes. I still get more than 2000 genes for the downstream analysis. Please see the picture. image Could you please help me with this issue? Thanks! Best, YJ

# paste your code here, if applicable
Error output ```pytb # paste the error output here, if applicable ```
Versions ```pytb # paste the ouput of scv.logging.print_versions() here ```scanpy==1.8.2 anndata==0.7.8 umap==0.5.2 numpy==1.20.3 scipy==1.7.2 pandas==1.3.4 scikit-learn==1.0.1 statsmodels==0.13.1 python-igraph==0.9.8 pynndescent==0.5.5 scvelo==0.2.4 scanpy==1.8.2 anndata==0.7.8 loompy==3.0.6 numpy==1.20.3 scipy==1.7.2 matplotlib==3.5.0 sklearn==1.0.1 pandas==1.3.4 cellrank==1.5.0 scanpy==1.8.2 anndata==0.7.8 numpy==1.20.3 numba==0.54.1 scipy==1.7.2 pandas==1.3.4 pygpcca==1.0.2 scikit-learn==1.0.1 statsmodels==0.13.1 python-igraph==0.9.8 scvelo==0.2.4 pygam==0.8.0 matplotlib==3.5.0 seaborn==0.11.2
WeilerP commented 2 years ago

@hyjforesight, I suppose this is caused by multiple cells with the same value in adata.var['dispersion_norm']. The number of duplicated entries is given by adata.var['dispersion_norm'].duplicated().sum(). Would you mind checking how many duplicate entries there are?

Also, a quick note: In general, the data in adata.X should not be normalized when passing to scv.pp.filter_and_normalize. I suggest rerunning the pipeline with the raw counts in adata.X.

hyjforesight commented 2 years ago

Hello @WeilerP , Thanks for the response. There is 0 duplicate. Please check the below coding and outputs.

adata = sc.read('C:/Users/Park_Lab/Documents/adata.h5ad')
adata
AnnData object with n_obs × n_vars = 2636 × 5000
    obs: 'leiden', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_rpl', 'pct_counts_rpl', 'total_counts_rps', 'pct_counts_rps'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'n_cells', 'mt', 'rpl', 'rps', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
    uns: 'hvg', 'leiden', 'leiden_colors', 'neighbors', 'pca', 'rank_genes_groups', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced'
    obsp: 'connectivities', 'distances'
adata.var['dispersions_norm'].duplicated().sum()
0
scv.pl.proportions(adata, groupby='leiden')
scv.pp.filter_and_normalize(adata, min_shared_counts=30, n_top_genes=2000)
adata
Filtered out 1562 genes that are detected 30 counts (shared).
WARNING: Did not normalize X as it looks processed already. To enforce normalization, set `enforce=True`.
Normalized count data: spliced, unspliced.
Extracted 2845 highly variable genes.
WARNING: Did not modify X as it looks preprocessed already.
AnnData object with n_obs × n_vars = 2636 × 2845
    obs: 'leiden', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_rpl', 'pct_counts_rpl', 'total_counts_rps', 'pct_counts_rps', 'initial_size_spliced', 'initial_size_unspliced', 'initial_size', 'n_counts'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'n_cells', 'mt', 'rpl', 'rps', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
    uns: 'hvg', 'leiden', 'leiden_colors', 'neighbors', 'pca', 'rank_genes_groups', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced'
    obsp: 'connectivities', 'distances'

For the adata.X issue, our data was UMAPed by Scanpy and saved as h5ad by adata.write('C:/Users/Park_Lab/Documents/adata.h5ad', compression='gzip'). Because we want to show the velocity stream map on the same UMAP embedding of Scanpy, we input this h5ad file into scVelo and run the scVelo pipeline. So the adata.X was normalized by Scanpy. As you recommended to use the raw counts, we save the Scanpy data by adata.raw.to_adata().write('C:/Users/Park_Lab/Documents/adata_raw.h5ad', compression='gzip'). But the adata.raw doesn't store the 'unspliced' and 'spliced' layers. adata_raw.h5ad cannot be proceeded to the scVelo pipeline.

For using the same embedding between Scanpy and scVelo, we need the adata.ubsm['X_umap'] and adata.obs['leiden'], which should be generated by Scanpy after lognormalization, HVGs and scaling. In this case, how shall we use the raw adata.X because it has been normalized by Scanpy? I tried to load the original loom file and manually copy 'X_umap' to this loom file by adata.ubsm['X_umap']=adata_scanpy_proceeded.ubs['X_umap'], but it failed, generating errors 'Value passed for key 'X_umap' is of incorrect shape. Values of obsm must match dimensions (0,) of parent. Value had shape (10681, 2) while it should have had (10682,)'. Do you have any suggestions about how to do the best scv.pp.filter_and_normalize() and scv.pp.moments() before running the scv.tl.velocity()? Or, in which step shall we save the data in Scanpy pipeline for scVelo processing?

The Scanpy coding is as below:

sc.pl.highest_expr_genes(adata, n_top=20)
sc.pp.filter_cells(adata, min_genes=100)
sc.pp.filter_genes(adata, min_cells=25)
adata.var['mt'] = adata.var_names.str.startswith('mt-')
adata.var['rpl'] = adata.var_names.str.startswith('Rpl')
adata.var['rps'] = adata.var_names.str.startswith('Rps')
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt','rpl','rps'], percent_top=None, log1p=False, inplace=True)
sc.pl.violin(adata, keys=['n_genes_by_counts', 'total_counts', 'pct_counts_mt','pct_counts_rpl','pct_counts_rps'], jitter=0.4, multi_panel=True)
sc.pl.scatter(adata, x='total_counts', y='pct_counts_mt')
sc.pl.scatter(adata, x='total_counts', y='pct_counts_rpl')
sc.pl.scatter(adata, x='total_counts', y='pct_counts_rps')
sc.pl.scatter(adata, x='total_counts', y='n_genes_by_counts')
adata = adata[adata.obs.n_genes_by_counts < 6000, :]
adata = adata[adata.obs.pct_counts_mt < 50, :]
adata = adata[adata.obs.pct_counts_rpl < 50, :]
adata = adata[adata.obs.pct_counts_rps < 50, :]
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=5000)
sc.pl.highly_variable_genes(adata)
print(sum(adata.var.highly_variable))
adata.raw=adata
adata = adata[:, adata.var.highly_variable]
sc.pp.regress_out(adata, keys=['pct_counts_mt','pct_counts_rpl','pct_counts_rps'], n_jobs=16)
sc.pp.scale(adata, max_value=10)
sc.tl.pca(adata, svd_solver='arpack')
sc.pp.neighbors(adata, n_pcs=50, knn=True)
sc.tl.leiden(adata, resolution=0.3)
sc.tl.umap(adata)
sc.pl.umap(adata, color=['leiden'], legend_loc='on data', frameon=False, title='', use_raw=False)
sc.tl.rank_genes_groups(adata, groupby='leiden', method='wilcoxon', use_raw=False)
result = adata.uns['rank_genes_groups']
groups = result['names'].dtype.names
pd.DataFrame(
    {group + '_' + key[:3]: result[key][group]
for group in groups for key in ['names', 'scores', 'logfoldchanges', pvals_adj']}).head(5000).to_csv("C:/Users/Park_Lab/Documents/adata.csv")
new_cluster_names = ['…']
adata.rename_categories('leiden', new_cluster_names)
adata.write('C:/Users/Park_Lab/Documents/adata.h5ad', compression='gzip')

Thanks a lot! Best, YJ

WeilerP commented 2 years ago

@hyjforesight, would you please run the standard scvelo preprocessing pipeline, i.e., with raw counts in .X, layers['unspliced'] and layers['spliced'], to double check that the problem is occurring in this case, as well?

About the preprocessing: The problem with your approach is that the data used to calculate the neighbor graph and the data used to infer RNA velocity are not the same. This could lead to all kinds of unexpected results, and I am not sure how you'd interpret a velocity stream, for example, plotted on a 2D UMAP embedding generated from different data.

hyjforesight commented 2 years ago

Hello @WeilerP , Thanks for the response. We used Scanpy, scVelo, CellRank developed by your team. We appreciate it and we'll make the acknowledgment and cite your papers once we publish our data. As you recommended, I run my raw data (Scanpy unproceeded) by the standard scVelo pipeline. The 2000 n_top_genes can be rightly extracted. It looks like the BUG is caused by Scanpy proceeding the data. How do we fix it? Moreover, we met the KeyError: 'X_umap'. Because we use the raw data, there's no information of 'X_umap', how could we draw the velocity_embedding_stream?

adata = sc.read_loom(filename='C:/Users/Park_Lab/Documents/sample.loom')
adata.var_names_make_unique()
adata
AnnData object with n_obs × n_vars = 13499 × 32285
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'matrix', 'ambiguous', 'spliced', 'unspliced'
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
Filtered out 21387 genes that are detected 20 counts (shared).
Normalized count data: X, spliced, unspliced.
Extracted 2000 highly variable genes.
Logarithmized X.
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
computing PCA
    on highly variable genes
    with n_comps=30
    finished (0:00:00)
computing neighbors
    finished (0:00:01) --> added 
    'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
    finished (0:00:00) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
scv.tl.recover_dynamics(adata, n_jobs=16)
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata, n_jobs=16)
adata
recovering dynamics (using 16/16 cores)
100%
1219/1219 [04:55<00:00, 2.18gene/s]
finished (0:05:02) --> added 
    'fit_pars', fitted parameters for splicing dynamics (adata.var)
computing velocities
    finished (0:00:15) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph (using 16/16 cores)
100%
13499/13499 [00:09<00:00, 429.80cells/s]
 finished (0:00:12) --> added 
    'velocity_graph', sparse matrix with cosine correlations (adata.uns)
AnnData object with n_obs × n_vars = 13499 × 2000
    obs: 'initial_size_spliced', 'initial_size_unspliced', 'initial_size', 'n_counts', 'velocity_self_transition'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'highly_variable', 'fit_r2', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_', 'fit_scaling', 'fit_std_u', 'fit_std_s', 'fit_likelihood', 'fit_u0', 'fit_s0', 'fit_pval_steady', 'fit_steady_u', 'fit_steady_s', 'fit_variance', 'fit_alignment_scaling', 'velocity_genes'
    uns: 'pca', 'neighbors', 'recover_dynamics', 'velocity_params', 'velocity_graph', 'velocity_graph_neg'
    obsm: 'X_pca'
    varm: 'PCs', 'loss'
    layers: 'matrix', 'ambiguous', 'spliced', 'unspliced', 'Ms', 'Mu', 'fit_t', 'fit_tau', 'fit_tau_', 'velocity', 'velocity_u'
    obsp: 'distances', 'connectivities'
scv.pl.velocity_embedding_stream(adata, basis='umap')
KeyError: 'X_umap'

About the preprocessing, thanks for the information. We understand that scv.pp.moments(adata, n_pcs=30, n_neighbors=30) will redo the PCA and neighboring, generating results different from Scanpy's sc.tl.pca(adata, svd_solver='arpack') and sc.pp.neighbors(adata, n_pcs=50, knn=True). We're wondering whether it is possible to draw the velocity stream right on the same UMAP of Scanpy, like below. These 2 plots are generated by directly inputing Scanpy proceeded data into scVelo, which is not right because .X was normalized by Scanpy instead of scVelo. image We did below coding to copy our existing UMAP (generated by Scanpy) to the scVelo proceeding data, trying to draw the velocity stream right on the same UMAP of Scanpy, but it generates errors 'ValueError: Lengths must match to compare` .

ACT_sub2 = sc.read('C:/Users/Park_Lab/Documents/ACT_sub2.h5ad')    # Scanpy proceeded data
ACT_sub2
AnnData object with n_obs × n_vars = 2636 × 5000
    obs: 'leiden', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_rpl', 'pct_counts_rpl', 'total_counts_rps', 'pct_counts_rps'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'n_cells', 'mt', 'rpl', 'rps', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
    uns: 'hvg', 'leiden', 'leiden_colors', 'neighbors', 'pca', 'rank_genes_groups', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced'
    obsp: 'connectivities', 'distances'
raw = sc.read_loom(filename='C:/Users/Park_Lab/Documents/sample.loom')    # raw data
raw.var_names_make_unique()
raw
AnnData object with n_obs × n_vars = 13499 × 32285
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'matrix', 'ambiguous', 'spliced', 'unspliced'
raw.obs['leiden']=ACT_sub2.obs['leiden']
adata=raw[raw.obs['leiden'].isin(['0', '1', '2', '3', '4'])]
adata.obsm['X_umap'] = scv.load('C:/Users/Park_Lab/Documents/ACT_sub2/uns/umap.csv')
ValueError: Lengths must match to compare` 

Thanks! Best, YJ

WeilerP commented 2 years ago

As you recommended, I run my raw data (Scanpy unproceeded) by the standard scVelo pipeline. The 2000 n_top_genes can be rightly extracted. It looks like the BUG is caused by Scanpy proceeding the data. How do we fix it?

Sorry, ATM, I'm not sure what's causing this problem. I'll try to replicate it using the Pancreas dataset.

Moreover, we met the KeyError: 'X_umap'. Because we use the raw data, there's no information of 'X_umap', how could we draw the velocity_embedding_stream?

You first need to calculate the UMAP embedding as you did in your Scanpy workflow.

About the preprocessing, thanks for the information. We understand that scv.pp.moments(adata, n_pcs=30, n_neighbors=30) will redo the PCA and neighboring, generating results different from Scanpy's sc.tl.pca(adata, svd_solver='arpack') and sc.pp.neighbors(adata, n_pcs=50, knn=True).

The PCA embedding and neighbor graph are not necessarily recalculated. You can always run

sc.tl.pca(adata, svd_solver='arpack')
sc.pp.neighbors(adata, n_pcs=50, knn=True)
scv.pp.moments(adata, n_pcs=None, n_neighbors=None)

to use Scanpy and its results.

We did below coding to copy our existing UMAP (generated by Scanpy) to the scVelo proceeding data, trying to draw the velocity stream right on the same UMAP of Scanpy, but it generates errors 'ValueError: Lengths must match to compare` .

There's a mismatch in dimensions. Based on your code snippet, adata is an AnnData object with n_obs × n_vars = 13499 × 32285 but the UMAP embedding is only calculated for 2636 observations.

hyjforesight commented 2 years ago

Hello @WeilerP, Sorry for my late response and thanks for your detailed explanation!

For using Scanpy and its results for scVelo, based on your explanation, my understanding is that I can proceed my data with Scanpy to generate the pca, neighbors, clusters and umap. Then run a new scVelo pipeline, loading the raw loom file as a new AnnData object and copy these parameters ('X_pca',.uns['neighbors'],.obsp['distances'],.obsp['connectivities'], 'leiden', 'X_umap') to this adata. So that the .X will still be raw and should be normalized by scv.pp.filter_and_normalize(). Also, because the pca and neighbors parameters have been copied to this adata, scv.pp.moments() will only computes the moments, doing no touch to the pca and neighbors. Am I right?

Based on this understanding, I tried again. My aim is to do scVelo on ACT_sub2 by using the same embedding of Scanpy. Because my ACT_sub2 is a subset of the raw data and has been proceed by Scanpy, I need to slice the raw data first to match the dimensions of .obs and .vars for coping 'X_UMAP' and get the raw .X. Here is the coding.

adata = adata[:, adata.var.highly_variable] generates errors.

Could you please also help me to check whether adata.obsm['X_umap'] = scv.load('C:/Users/Park_Lab/Documents/ACT_sub2/uns/umap.csv').values is right for copying 'X_umap' to adata? I learned this coding from https://github.com/theislab/scanpy/issues/1718, https://github.com/theislab/scvelo/discussions/722, https://github.com/theislab/scvelo/discussions/648, https://github.com/theislab/scvelo/issues/12 and https://github.com/theislab/scvelo/issues/168.

Thanks and happy holiday! Best, YJ

ACT_sub2 = sc.read('C:/Users/Park_Lab/Documents/ACT_sub2.h5ad')    # the subset after Scanpy proceeding.
ACT_sub2
AnnData object with n_obs × n_vars = 2636 × 5000
    obs: 'leiden', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_rpl', 'pct_counts_rpl', 'total_counts_rps', 'pct_counts_rps'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'n_cells', 'mt', 'rpl', 'rps', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
    uns: 'hvg', 'leiden', 'leiden_colors', 'neighbors', 'pca', 'rank_genes_groups', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced'
    obsp: 'connectivities', 'distances'

adata = sc.read_loom(filename='C:/Users/Park_Lab/Documents/raw_adata.loom')    # the raw data
adata.var_names_make_unique()
adata
AnnData object with n_obs × n_vars = 13499 × 32285
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'matrix', 'ambiguous', 'spliced', 'unspliced'

adata.obs['leiden']=ACT_sub2.obs['leiden']
adata=adata[adata.obs['leiden'].isin(['0', '1', '2', '3'])]    # slice into the wanted obs
adata
View of AnnData object with n_obs × n_vars = 2636 × 32285
    obs: 'leiden'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'matrix', 'ambiguous', 'spliced', 'unspliced'

adata.var['highly_variable']=ACT_sub2.var['highly_variable']
adata = adata[:, adata.var.highly_variable]
# I used HVG to calculate the UMAP, so I slice HVG to match the dimensions, but get errors
adata
KeyError                                  Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_12812/4098982234.py in <module>
----> 1 adata = adata[:, adata.var.highly_variable]
      2 adata

~\anaconda3\envs\Python3812\lib\site-packages\anndata\_core\anndata.py in __getitem__(self, index)
   1114     def __getitem__(self, index: Index) -> "AnnData":
   1115         """Returns a sliced view of the object."""
-> 1116         oidx, vidx = self._normalize_indices(index)
   1117         return AnnData(self, oidx=oidx, vidx=vidx, asview=True)
   1118 

~\anaconda3\envs\Python3812\lib\site-packages\anndata\_core\anndata.py in _normalize_indices(self, index)
   1095 
   1096     def _normalize_indices(self, index: Optional[Index]) -> Tuple[slice, slice]:
-> 1097         return _normalize_indices(index, self.obs_names, self.var_names)
   1098 
   1099     # TODO: this is not quite complete...

~\anaconda3\envs\Python3812\lib\site-packages\anndata\_core\index.py in _normalize_indices(index, names0, names1)
     34     ax0, ax1 = unpack_index(index)
     35     ax0 = _normalize_index(ax0, names0)
---> 36     ax1 = _normalize_index(ax1, names1)
     37     return ax0, ax1
     38 

~\anaconda3\envs\Python3812\lib\site-packages\anndata\_core\index.py in _normalize_index(indexer, index)
     99             if np.any(positions < 0):
    100                 not_found = indexer[positions < 0]
--> 101                 raise KeyError(
    102                     f"Values {list(not_found)}, from {list(indexer)}, "
    103                     "are not valid obs/ var names or indices."
KeyError: 'Values [nan, nan, nan, nan, nan, nan, True,... nan, nan, True], are not valid obs/ var names or indices.'

adata.obsm['X_umap'] = scv.load('C:/Users/Park_Lab/Documents/ACT_sub2/uns/umap.csv').values    # Is this coding for copying 'X_umap' to adata right?
WeilerP commented 2 years ago

@hyjforesight, I propose/suggest to run (with adapted arguments)

adata = sc.read_loom(filename='C:/Users/Park_Lab/Documents/raw_adata.loom')    # the raw data
adata.var_names_make_unique()

scv.pp.filter_and_normalizer(adata, min_shared_counts=20, n_top_genes=2000)
sc.tl.pca(adata, svd_solver='arpack')
sc.pp.neighbors(adata, n_pcs=50, knn=True)
scv.pp.moments(adata, n_pcs=None, n_neighbors=None)

sc.tl.umap(adata)
sc.tl.leiden(adata)

My aim is to do scVelo on ACT_sub2 by using the same embedding of Scanpy. Because my ACT_sub2 is a subset of the raw data and has been proceed by Scanpy, I need to slice the raw data first to match the dimensions of .obs and .vars for coping 'X_UMAP' and get the raw .X. Here is the coding. Here is the coding. adata = adata[:, adata.var.highly_variable] generates errors.

You cannot just copy and past data from one object to another. You need to ensure that the observations and variables in both objects are the same. In ACT_sub2 has shape n_obs × n_vars = 2636 × 5000 while in adata has n_obs × n_vars = 13499 × 32285. Off the top of my head, I do not know how the values are assigned if dimensions do not match. Could be that first 2636 are the actual values and all other are nan or that observations existing in both are paired. The same goes for assigning the Leiden clusters (you fix it by subsetting to the right labels) and the UMAP embedding.

hyjforesight commented 2 years ago

cell number is matched by adata=adata[adata.obs['leiden'].isin(['0', '1', '2', '3'])]. to match the gene numbers, check https://github.com/theislab/scanpy/issues/2095. Then we can copy UMAP to the raw subseted data and do scVelo on it.

adata.uns['leiden_colors']=ACI_sec.uns['leiden_colors']
adata.obsm['X_umap']=ACI_sec.obsm['X_umap']
adata