kaizhang / SnapATAC2

Single-cell epigenomics analysis tools
https://kzhang.org/SnapATAC2/
223 stars 24 forks source link

Cannot compute joint embedding on RNA and ATAC object #350

Open thecatsmilk opened 1 week ago

thecatsmilk commented 1 week ago

I am following this tutorial to jointly embed my single-nucelus multiome capture from 10X. I have already filtered both RNA and ATAC objects to the same cells, and made sure that the barcodes are identical, then I process with the embedding:

embedding = snap.tl.multi_spectral(
    adatas = [rna, atac],
    features = 'selected'
)[1]

Once running the above cell I am met with this error:

2024-10-09 15:44:29 - INFO - Compute normalized views...
thread '<unnamed>' panicked at src/embedding.rs:411:13:
called `Result::unwrap()` on an `Err` value: Cannot convert Array(F32) to f64 CsrMatrix

---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[35], line 1
----> 1 embedding = snap.tl.multi_spectral(
      2     adatas = [rna, atac],
      3     features = 'selected'
      4 )[1]

File /home/.conda/envs/scSnapATAC2_v2024.08/lib/python3.12/site-packages/snapatac2/tools/_embedding.py:533, in multi_spectral(adatas, n_comps, features, weights, random_state, weighted_by_sd)
    530 if weights is None:
    531     weights = [1.0 for _ in adatas]
--> 533 evals, evecs = internal.multi_spectral_embedding(adatas, features, weights, n_comps, random_state)
    535 if weighted_by_sd:
    536     idx = [i for i in range(evals.shape[0]) if evals[i] > 0]

PanicException: called `Result::unwrap()` on an `Err` value: Cannot convert Array(F32) to f64 CsrMatrix

It seems like the issue is that atac.X is a PyArray Element. Keep in mind atac is an AnnDataSet object that I processed with SnapATAC2 and then converted to an AnnData object in order to co-embed these ATAC data with the RNA data from the same capture.

I tried many ways to convert the PyArray to a Numpy array in order to have it be compatible, but I was unsuccessful. It seems as if I could do this if I went into Rust to modulate this PyArray, but I have no experience in Rust. Any help with this would be greatly appreciate it. Thank you!

scanpy==1.10.3
snapatac2==2.7.0
polars==1.6.0
numpy==1.26.4
pandas==2.1.1
h5py==3.11.0
anndata==0.10.8
matplotlib==3.9.2
seaborn==0.13.2
scipy==1.14.1
kaizhang commented 6 days ago

You need to convert numpy array to scipy sparse matrix. You can do that by adata.X = csr_matrix(adata.X[:]).

thecatsmilk commented 6 days ago

@kaizhang

Thanks for the reply! I did discover that you can extract the matrix how you show above (admittedly by accident). I just tried this:

# convert to scipy sparse matrix
atac.X = sp.csr_matrix(atac.X[:])

# perform joint embedding
embedding = snap.tl.multi_spectral(
    adatas=[rna, atac],
    features=None
)[1]

Unfortunately I am still met with the same error:

2024-10-10 15:36:24 - INFO - Compute normalized views...
thread '<unnamed>' panicked at src/embedding.rs:411:13:
called `Result::unwrap()` on an `Err` value: Cannot convert Array(F64) to f64 CsrMatrix

---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[32], line 1
----> 1 embedding = snap.tl.multi_spectral(
      2     adatas=[rna, atac],
      3     features=None
      4 )[1]

File /home/.conda/envs/scSnapATAC2_v2024.08/lib/python3.12/site-packages/snapatac2/tools/_embedding.py:533, in multi_spectral(adatas, n_comps, features, weights, random_state, weighted_by_sd)
    530 if weights is None:
    531     weights = [1.0 for _ in adatas]
--> 533 evals, evecs = internal.multi_spectral_embedding(adatas, features, weights, n_comps, random_state)
    535 if weighted_by_sd:
    536     idx = [i for i in range(evals.shape[0]) if evals[i] > 0]

PanicException: called `Result::unwrap()` on an `Err` value: Cannot convert Array(F64) to f64 CsrMatrix

What I did next was basically build a new AnnData object after extracting all the elements, like so:

# rebuild AnnData
obs_df = pd.DataFrame({
    'sample': atac.obs['sample'],
    'leiden_0.05': atac.obs['leiden_0.05'],
    'leiden_0.1': atac.obs['leiden_0.1'],
    'leiden_0.2': atac.obs['leiden_0.2'],
    'leiden_0.35': atac.obs['leiden_0.35'],
    'leiden_0.4': atac.obs['leiden_0.4'],
    'leiden_0.6': atac.obs['leiden_0.6'],
    'leiden_0.8': atac.obs['leiden_0.8'],
    'leiden_1.0': atac.obs['leiden_1.0'],
    'leiden_1.2': atac.obs['leiden_1.2'],
    'leiden_1.4': atac.obs['leiden_1.4'],
    'ATAC': atac.obs['ATAC'],
    'cellType': atac.obs['cellType'],
    'barcodes_clean': atac.obs['barcodes_clean'],
    'barcodes_orig': atac.obs['barcodes_orig'],
    'RNA': atac.obs['RNA']
})
obs_df.index = atac.obs_names

var_df = pd.DataFrame({
    'count': atac.var['count'],
    'selected': atac.var['selected']
})
var_df.index = atac.var_names

# build AnnData
new_atac = ad.AnnData(X=X_f64, obs=obs_df, var=var_df)
new_atac

Where X_f64 was created like this:

from scipy import sparse as sp
X = atac.X[:]
X = sp.csr_matrix(X)
X_f64 = X.astype('float64')

With new_atac I tried to co-embedding again:

embedding = snap.tl.multi_spectral(
    adatas = [rna, new_atac],
    features = rna.var['selected']
)[1]

It worked! I move to re-running UMAP:

new_atac.obsm['X_joint'] = embedding
snap.tl.umap(new_atac, use_rep='X_joint')

Unfortunately I am met with this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[136], line 2
      1 new_atac.obsm['X_joint'] = embedding
----> 2 snap.tl.umap(new_atac, use_rep='X_joint')

File /home/.conda/envs/scSnapATAC2_v2024.08/lib/python3.12/site-packages/snapatac2/tools/_embedding.py:113, in umap(adata, n_comps, use_dims, use_rep, key_added, random_state, inplace, **kwargs)
    108 if use_dims is not None:
    109     data = data[:, :use_dims] if isinstance(use_dims, int) else data[:, use_dims]
    111 umap = UMAP(random_state=random_state,
    112             n_components=n_comps,
--> 113             **kwargs).fit_transform(data)
    114 if inplace:
    115     adata.obsm["X_" + key_added] = umap

File /home/.conda/envs/scSnapATAC2_v2024.08/lib/python3.12/site-packages/umap/umap_.py:2891, in UMAP.fit_transform(self, X, y, force_all_finite)
   2855 def fit_transform(self, X, y=None, force_all_finite=True):
   2856     """Fit X into an embedded space and return that transformed
   2857     output.
   2858 
   (...)
   2889         Local radii of data points in the embedding (log-transformed).
   2890     """
-> 2891     self.fit(X, y, force_all_finite)
   2892     if self.transform_mode == "embedding":
   2893         if self.output_dens:

File /home/.conda/envs/scSnapATAC2_v2024.08/lib/python3.12/site-packages/umap/umap_.py:2358, in UMAP.fit(self, X, y, force_all_finite)
   2356     X = check_array(X, dtype=np.uint8, order="C", force_all_finite=force_all_finite)
   2357 else:
-> 2358     X = check_array(X, dtype=np.float32, accept_sparse="csr", order="C", force_all_finite=force_all_finite)
   2359 self._raw_data = X
   2361 # Handle all the optional arguments, setting default

File /home/.conda/envs/scSnapATAC2_v2024.08/lib/python3.12/site-packages/sklearn/utils/validation.py:1096, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_writeable, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
   1094     n_features = array.shape[1]
   1095     if n_features < ensure_min_features:
-> 1096         raise ValueError(
   1097             "Found array with %d feature(s) (shape=%s) while"
   1098             " a minimum of %d is required%s."
   1099             % (n_features, array.shape, ensure_min_features, context)
   1100         )
   1102 if force_writeable:
   1103     # By default, array.copy() creates a C-ordered copy. We set order=K to
   1104     # preserve the order of the array.
   1105     copy_params = {"order": "K"} if not sp.issparse(array) else {}

ValueError: Found array with 0 feature(s) (shape=(6333, 0)) while a minimum of 1 is required.

Then—under closer inspection—I realized embedding has no features:

embedding
array([], shape=(6333, 0), dtype=float64)

Any ideas on how to proceed from here?

kaizhang commented 1 day ago

I think it is the 'rna' matrix that causes the problem.