kaizhang / SnapATAC2

Single-cell epigenomics analysis tools
https://kzhang.org/SnapATAC2/

[Multi-modality pipeline] Do the number of cells and the var attributes need to be the same when performing RNA/ATAC embedding? #343

Open yojetsharma opened 22 hours ago

yojetsharma commented 22 hours ago

I preprocessed the snRNA-seq data from my multiome experiment using scanpy. It has 59,000 cells, its var attributes include ‘highly variable’, and its obs attributes are ‘sample’ and ‘leiden’. The ATAC data, processed using snapatac2, has the same obs attributes but different var attributes. When I run assert (rna.obs_names == atac.obs_names).all(), I get an error saying the “lengths should match”.

kaizhang commented 8 hours ago

The number of variables (var) doesn't need to be the same. ATAC and RNA must share exactly the same barcodes, i.e., both modalities must come from the same cells. You can run assert (rna.obs_names == atac.obs_names).all() yourself to make sure.
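
The check above compares the two indexes element by element, so it only passes when both modalities have the same number of barcodes in the same order. Below is a minimal sketch that reports how two barcode sets differ before asserting. It uses toy pandas indexes in place of rna.obs_names and atac.obs_names; the barcodes and the helper name compare_barcodes are hypothetical and not part of SnapATAC2 or scanpy.

```python
import pandas as pd

def compare_barcodes(rna_names: pd.Index, atac_names: pd.Index) -> dict:
    """Summarize how two barcode indexes differ (hypothetical helper)."""
    same_length = len(rna_names) == len(atac_names)
    return {
        "n_rna": len(rna_names),
        "n_atac": len(atac_names),
        "n_shared": len(rna_names.intersection(atac_names)),
        "only_in_rna": rna_names.difference(atac_names).tolist(),
        "only_in_atac": atac_names.difference(rna_names).tolist(),
        # Element-wise == raises ValueError on unequal lengths, so guard it.
        "identical": same_length and bool((rna_names == atac_names).all()),
    }

# Toy barcodes standing in for rna.obs_names and atac.obs_names (assumed data)
rna_names = pd.Index(["AAAC-1", "AAAG-1", "AACT-1"])
atac_names = pd.Index(["AAAG-1", "AAAC-1"])

report = compare_barcodes(rna_names, atac_names)
print(report)
```

If n_shared is much smaller than expected, the barcodes may be formatted differently in the two objects (for example, different suffixes or sample prefixes), which is worth ruling out before subsetting.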

yojetsharma commented 8 hours ago

After running assert (rna.obs_names == atac.obs_names).all() I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[27], line 1
----> 1 assert (rna.obs_names == atac.obs_names).all()

File ~/.conda/envs/scarches/lib/python3.9/site-packages/pandas/core/ops/common.py:72, in _unpack_zerodim_and_defer.<locals>.new_method(self, other)
     68             return NotImplemented
     70 other = item_from_zerodim(other)
---> 72 return method(self, other)

File ~/.conda/envs/scarches/lib/python3.9/site-packages/pandas/core/arraylike.py:42, in OpsMixin.__eq__(self, other)
     40 @unpack_zerodim_and_defer("__eq__")
     41 def __eq__(self, other):
---> 42     return self._cmp_method(other, operator.eq)

File ~/.conda/envs/scarches/lib/python3.9/site-packages/pandas/core/indexes/base.py:6962, in Index._cmp_method(self, other, op)
   6957         return arr
   6959 if isinstance(other, (np.ndarray, Index, ABCSeries, ExtensionArray)) and len(
   6960     self
   6961 ) != len(other):
-> 6962     raise ValueError("Lengths must match to compare")
   6964 if not isinstance(other, ABCMultiIndex):
   6965     other = extract_array(other, extract_numpy=True)

ValueError: Lengths must match to compare

Since mine is a multiome experiment, could this be because the barcodes are not in the same order in the two modalities?
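
The traceback above shows pandas raising the error because the two indexes have different lengths. That check happens before any values are compared, so a difference in order alone would not trigger this particular message. One possible way to reconcile the two modalities is to keep only the shared barcodes and put both in the same order. The sketch below uses toy pandas DataFrames standing in for the two obs tables; the barcodes and column values are made up for illustration.

```python
import pandas as pd

# Toy obs tables standing in for rna.obs and atac.obs (assumed data)
rna_obs = pd.DataFrame(
    {"sample": ["s1", "s1", "s2"]},
    index=["AAAC-1", "AAAG-1", "AACT-1"],
)
atac_obs = pd.DataFrame(
    {"sample": ["s1", "s1"]},
    index=["AAAG-1", "AAAC-1"],
)

# Barcodes present in both modalities, kept in RNA order
common = rna_obs.index[rna_obs.index.isin(atac_obs.index)]

# Selecting by label both subsets and reorders each table
rna_aligned = rna_obs.loc[common]
atac_aligned = atac_obs.loc[common]

# The original check now passes: same length, same order
assert (rna_aligned.index == atac_aligned.index).all()
print(list(common))
```

For in-memory AnnData objects, the equivalent subsetting would presumably be rna[common] and atac[common], optionally followed by .copy(). This is an assumption about the objects in this thread; backed SnapATAC2 objects may need a different subsetting route.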