kaizhang / SnapATAC2

Single-cell epigenomics analysis tools
https://kzhang.org/SnapATAC2/

lots of integers in the python REPL during running spectral embedding #291

Closed beyondpie closed 3 months ago

beyondpie commented 3 months ago

Hi Kai,

Today I used SnapATAC2 2.6 for spectral embedding. When I run the commands

a = snap.tl.spectral(brain_h3k27ac, features = 'selected', inplace = True)
# or 
snap.tl.spectral(brain_h3k27ac, features = 'selected', inplace = True)

interactively, I noticed that lots of comma-separated integers are printed in the Python REPL while it runs. Do you know what they are? Have you noticed this before? Is it possible to suppress this temporary log?

Thanks! Songpeng

beyondpie commented 3 months ago

This might be because a scanpy AnnData object is used directly for spectral embedding. I will check this later.

beyondpie commented 3 months ago

The full story is this:

  1. We have a scanpy AnnData h5ad file. We read it with SnapATAC2 in backed mode 'r+'.
  2. Feature selection (inplace = False) works at first.
  3. Then we perform spectral embedding (inplace = True). It fails when saving the spectral embedding results because the dimensions do not match, i.e., the number of cells does not match the number of features.

I think the error comes from how the data is saved in h5ad (the order of rows or columns), which may differ between scanpy AnnData and SnapATAC2 AnnData.

Do you have any comments?

Thanks! Songpeng

kaizhang commented 3 months ago

Do you have test data for me?

beyondpie commented 3 months ago

We figured it out later. The original matrix was saved in CSC format, and after converting it to CSR, it works.
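For readers unfamiliar with the two layouts, here is a minimal pure-Python sketch of a CSC-to-CSR conversion on a toy matrix. It is only an illustration: the helper name `csc_to_csr` and the toy data are made up for this example, and it is not SnapATAC2's own code. It may help explain why a cells/features mismatch can show up: the pointer array has one entry per column (plus one) in CSC but one entry per row (plus one) in CSR, so reading one layout as the other swaps the apparent dimensions.

```python
def csc_to_csr(n_rows, n_cols, data, indices, indptr):
    """Convert CSC arrays (values, row indices, column pointers) to CSR arrays.

    Hypothetical helper for illustration only; not part of any library.
    """
    # Count the stored values in each row.
    row_counts = [0] * n_rows
    for r in indices:
        row_counts[r] += 1

    # Prefix sums of the counts give the CSR row pointer array.
    csr_indptr = [0] * (n_rows + 1)
    for r in range(n_rows):
        csr_indptr[r + 1] = csr_indptr[r] + row_counts[r]

    # Scatter each value into its row. Columns are visited in order,
    # so column indices within each row come out sorted.
    csr_data = [0] * len(data)
    csr_indices = [0] * len(data)
    next_slot = csr_indptr[:-1].copy()
    for c in range(n_cols):
        for k in range(indptr[c], indptr[c + 1]):
            r = indices[k]
            dest = next_slot[r]
            csr_data[dest] = data[k]
            csr_indices[dest] = c
            next_slot[r] += 1
    return csr_data, csr_indices, csr_indptr


# Toy 2 x 3 matrix (2 cells, 3 features):
# [[1, 0, 2],
#  [0, 3, 0]]
csc_data = [1, 3, 2]
csc_indices = [0, 1, 0]    # row index of each stored value
csc_indptr = [0, 1, 2, 3]  # n_cols + 1 entries

csr_data, csr_indices, csr_indptr = csc_to_csr(
    2, 3, csc_data, csc_indices, csc_indptr
)
print(csr_data, csr_indices, csr_indptr)
```

In practice there is no need to hand-roll this: if the matrix is a SciPy sparse matrix, calling `.tocsr()` on it performs the same conversion.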

kaizhang commented 3 months ago

I see. If you are using the nightly version, the error message should be much better.