kaizhang / SnapATAC2

Single-cell epigenomics analysis tools
https://kzhang.org/SnapATAC2/

scanpy AnnData to snapatac2 AnnData: cannot read csr_matrix #276

Open beyondpie opened 3 months ago

beyondpie commented 3 months ago

Hi Kai,

I have an anndata.AnnData, and I can generate a snapatac2.AnnData from it as follows:

import snapatac2

# ann is an anndata.AnnData
sa2ann = snapatac2.AnnData(filename="tmp.h5ad", X=ann.X)

When I call the subset function, I get an error: "RuntimeError: cannot read csr matrix: Minor indices are not monotonically increasing within each lane."

I think this might be a bug in the conversion from anndata.AnnData to snapatac2.AnnData.

Thanks! Songpeng

kaizhang commented 3 months ago

SnapATAC2 requires CSR matrices to be in canonical form (column indices sorted within each row) to get predictable performance. Please apply this function to ann.X before passing it to snapatac2.AnnData: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.sort_indices.html#scipy.sparse.csr_matrix.sort_indices
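A minimal sketch of that fix, assuming only numpy and scipy.sparse: it builds a small CSR matrix whose column indices are out of order within a row (the situation the error message describes), then canonicalizes a copy. `to_canonical_csr` is a hypothetical helper name. Besides the `sort_indices()` call suggested above, it also calls `sum_duplicates()`, because scipy's notion of "canonical format" additionally rules out duplicate entries; whether SnapATAC2 also rejects duplicates is an assumption here.

```python
import numpy as np
import scipy.sparse as sp


def to_canonical_csr(X):
    """Hypothetical helper: return a CSR copy of X with sorted indices and no duplicates."""
    X = sp.csr_matrix(X, copy=True)  # copy so the caller's matrix is left untouched
    X.sum_duplicates()  # merges duplicate entries; scipy sorts the indices as part of this
    X.sort_indices()    # no-op if already sorted; kept to make the intent explicit
    return X


# A 2x3 CSR matrix whose row 0 stores its column indices out of order (2, then 0).
data = np.array([1.0, 2.0, 3.0])
indices = np.array([2, 0, 1])
indptr = np.array([0, 2, 3])
X = sp.csr_matrix((data, indices, indptr), shape=(2, 3))
print(X.has_sorted_indices)

Xc = to_canonical_csr(X)
print(Xc.has_sorted_indices, Xc.has_canonical_format)
```

In the original snippet this would mean passing `to_canonical_csr(ann.X)` instead of `ann.X` as `X=` to `snapatac2.AnnData`. If `ann.X` is already a `csr_matrix`, calling `ann.X.sort_indices()` in place should have the same effect on index order.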

beyondpie commented 3 months ago

@kaizhang Thanks!

  1. Could we add this to the documentation? For example, here: https://kzhang.org/epigenomics-analysis/anndata.html.
  2. Is this related to the Rust implementation? I don't understand why scanpy can ignore this.
  3. Another inconsistency between an AnnData loaded into memory with SnapATAC2 and scanpy's AnnData is the type of the obs attribute: the former is a Polars DataFrame, while the latter is a pandas.DataFrame.
    • This also makes the transition from scanpy's AnnData to SnapATAC2's AnnData difficult.
    • I find Polars DataFrames not that easy to use, and sometimes I cannot insert a column correctly because of data type issues. Do you have any ideas on this?

Thanks! Songpeng