theislab / zellkonverter

Conversion between scRNA-seq objects

https://theislab.github.io/zellkonverter/

Skip DelayedArray matrices in SCE2AnnData() #32

Closed lazappi closed 3 years ago

lazappi commented 3 years ago

Some DelayedArray matrices cannot be automatically converted to dgCMatrix, and converting them to standard (dense) matrices fails due to memory limitations. See https://support.bioconductor.org/p/p134226/#p134234 for the original issue.

We should detect these automatically and either skip them (with a warning) or manage the conversion (if possible).

lazappi commented 3 years ago

@LTLA Is that the solution you had in mind?

LTLA commented 3 years ago

Yes, that's right. We'll skip them in SCE2AnnData and, in the specific case of writeH5AD, handle the writing on the R side.

The key question is whether it is better to write a dummy object to file (mandatory if the skipped matrix is the X entry) and fill that in on the R side, or if we should simply skip it and write to file directly (possible if the skipped matrix is one of the layers).

I wonder how quickly the writes can be done if we just threw in an all-zero dummy matrix?
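For a rough sense of why an all-zero dummy should be cheap when stored sparsely, here is a minimal sketch in plain Python. It is an illustration only, not zellkonverter or anndata code: `empty_csc` and `stored_values` are hypothetical helpers, and the dimensions are made up. An empty compressed sparse column (CSC) matrix needs no `data` or `indices` entries, only a column pointer array of length `n_cols + 1`.

```python
def empty_csc(n_rows, n_cols):
    """Return the three CSC arrays for an all-zero matrix (hypothetical helper)."""
    return {
        "shape": (n_rows, n_cols),
        "data": [],                      # no non-zero values
        "indices": [],                   # no row indices
        "indptr": [0] * (n_cols + 1),    # every column starts and ends at offset 0
    }


def stored_values(csc):
    """Count how many numbers the CSC arrays hold in total."""
    return len(csc["data"]) + len(csc["indices"]) + len(csc["indptr"])


# Made-up dimensions: 20,000 rows by 1,000,000 columns.
n_rows, n_cols = 20_000, 1_000_000
dense_values = n_rows * n_cols
sparse_values = stored_values(empty_csc(n_rows, n_cols))
print(dense_values, sparse_values)
```

Under these assumptions the placeholder stores about one number per column, whereas a dense all-zero matrix of the same shape would store one number per entry.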

Or the third door: we rewrite readH5AD and writeH5AD to be independent of basilisk, avoiding the Python interconversion altogether. This is an investment, though, and presumes some trust in the stability of the format.

LTLA commented 3 years ago

Right. I think the play is as follows:

1. In the front part of writeH5AD, loop over all assays and replace any DAs with a completely empty sparse matrix.
2. Pass this to the HDF5 writer, which will create a very low-cost CSC representation in the H5AD file.
3. In the back end of writeH5AD, delete the CSC representation and manually write the original DA via block processing.
   - If is_sparse() is TRUE, we write a CSC representation, otherwise we just do the usual HDF5 Dataset business.
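The block-processing step for the sparse case can be sketched as follows. This is a plain-Python illustration of the idea, not the writeH5AD implementation: a nested list stands in for the DelayedArray, the names `column_blocks` and `csc_from_blocks` are hypothetical, and a real writer would presumably append each block's values to datasets in the HDF5 file rather than to in-memory lists.

```python
from typing import Iterator, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # row-major stand-in for a delayed matrix


def column_blocks(n_cols: int, block_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open column ranges [start, stop) that cover all columns."""
    for start in range(0, n_cols, block_size):
        yield start, min(start + block_size, n_cols)


def csc_from_blocks(matrix: Matrix, n_rows: int, n_cols: int, block_size: int):
    """Build CSC arrays (data, indices, indptr) one column block at a time."""
    data: List[float] = []
    indices: List[int] = []
    indptr: List[int] = [0]
    for start, stop in column_blocks(n_cols, block_size):
        # Only this block of columns is realized in memory at any one time.
        block = [[matrix[i][j] for j in range(start, stop)] for i in range(n_rows)]
        for local_j in range(stop - start):
            for i in range(n_rows):
                value = block[i][local_j]
                if value != 0:
                    data.append(value)   # non-zero value
                    indices.append(i)    # its row index
            indptr.append(len(data))     # offset where the next column starts
    return data, indices, indptr


example = [
    [1, 0, 0, 2],
    [0, 0, 3, 0],
    [0, 4, 0, 5],
]
print(csc_from_blocks(example, n_rows=3, n_cols=4, block_size=2))
# ([1, 4, 3, 2, 5], [0, 2, 1, 0, 2], [0, 1, 2, 3, 5])
```

Because CSC stores columns contiguously, processing column blocks in order means each block's output can simply be appended; the result does not depend on the block size. The dense branch would instead write each block into the matching region of an ordinary dataset.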

I can do all of this if someone else takes care of the tests...

lazappi commented 3 years ago

👍🏻 Would probably take me a while to work out how to do it but that makes sense. I'm planning to spend some time on zellkonverter later in the week so I can take a look at tests or whatever else still needs doing then.