lazappi closed this issue 3 years ago.
@LTLA Is that the solution you had in mind?
Yes, that's right. We'll skip them in SCE2AnnData and, in the specific case of writeH5AD, handle the writing on the R side.
The key question is whether it is better to write a dummy object to file (mandatory if the skipped matrix is the X entry) and fill it in on the R side, or whether we should simply skip it and write it to file directly (possible if the skipped matrix is one of the layers).
I wonder how fast the writes would be if we just wrote an all-zero dummy matrix?
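As a rough illustration of why an all-zero dummy should be cheap, here is a small Python sketch that only counts the values that would be stored, without touching HDF5. It assumes the dummy is kept in a compressed sparse column (CSC) layout with `data`, `indices` and `indptr` arrays, so an all-zero matrix stores nothing but `ncol + 1` column pointers. The helper names `empty_csc` and `stored_elements` are hypothetical, and the sketch says nothing about actual write speed.

```python
def empty_csc(nrow, ncol):
    """Return the arrays for an all-zero nrow x ncol matrix in CSC form.

    Hypothetical helper: no values are stored, so `data` and `indices`
    are empty and every column pointer is 0.
    """
    return {
        "shape": (nrow, ncol),
        "data": [],                  # no non-zero values
        "indices": [],               # no row indices
        "indptr": [0] * (ncol + 1),  # column j spans data[indptr[j]:indptr[j+1]], i.e. nothing
    }


def stored_elements(csc):
    """Count how many numbers would actually be written to disk."""
    return len(csc["data"]) + len(csc["indices"]) + len(csc["indptr"])


# 30,000 genes x 100,000 cells: only the 100,001 column pointers are stored,
# versus 3 billion values for a dense all-zero matrix of the same shape.
dummy = empty_csc(30_000, 100_000)
print(stored_elements(dummy))
```

The cost therefore scales with the number of columns rather than with nrow × ncol, which is what makes a throwaway placeholder attractive when the real values are written afterwards.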
Or there's the third door: we rewrite readH5AD and writeH5AD to be independent of basilisk, avoiding the Python interconversion altogether. That is a bigger investment, though, and presumes some trust in the stability of the format.
Right. I think the play is as follows:
1. In writeH5AD, loop over all assays and replace any DAs with a completely empty sparse matrix.
2. At the end of writeH5AD, once the file has been written, delete the placeholder CSC representation and manually write the original DA via block processing.
3. If is_sparse() is TRUE, we write a CSC representation; otherwise we just do the usual HDF5 Dataset business.

I can do all of this if someone else takes care of the tests...
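The block-processing part of this plan amounts to streaming the original matrix out one group of columns at a time. Below is a minimal Python sketch of that idea, not the zellkonverter implementation: `read_block` is a hypothetical callback standing in for whatever returns one block of columns as a list of rows (on the R side this would be the DelayedArray block-processing machinery), in-memory lists stand in for HDF5 datasets, and `is_sparse` is just a boolean mirroring the is_sparse() check. All function names are hypothetical.

```python
def column_blocks(ncol, block_size):
    """Yield (start, end) column ranges covering 0..ncol in chunks of block_size."""
    for start in range(0, ncol, block_size):
        yield start, min(start + block_size, ncol)


def write_csc_blockwise(read_block, nrow, ncol, block_size):
    """Build CSC arrays one column block at a time.

    `read_block(start, end)` returns columns start..end-1 as nrow lists of
    (end - start) values; only one block is held densely in memory.
    """
    data, indices, indptr = [], [], [0]
    for start, end in column_blocks(ncol, block_size):
        block = read_block(start, end)
        for j in range(end - start):
            for i in range(nrow):
                value = block[i][j]
                if value != 0:
                    data.append(value)
                    indices.append(i)
            indptr.append(len(data))  # marks the end of column start + j
    return {"shape": (nrow, ncol), "data": data, "indices": indices, "indptr": indptr}


def write_dense_blockwise(read_block, nrow, ncol, block_size):
    """Fill a preallocated dense nrow x ncol 'dataset' block by block."""
    out = [[0] * ncol for _ in range(nrow)]  # stands in for an HDF5 dataset
    for start, end in column_blocks(ncol, block_size):
        block = read_block(start, end)
        for i in range(nrow):
            out[i][start:end] = block[i]
    return out


def write_matrix(read_block, nrow, ncol, is_sparse, block_size=1000):
    """Write CSC if the matrix is sparse, otherwise a plain dense dataset."""
    if is_sparse:
        return write_csc_blockwise(read_block, nrow, ncol, block_size)
    return write_dense_blockwise(read_block, nrow, ncol, block_size)


# Usage: a tiny 2 x 3 matrix stands in for the DelayedArray.
full = [[0, 5, 0], [1, 0, 0]]


def reader(start, end):
    return [row[start:end] for row in full]


csc = write_matrix(reader, 2, 3, is_sparse=True, block_size=2)
print(csc["data"], csc["indices"], csc["indptr"])
```

Processing by column blocks keeps peak memory bounded by the block size, and it lines up naturally with CSC, where each finished column simply appends one entry to `indptr`.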
👍🏻 That makes sense, though it would probably take me a while to work out how to do it. I'm planning to spend some time on zellkonverter later in the week, so I can take a look at the tests or whatever else still needs doing then.
Some DelayedArray matrices cannot be automatically converted to a dgCMatrix, and converting them to standard dense matrices fails because of memory limitations. See https://support.bioconductor.org/p/p134226/#p134234 for the original issue.

We should detect these automatically and either skip them (with a warning) or manage the conversion (if possible).
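One possible way to do the detection is to estimate the memory a dense copy would need before attempting any conversion, and skip with a warning when it exceeds a limit. The Python sketch below only illustrates that decision logic: the `Assay` record, the `is_delayed` flag, the 8-bytes-per-value assumption (double precision) and the memory limit are all hypothetical stand-ins; a real check would inspect the R objects themselves.

```python
import warnings
from dataclasses import dataclass


@dataclass
class Assay:
    """Hypothetical description of one assay; not a zellkonverter type."""
    name: str
    nrow: int
    ncol: int
    is_delayed: bool  # True if backed by a DelayedArray rather than held in memory


def dense_bytes(assay, bytes_per_value=8):
    """Memory needed to realize the assay as a dense double-precision matrix."""
    return assay.nrow * assay.ncol * bytes_per_value


def split_assays(assays, memory_limit):
    """Return (convert, skip) lists of assay names, warning about each skipped one."""
    convert, skip = [], []
    for assay in assays:
        if assay.is_delayed and dense_bytes(assay) > memory_limit:
            warnings.warn(
                f"skipping assay '{assay.name}': a dense copy would need "
                f"{dense_bytes(assay) / 1e9:.1f} GB"
            )
            skip.append(assay.name)
        else:
            convert.append(assay.name)
    return convert, skip
```

Only delayed assays are candidates for skipping here, since in-memory matrices have by definition already been realized; the alternative from the issue, managing the conversion, would replace the `skip` branch with a block-wise write.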