iSEE / iSEEindex

iSEE extension for a landing page to a custom collection of data sets
https://iSEE.github.io/iSEEindex/

Future feature idea: support other serialization methods #30

Open tomsing1 opened 1 year ago

tomsing1 commented 1 year ago

Right now, SummarizedExperiments are loaded from RDS files with the readRDS function. https://github.com/iSEE/iSEEindex/blob/6ee6b8b7e98c0aa2704602893c2073c3f9fea455/R/utils-datasets.R#L43

It might be worth supporting other file formats, e.g. .qs files generated with the qs package, which might speed up loading of large objects. Maybe a simple switch based on the file extension (.rds vs .qs) would be sufficient?
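A minimal sketch of such an extension-based switch, assuming the qs package (with its qread() reader) is installed for .qs files. The helper name load_se_by_extension() is hypothetical and not part of iSEEindex:

```r
# Hypothetical helper: pick a deserializer based on the file extension.
load_se_by_extension <- function(path) {
  # tools::file_ext() returns "" when there is no extension
  ext <- tolower(tools::file_ext(path))
  if (ext == "rds") {
    readRDS(path)
  } else if (ext == "qs") {
    # qs is only an optional dependency in this sketch
    if (!requireNamespace("qs", quietly = TRUE)) {
      stop("package 'qs' is needed to read .qs files")
    }
    qs::qread(path)
  } else {
    stop("unsupported file extension: '", ext, "'")
  }
}
```

Files for the .qs branch would be written with qs::qsave(). Dispatching on the extension keeps the configuration unchanged, at the cost of trusting file names.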

tomsing1 commented 1 year ago

P.S.: To spin this thought a little further: when objects get very large and the assay data are stored in HDF5 files, we could support reading them with the HDF5Array::loadHDF5SummarizedExperiment() function. That is another example where the hardcoded readRDS() call might not be sufficient.

federicomarini commented 1 week ago

Hey @tomsing1, this is probably not the only way to address your idea.

But have a look at https://github.com/iSEE/iSEEindex/pull/62, which provides a first implementation that goes beyond a path to a file (which is then read as an SCE, no matter what) and also supports pretty much any R call that returns an S(C)E object.

This also covers your case of loading an SCE object from HDF5 files or similar formats, and it can be used to serve full data packages, where each individual dataset can be explored separately.

A possible rework of the whole thing could add a field to the YAML configuration file: think of a "type" for each resource, with the dispatch of how to "load that sce" based on that value.
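A rough sketch of what such type-based dispatch could look like. Each dataset entry is represented as a named list, as yaml::read_yaml() would produce from the configuration file. The field names "type", "path" and "code", the type values, and the function name load_dataset_entry() are all hypothetical and do not reflect the actual iSEEindex schema or the interface in PR #62:

```r
# Hypothetical dispatcher: the "type" field of a dataset entry decides how
# the object is loaded. Field names and type values are illustrative only.
load_dataset_entry <- function(entry) {
  type <- entry$type
  if (is.null(type)) type <- "rds"  # assumed default: a plain RDS file
  if (identical(type, "rds")) {
    readRDS(entry$path)
  } else if (identical(type, "qs")) {
    qs::qread(entry$path)
  } else if (identical(type, "hdf5")) {
    # 'path' is a directory written by HDF5Array::saveHDF5SummarizedExperiment()
    HDF5Array::loadHDF5SummarizedExperiment(dir = entry$path)
  } else if (identical(type, "r_code")) {
    # Evaluate R code that returns an S(C)E; only safe for trusted configs
    eval(parse(text = entry$code), envir = new.env(parent = globalenv()))
  } else {
    stop("unknown dataset type: '", type, "'")
  }
}
```

Defaulting to "rds" when the field is absent would keep existing configuration files working unchanged.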

Happy to think more out loud with you if needed, feel free to give the fresh devel branch a spin!

Federico