kasperdanielhansen opened this issue 4 years ago
I’ll do it (or get someone on my team to)
You're now assigned :)
See https://github.com/trichelab/h5testR for implementations of the above, using minfiData and TARGET pAML IDATs for 450K and EPIC arrays respectively. The latter scales up to 500 arrays.
Note that the h5testR examples explicitly load in-core and out-of-core RGsets of whichever IDATs are requested, then save the HDF5 version, overwrite its symbol with a version loaded from that save, and use verifyRGsets to test that a chunk of the values in the per-channel matrices is identical to the corresponding in-core representation. Part of the motivation for this is to extend the test to restfulSE representations that live in Amazon AWS-backed HSDS stores.
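The round-trip check described above can be sketched directly with minfi and HDF5Array (verifyRGsets itself lives in h5testR; the IDAT directory, chunk size, and spot-check logic below are placeholder assumptions, not the package's actual code):

```r
## Sketch of the save/reload/verify round trip, assuming the minfi and
## HDF5Array Bioconductor packages are installed and `base` points at IDATs.
library(minfi)
library(HDF5Array)

base <- "idats/"                                  # placeholder IDAT directory
rg_incore <- read.metharray.exp(base = base)      # ordinary in-memory RGChannelSet

## Save an HDF5-backed copy, then overwrite the symbol with the reloaded version
saveHDF5SummarizedExperiment(rg_incore, dir = "rg_h5", replace = TRUE)
rg_h5 <- loadHDF5SummarizedExperiment(dir = "rg_h5")

## Spot-check a chunk of each per-channel matrix against the in-core values
idx <- seq_len(min(1000L, nrow(rg_incore)))
stopifnot(
  identical(as.matrix(getGreen(rg_h5)[idx, ]), getGreen(rg_incore)[idx, ]),
  identical(as.matrix(getRed(rg_h5)[idx, ]),   getRed(rg_incore)[idx, ])
)
```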
A link to the appropriate place in the Google Drive for this would be handy. minfi:::read.metharray2() has a quirk: it will fail if the directory for storing the HDF5 files does not exist. (This is the OPPOSITE of saveHDF5SummarizedExperiment, where the save will fail if the target directory DOES exist and replace=TRUE is not set.)
Hence the wrapper functions read.methdf5() and read.methdf5.sheet(). In practice, these should probably be called write.methdf5() and have a counterpart write.methRestful() (or some such).
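The asymmetry noted above can be guarded against with a couple of lines; the function names come from the thread, but the exact read.metharray2() arguments and the `basenames` variable are assumptions:

```r
## Guard sketch for the directory-existence asymmetry described above.
library(minfi)
library(HDF5Array)

h5dir <- "rg_h5"   # placeholder target directory

## read.metharray2() fails if the target directory does NOT exist yet:
if (!dir.exists(h5dir)) dir.create(h5dir, recursive = TRUE)
rg <- minfi:::read.metharray2(basenames, dir = h5dir)   # arguments assumed

## saveHDF5SummarizedExperiment() fails if the directory DOES exist,
## unless replace = TRUE is set:
saveHDF5SummarizedExperiment(rg, dir = h5dir, replace = TRUE)
```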
That's a very useful comment and points out what I would say is a bug in read.metharray2.
What I would love is a Google Drive folder with 50 IDATs and a set of SummarizedExperiments. I get that the posted code makes this objective easier to accomplish, but it's not completely there.
Ah, this is much easier. I'll grab the first 5, 10, 25, 50 IDATs from TARGET and drop them in the Google Drive along with their in- and out-of-core RGsets. Easy enough.
As a bonus, all four of those can be linked with "holes" against their RNA-seq data to demonstrate the issue with MultiAssayExperiment objects backed by HDF5.
We need to test our HDF5 code. An important part is to assess scalability, which we will do by running the code with different numbers of samples and looking at runtime as a function of sample count.
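A minimal timing sketch for that scalability check, assuming `basenames` holds paths to the available IDAT pairs (the sample sizes match the 5/10/25/50 subsets mentioned above):

```r
## Read increasing numbers of samples and record elapsed runtime for each.
library(minfi)

sizes <- c(5L, 10L, 25L, 50L)
elapsed <- vapply(sizes, function(n) {
  system.time(read.metharray(basenames[seq_len(n)]))["elapsed"]
}, numeric(1))

## Runtime as a function of sample count
data.frame(samples = sizes, elapsed_sec = elapsed)
```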
We need a Google Drive directory (with subdirectories) containing
Possible file name convention: mData5_cdim10x100 (5 samples, chunkdim 10x100)
Useful functions: saveHDF5SummarizedExperiment() (and its counterpart loadHDF5SummarizedExperiment()) from HDF5Array.
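Writing one dataset under the naming convention above might look like the following; `rg5` (a 5-sample RGChannelSet already in memory) is a placeholder:

```r
## Save a 5-sample set with a 10x100 HDF5 chunk geometry, per the
## mData5_cdim10x100 convention, then reload it as an HDF5-backed SE.
library(HDF5Array)

saveHDF5SummarizedExperiment(rg5,
                             dir      = "mData5_cdim10x100",
                             chunkdim = c(10L, 100L),  # chunkdim argument of saveHDF5SummarizedExperiment
                             replace  = TRUE)

se <- loadHDF5SummarizedExperiment(dir = "mData5_cdim10x100")
```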