grimbough / rhdf5

Package providing an interface between HDF5 and R
http://bioconductor.org/packages/rhdf5

Memory issue causing R termination #116

Open gaolong opened 1 year ago

gaolong commented 1 year ago

Hello,

I created an HDF5 file and used a loop to keep writing small matrices (of varying sizes) into it. But my process always gets killed once the file reaches a certain size, around 400 MB. I wonder if I need to call gc() or close some data handle. Thanks!

h5createFile(h5.file)
h5createGroup(h5.file, "C")
h5createGroup(h5.file, "G")

for(i in 1:1000000){
  ...
  h5write(mat.c, h5.file, paste0("C/", wid))  # mat.c with dim around 40 x 100
  ...
}
h5closeAll()

h5ls("Test_CTCF_mat12.h5")

...
28959  /G  chr1_997914_997951   H5I_DATASET  COMPOUND  23
28960  /G  chr1_997953_997970   H5I_DATASET  COMPOUND  23
28961  /G  chr1_998011_998033   H5I_DATASET  COMPOUND  24
28962  /G  chr1_998329_998346   H5I_DATASET  COMPOUND  19
28963  /G  chr1_998371_998390   H5I_DATASET  COMPOUND  16
28964  /G  chr1_998416_998438   H5I_DATASET  COMPOUND  14
28965  /G  chr1_998431_998468   H5I_DATASET  COMPOUND  14
28966  /G  chr1_998511_998528   H5I_DATASET  COMPOUND  15
28967  /G  chr1_998580_998616   H5I_DATASET  COMPOUND  13
28968  /G  chr1_998621_998642   H5I_DATASET  COMPOUND  13
28969  /G  chr1_998632_998668   H5I_DATASET  COMPOUND  13
28970  /G  chr1_998649_998669   H5I_DATASET  COMPOUND  9
28971  /G  chr1_999283_999320   H5I_DATASET  COMPOUND  6
28972  /G  chr1_999524_999541   H5I_DATASET  COMPOUND  6
28973  /G  chr1_999580_999597   H5I_DATASET  COMPOUND  4
28974  /G  chr1_999612_999631   H5I_DATASET  COMPOUND  5
28975  /G  chr1_999855_999891   H5I_DATASET  COMPOUND  10
28976  /G  chr1_999980_1000017  H5I_DATASET  COMPOUND  14

github-actions[bot] commented 1 year ago

Thank you for opening this issue.

Please note that the repository maintainer (@grimbough) is currently on parental leave until October 2022 and any response will take longer than usual.

grimbough commented 1 year ago

Thanks for the question and interest in rhdf5.

Is it the R session that gets shut down? There shouldn't be anything left open if you're using h5write(), which is a fairly high-level function. It takes care of opening and closing the file on each iteration of your loop. It's probably quite slow, but it should be safe.
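If the repeated open/close overhead turns out to be the slow part, one alternative (a sketch, not something from this thread) is to open the file once with rhdf5's low-level H5Fopen() and pass the resulting handle to h5write(), which accepts an H5IdComponent location in place of a file name; just remember to close the handle afterwards:

```r
library(rhdf5)

h5.file <- tempfile()
h5createFile(h5.file)
h5createGroup(h5.file, "C")
mat.c <- matrix(runif(4000), nrow = 40)

## open the file once and reuse the handle, instead of
## h5write() reopening the file on every iteration
fid <- H5Fopen(h5.file)
for (i in 1:1000) {
  h5write(mat.c, fid, paste0("C/", i))
}
H5Fclose(fid)
```

This keeps a single open file identifier for the whole loop, so the only per-iteration cost is creating and writing each dataset.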

For me, I'm able to run the following code based on your example, which produces an HDF5 file ~1.1 GB in size, and I don't see any increase in R's memory usage as it runs:

library(rhdf5)

h5.file <- tempfile()
h5createFile(h5.file)
h5createGroup(h5.file,"C")

mat.c <- matrix(runif(n = 4000), nrow = 40)

for(i in 1:50000){
  h5write(mat.c, h5.file, paste0("C/", i))
}

Does that example work for you? If not, perhaps you can share your complete code and we can see if there are any other issues.
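As an extra diagnostic (a suggestion beyond the original reply), rhdf5 can report any HDF5 identifiers that are still open in the session with h5validObjects(), and h5closeAll() closes them. If that list keeps growing across iterations of your real loop, something is leaking handles:

```r
library(rhdf5)

h5.file <- tempfile()
h5createFile(h5.file)
h5write(matrix(1:4, nrow = 2), h5.file, "m")

## list HDF5 identifiers still open in this session;
## an ever-growing list would point to leaked handles
h5validObjects()

## close anything left open
h5closeAll()
```

With plain h5write() calls, as above, the list should be empty, since the function closes everything it opens.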