mannau / h5

Interface to the HDF5 Library

Once a file has had data read from it, it cannot be deleted in the R session #46

Open ghost opened 7 years ago

ghost commented 7 years ago

I think there is something wrong with the h5close command. If you read some data from an h5 file and then close the file with h5close (which returns TRUE), the file stays locked by the R session and cannot be deleted. Restarting R allows the file to be deleted. I have included a reproducible example.

library(h5)

# create an example hdf5 file
file <- h5file("test.hdf5", 'w')

# create some example data.
testvec <- rnorm(10)
testmat <- matrix(1:9, nrow = 3)
row.names(testmat) <- 1:3
colnames(testmat) <- c("A", "BE", "BU")
letters1 <- paste(LETTERS[runif(45, min = 1, max = length(LETTERS))])
letters2 <- paste(LETTERS[runif(45, min = 1, max = length(LETTERS))])
testarray <- array(paste0(letters1, letters2), c(3, 3, 5))

# write the data to hdf5 file
file["test/testvec"] <- testvec
file["test/testmat"] <- testmat
file["test/testarray"] <- testarray

# close the file reference
closed <- h5close(file)
cat(sprintf('Closed the example file = %i\n', closed))

openedFromDisk <- h5file("test.hdf5")
thedata <- readDataSet(openedFromDisk["test/testarray"])
closed <- h5close(openedFromDisk)
cat(sprintf('Closed the loaded file = %i\n', closed))

#attempt to delete the file
removed <- file.remove("test.hdf5")
cat(sprintf('Deleted the loaded file = %i\n', removed))
mannau commented 7 years ago

Hi, I guess you are using Windows? Could you try the following code instead of the last part of your snippet:

...
openedFromDisk <- h5file("test.hdf5")
dataset <- openedFromDisk["test/testarray"]
thedata <- readDataSet(openedFromDisk)
h5close(openedFromDisk)
closed <- h5close(openedFromDisk)
removed <- file.remove("test.hdf5")

The reason is that openedFromDisk["test/testarray"] implicitly creates a dataset object which is never closed. Unfortunately, h5close(openedFromDisk) does not recursively close all opened objects in a file. Does this solve your problem? Best, m

ghost commented 7 years ago

Yes, I'm using Windows, I should have included that! Windows 7 64-bit, R 3.3.3.

Here is the output from running your snippet.

> openedFromDisk <- h5file("test.hdf5")
> dataset <- openedFromDisk["test/testarray"]
> thedata <- readDataSet(openedFromDisk)
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘readDataSet’ for signature ‘"H5File"’
> h5close(openedFromDisk)
> closed <- h5close(openedFromDisk)
Warning message:
In eval(substitute(expr), envir, enclos) : H5Fflush failed
> removed <- file.remove("test.hdf5")
Warning message:
In file.remove("test.hdf5") :
  cannot remove file 'test.hdf5', reason 'Permission denied'
mannau commented 7 years ago

Sorry, I meant:

...
> dataset <- openedFromDisk["test/testarray"]
> thedata <- readDataSet(dataset)
> h5close(dataset)

Best, m

ghost commented 7 years ago

Running the following successfully deletes the file and does not show any warnings.

library(h5)

# create an example hdf5 file
file <- h5file("test.hdf5", 'w')

# create some example data.
testvec <- rnorm(10)
testmat <- matrix(1:9, nrow = 3)
row.names(testmat) <- 1:3
colnames(testmat) <- c("A", "BE", "BU")
letters1 <- paste(LETTERS[runif(45, min = 1, max = length(LETTERS))])
letters2 <- paste(LETTERS[runif(45, min = 1, max = length(LETTERS))])
testarray <- array(paste0(letters1, letters2), c(3, 3, 5))

# write the data to hdf5 file
file["test/testvec"] <- testvec
file["test/testmat"] <- testmat
file["test/testarray"] <- testarray

# close the file reference
closed <- h5close(file)
cat(sprintf('Closed the example file = %i\n', closed))

openedFromDisk <- h5file("test.hdf5")
dataset <- openedFromDisk["test/testarray"]
thedata <- readDataSet(dataset)
h5close(dataset)
h5close(openedFromDisk)

#attempt to delete the file
removed <- file.remove("test.hdf5")
cat(sprintf('Deleted the loaded file = %i\n', removed))
ghost commented 7 years ago

Out of curiosity I also tried:

thedata <- readDataSet(openedFromDisk["test/testarray"])
h5close(openedFromDisk["test/testarray"])

which still leaves the file locked. Are you able to explain why this doesn't work? I am able to rewrite my code to use the workaround, so it isn't a problem.

mannau commented 7 years ago

The line

thedata <- readDataSet(openedFromDisk["test/testarray"])

opens the dataset without closing it.
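
Going by the explanation above, every openedFromDisk["test/testarray"] call appears to create a fresh dataset handle, so h5close(openedFromDisk["test/testarray"]) would close only the handle it has just created, while the one used for reading stays open. A minimal sketch of a wrapper that keeps a reference to the handle and always closes it is shown below. The helper name read_dataset_closed and the file name are hypothetical and not part of h5; the sketch only uses the calls already shown in this thread, and it may not help in the cases reported below where handles leak despite explicit closing.

```r
library(h5)

# Hypothetical helper (not part of h5): read one dataset and
# close its handle even if the read fails.
read_dataset_closed <- function(h5obj, path) {
  dataset <- h5obj[path]                  # keep a reference to the handle
  on.exit(h5close(dataset), add = TRUE)   # close this exact handle on exit
  readDataSet(dataset)
}

# Create a small example file (hypothetical file name).
example_path <- "handle_example.hdf5"
out <- h5file(example_path, "w")
out["test/testvec"] <- rnorm(10)
h5close(out)

# Read through the helper, then close the file itself.
openedFromDisk <- h5file(example_path)
thedata <- read_dataset_closed(openedFromDisk, "test/testvec")
h5close(openedFromDisk)

# Attempt to delete the file.
removed <- file.remove(example_path)
```

Using on.exit inside the helper means the dataset handle is released on both the normal and the error path, which is the part that is easy to forget when subsetting inline.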

jcpetkovich commented 7 years ago

This actually happens to me on Linux as well. The deleted files still appear in lsof, even though they have otherwise been removed from disk. The disk space is only released once the R process has been killed. Is there any workaround for this?

jcpetkovich commented 7 years ago

One thing I have noticed so far is that although h5close does appear to call close on the file object, it doesn't trigger the deletion of the object. In fact, none of the objects allocated using new by this library are ever deleted. From reading the source for the C++ interface to HDF5 files, I don't think this should prevent the file handles from being released, but that's still what I see happening.

I'm still pretty confused as to what's happening here.

PeterNSteinmetz commented 5 years ago

This is a serious issue when processing large numbers of h5 files in R. Eventually the process will exceed the operating system limit on the number of files which can be open at one time.

jeroen commented 5 years ago

With the new Windows toolchains, the unit tests are now failing. It looks like exactly this problem: the files cannot be deleted, even after they are closed.

Did anyone ever find the culprit?

Running the tests in 'tests/testthat.R' failed.
Last 13 lines of output:
  == testthat results  ===========================================================
  OK: 134 SKIPPED: 0 WARNINGS: 97 FAILED: 68
  1. Failure: Attribute-Errors (@test-Attribute.R#66) 
  2. Failure: Attribute-H5Type-File (@test-Attribute.R#123) 
  3. Error: Attribute-list-attributes (@test-Attribute.R#129) 
  4. Failure: Bug_AttributeGroupSubset (@test-Attribute.R#170) 
  5. Error: DataSet-Vector-NA (@test-DataSet-IO-NA.R#19) 
  6. Error: DataSet-Vector (@test-DataSet-IO.R#15) 
  7. Failure: DataSet-Vector-boundaries (@test-DataSet-IO.R#120) 
  8. Error: datatypes-Matrix (@test-DataSet-IO.R#137) 
  9. Error: datatypes-Array (@test-DataSet-IO.R#203) 
PeterNSteinmetz commented 5 years ago

IIRC, this was likely due to a problem with management of resources on the HDF5 library side. Note that this package, h5, is deprecated in favor of https://github.com/hhoeflin/hdf5r (I missed the deprecation notice as well when first using the h5 package). I moved over to hdf5r and was able to make things work with that.
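
For anyone making the same move, here is a rough sketch of the read-then-delete sequence from this issue written against hdf5r. It assumes hdf5r is installed; the file and dataset names are made up for illustration, and I have not checked that it behaves identically on every platform discussed above.

```r
library(hdf5r)

# Hypothetical file name used only for this sketch.
example_path <- "hdf5r_example.h5"

# Write a small dataset, then close the file and everything opened in it.
out <- H5File$new(example_path, mode = "w")
out[["testvec"]] <- rnorm(10)
out$close_all()

# Reopen read-only, read through an explicit dataset handle, then close.
infile <- H5File$new(example_path, mode = "r")
dataset <- infile[["testvec"]]
thedata <- dataset$read()
dataset$close()
infile$close_all()

# Attempt to delete the file.
removed <- file.remove(example_path)
```

The close_all method is meant to close the file together with any objects still open in it, which is the recursive behaviour that h5close was said not to provide earlier in this thread.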