privefl / bigstatsr

R package for statistical tools with big matrices stored on disk.
https://privefl.github.io/bigstatsr/

Issues with locking of the backing file #178

Closed: dramanica closed this issue 4 months ago

dramanica commented 4 months ago

Once an FBM has been instantiated, the locking of the backing file differs between Windows and Linux (and its behavior is not ideal on either). On Linux, I can delete the backing file whilst the object still exists:

mat <- matrix(1:4, 2)
X <- bigstatsr::as_FBM(mat)
backfile <- X$backingfile
file.remove(backfile)

This will obviously give an error if I then try to access X[].

On Windows, the file is locked, but it remains so even after deleting X. I need to restart the session to break the lock.

The desired behavior would be a lock on the file whilst X is in existence, but the lock should be released when the object is deleted.

privefl commented 4 months ago

On Windows, you just need to call the garbage collector with gc() between rm(X) and file.remove(backfile).

A function big_remove() could probably be implemented to perform these 3 actions.
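
For reference, the three actions can be sketched as below. This is only an illustration of the sequence described in this comment, not an existing bigstatsr function; it reuses the example from the original report, and it assumes no other reference to the FBM (or its memory mapping) is still alive in the session.

```r
library(bigstatsr)

# Same toy FBM as in the original report
X <- as_FBM(matrix(1:4, 2))
backfile <- X$backingfile

rm(X)                            # 1. drop the reference to the FBM
invisible(gc())                  # 2. collect it, so the file mapping can be released
removed <- file.remove(backfile) # 3. delete the backing file
removed
```

On Linux the last step succeeds even without the first two, which is the asymmetry this issue is about.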

dramanica commented 4 months ago

Yes, that makes sense. The problem remains on Linux, where I can wreck the FBM by deleting its backing file. Does the Windows lock arise "spontaneously", or are you locking explicitly? (sorry, being lazy, I did not look at the code in detail)

privefl commented 4 months ago

I don't do anything special on my side IIRC.

dramanica commented 4 months ago

Ok, this is trickier than I had hoped. I think the difference lies in how the mio memory mapping is handled by the different operating systems. So, we might just have to live with the current inconsistency in locking (short of implementing locking within bigstatsr, but that seems like overkill, as it should not happen too often that the user deletes the backing file).
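
Short of real locking, a user-side check before touching the data is one cheap safeguard. The sketch below is only an illustration: backingfile_exists() is a hypothetical helper, not part of bigstatsr, and it assumes X$backingfile still returns the stored path once the file is gone. It detects a missing file but does not prevent the deletion.

```r
library(bigstatsr)

# Hypothetical helper (not in bigstatsr): is the backing file still on disk?
backingfile_exists <- function(X) {
  file.exists(X$backingfile)
}

X <- as_FBM(matrix(1:4, 2))

if (backingfile_exists(X)) {
  vals <- X[]  # safe to read: the backing file is present
} else {
  stop("Backing file is missing: ", X$backingfile)
}
```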

privefl commented 4 months ago

So, just leave it be?

dramanica commented 4 months ago

I haven't got a good solution, so I guess yes. Happy to close this.