kaneplusplus / bigmemory


Do big.matrices know where they are filebacked? #55

Closed: privefl closed this issue 7 years ago

privefl commented 7 years ago

Is there a way to know where a filebacked big.matrix is stored on disk (the directory)? If not, I think it should be easy to add a slot to the big.matrix object that stores its backingpath, so that we can directly attach a big.matrix or create a sub.big.matrix from it without asking the user to specify the directory (backingpath). Or maybe add it to the description object instead?

Do you want to do it? I think I can do it if you want to. If not, I will have to make an object that extends a big.matrix.

phaverty commented 7 years ago

This package

https://bioconductor.org/packages/release/bioc/html/bigmemoryExtras.html

implements a ReferenceClass that knows the backing path and can re-attach as necessary. For example, if you reload the object from an RData file and then use it, it will attach itself to the on-disk data and then carry on. The package also has a factor type and some optimizations related to the dimnames.

Pete


Peter M. Haverty, Ph.D. Genentech, Inc. phaverty@gene.com


privefl commented 7 years ago

I really like the safety feature.

However, I was thinking more of an object with two classes (a class that extends big.matrix, plus big.matrix itself), so that all functions available for big.matrix objects can be used seamlessly. That would let people choose whether or not to use the extension (alternatively, it could be included directly as part of a big.matrix).

I understand that the big.matrix object is accessed via $bigmat in your BigMatrix. So, if I want to use a sub.big.matrix, I can use sub.big.matrix(X$bigmat, lastCol = 50, backingpath = dirname(X$backingfile))?
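For reference, here is a minimal base R sketch of what dirname() does to a backing file location. The path below is made up for illustration and is not taken from this thread or from bigmemoryExtras.

```r
# Hypothetical location of a backing file (an assumption for illustration).
backingfile_full <- file.path("data", "store", "X.bin")

backingpath <- dirname(backingfile_full)   # directory part: what backingpath expects
backingfile <- basename(backingfile_full)  # file name part

print(backingpath)
print(backingfile)

# Recombining both parts gives back the original location.
stopifnot(identical(file.path(backingpath, backingfile), backingfile_full))
```

So if the object remembers the full backing file location, the directory can be derived from it rather than requested from the user.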

In fact, I just want to be able to use sub.big.matrix without having to specify the backingpath parameter. I am a heavy user of sub.big.matrix for the purpose of parallelism on column blocks.
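To make the column-block use case concrete, here is a rough base R sketch of how the block bounds could be computed. The helper name column_blocks and the block size are my own assumptions, not part of bigmemory.

```r
# Hypothetical helper: split columns 1..ncol into consecutive blocks
# of at most block_size columns. Returns one row per block.
column_blocks <- function(ncol, block_size) {
  stopifnot(ncol >= 1, block_size >= 1)
  first <- seq(1, ncol, by = block_size)
  last  <- pmin(first + block_size - 1, ncol)  # the last block may be shorter
  data.frame(firstCol = first, lastCol = last)
}

blocks <- column_blocks(ncol = 10, block_size = 4)
print(blocks)
```

Each row could then be handed to a worker and passed on as the firstCol and lastCol arguments of sub.big.matrix, which is where the backingpath currently has to be supplied as well.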

phaverty commented 7 years ago

The class you describe is exactly what I wanted too. The auto-attach feature required the magic of ReferenceClasses and activeBindingFunctions, though. I hoped to make big.matrix and BigMatrix interchangeable by giving them the same API. However, it would still be nice to have the two share a super class so S4 dispatch would do the right thing.

I'd be open to making a simpler BigMatrix-like thing and putting the shared code in bigmemory, but I'd have to think a bit about what the simpler object would be.

Pete



privefl commented 7 years ago

I am analysing some of the code to see whether I could change this behaviour without affecting users who pass the extra path parameter.

I've come across these lines of code: https://github.com/kaneplusplus/bigmemory/blob/master/R/bigmemory.R#L1836-L1840. As the new address is created with readOnly, I think the condition is always true. I wanted to be sure before removing these 5 lines of code in the new version I will suggest. Edit: OK, permissions can be changed with chmod, as in this test: https://github.com/kaneplusplus/bigmemory/blob/master/tests/testthat/test_readonly.R#L63-L66. So, a second question: should the test be (is.readonly(ret) && !readOnly)?
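To spell out what the proposed condition would catch, here is a small truth table in base R. It only models the logic with plain logical values and does not call bigmemory. The function name unexpected_readonly is hypothetical, and I am assuming the intent is to flag a matrix that ends up read-only (for example because of file permissions) although the caller did not ask for that.

```r
# is_ro:        what is.readonly() would report for the attached matrix
# requested_ro: the readOnly argument passed by the caller
# Both names are assumptions made for this sketch.
unexpected_readonly <- function(is_ro, requested_ro) is_ro && !requested_ro

# Enumerate all four combinations of the two flags.
cases <- expand.grid(is_ro = c(FALSE, TRUE), requested_ro = c(FALSE, TRUE))
cases$flag <- mapply(unexpected_readonly, cases$is_ro, cases$requested_ro)
print(cases)
```

Under that assumption, only the combination is_ro = TRUE with requested_ro = FALSE is flagged.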

privefl commented 7 years ago

Could you review this PR: https://github.com/kaneplusplus/bigmemory/pull/56?