Open YaohuiZeng opened 7 years ago
I think the two main active developers are overbooked at the moment.
I think the difficulty of this implementation depends on what you actually need. If you only need a C++ accessor (i, j), I think it should be relatively easy to do by defining another class. If you are interested, we could try it in another package and then merge it later if it works.
@privefl I haven't thought about the design yet, but basically I need a memory-mapped version of SpMat in Armadillo, for example. If that's what you have in mind, I'd be very glad to work together with you to get this implemented, though I am not sure how much work it requires.
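As a rough illustration of what "memory-mapped" means for something like the value array of such a matrix, here is a minimal sketch that maps a file of doubles read-only with POSIX `mmap`. It is an assumption-laden stand-in, not Armadillo's or bigmemory's API: `MappedDoubles` and `write_doubles` are hypothetical names, the code is POSIX-only, and it uses raw `mmap` rather than Boost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Hypothetical read-only view of a file of doubles, mapped into memory.
class MappedDoubles {
public:
    explicit MappedDoubles(const std::string& path) {
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("cannot open " + path);
        struct stat st;
        if (::fstat(fd, &st) != 0 || st.st_size == 0) {
            ::close(fd);
            throw std::runtime_error("cannot stat, or empty file: " + path);
        }
        bytes_ = static_cast<std::size_t>(st.st_size);
        void* addr = ::mmap(nullptr, bytes_, PROT_READ, MAP_SHARED, fd, 0);
        ::close(fd);  // the mapping stays valid after the descriptor is closed
        if (addr == MAP_FAILED) throw std::runtime_error("mmap failed");
        data_ = static_cast<const double*>(addr);
    }
    ~MappedDoubles() { ::munmap(const_cast<double*>(data_), bytes_); }
    MappedDoubles(const MappedDoubles&) = delete;
    MappedDoubles& operator=(const MappedDoubles&) = delete;

    // Number of doubles in the mapped file.
    std::size_t size() const { return bytes_ / sizeof(double); }
    // Pages are loaded from disk lazily, on first access.
    double operator[](std::size_t k) const { return data_[k]; }

private:
    const double* data_ = nullptr;
    std::size_t bytes_ = 0;
};

// Helper for trying the class out: dump a vector of doubles to a binary file.
inline void write_doubles(const std::string& path, const std::vector<double>& v) {
    std::FILE* f = std::fopen(path.c_str(), "wb");
    if (!f) throw std::runtime_error("cannot create " + path);
    std::size_t written = std::fwrite(v.data(), sizeof(double), v.size(), f);
    std::fclose(f);
    if (written != v.size()) throw std::runtime_error("short write to " + path);
}
```

A sparse matrix would need the index arrays mapped the same way. The point here is only that element reads go through the mapping instead of a heap allocation.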
Hi @YaohuiZeng, as @privefl said, Mike and I are quite busy. I would love to give more attention to this package, but I am also occupied with other work-related matters as well as maintaining my other packages. If you are interested in contributing, we have begun transitioning development to my fork of the package. Although I am not free enough to write much code, I am able to monitor merge requests and respond to questions.
I'm not sure of your familiarity with C++, so I'm not sure how deep you want to get into this. Basically, what I had in mind is to ideally have a child class inherit from the parent BigMatrix classes, but I don't know if this will work that simply. I suspect it won't, and it will require another distinct class such as SparsesBigMatrix mirroring the structure in the file here. In either case, we will want to look into the boost library for sparse support/functionality. That is ultimately where the heart of this package is rooted. If you can find the support within boost and get an idea of how it works, we will have a very solid starting point.
I was thinking more about a naive implementation: i, p, Dim, x slots (just like a dgCMatrix) with i and x being one-column big.matrix objects. Maybe too naive.
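To make the layout concrete, here is a minimal C++ sketch of compressed sparse column (CSC) storage with the same i, p, Dim, x slots as a dgCMatrix, plus a read-only (i, j) accessor. It is only an illustration under assumptions: `CscMatrix` and `example_matrix` are hypothetical names, and plain `std::vector`s stand in for the one-column big.matrix objects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSC container mirroring the slots of a dgCMatrix.
// std::vector stands in for the one-column big.matrix objects.
struct CscMatrix {
    std::size_t nrow, ncol;  // Dim
    std::vector<int> p;      // column pointers, length ncol + 1
    std::vector<int> i;      // 0-based row indices, sorted within each column
    std::vector<double> x;   // non-zero values, same length as i

    // Read-only accessor (row, col); returns 0 for entries that are not stored.
    double operator()(std::size_t row, std::size_t col) const {
        assert(row < nrow && col < ncol);
        auto first = i.begin() + p[col];
        auto last = i.begin() + p[col + 1];
        // Binary search for the row index inside this column's slice of i.
        auto it = std::lower_bound(first, last, static_cast<int>(row));
        if (it == last || *it != static_cast<int>(row)) return 0.0;
        return x[it - i.begin()];
    }
};

// Example data: the 3 x 3 matrix
//   [1 0 2]
//   [0 0 3]
//   [4 0 0]
inline CscMatrix example_matrix() {
    return CscMatrix{3, 3, {0, 2, 2, 4}, {0, 2, 0, 1}, {1.0, 4.0, 2.0, 3.0}};
}
```

With this layout, column 1 of the example is empty because `p[1] == p[2]`, so any lookup in that column returns zero without touching `i` or `x`.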
@privefl That may lead to some performance gains, though only with customized functions, but it won't provide any actual compression (i.e., a smaller memory footprint) or the ability to interface nicely with other libraries like Armadillo (I'm unsure how converting from a normal matrix to a compressed one would work). We can experiment, but I personally still think exploring more use of boost is likely the way to go. Of course, the other authors are welcome to add their opinions as well: @kaneplusplus @phaverty
@privefl, I think your design is more like another implementation of SpMat in Armadillo, i.e., no memory-mapping involved, is there? My best guess is that we may still have to go with boost, just as @cdeterman said.
Okay, I was pointing more to a light implementation that would support only a restricted set of features. I don't think @YaohuiZeng needs all the features available for a standard big.matrix.
What I had in mind was https://github.com/privefl/spBigMatrix. There, you can see the results.
@privefl A light implementation like this could really go a long way! Like @YaohuiZeng, I'd be quite interested in this support for bigmemory.
I've restarted my project of having an on-disk sparse matrix format; it has now become https://cran.r-project.org/web/packages/bigsparser/index.html
This is still a very light implementation but there are already some useful features that I've started using in my work.
Thanks for this great package. I have a package that depends on bigmemory, and now I wish to add support for sparse matrices to it. I noticed that you put big.sparse.matrix on your wish list. Just wondering whether you have any plan to implement it, and any expected timeline for that? Of course I can use sparse matrices from the Eigen or Armadillo libraries. But those don't support memory mapping, which is the key feature I need.