kaneplusplus / bigmemory


timeline for big.sparse.matrix support? #60

Open YaohuiZeng opened 7 years ago

YaohuiZeng commented 7 years ago

Thanks for this great package. My package depends on bigmemory, and I now wish to add sparse matrix support to it. I noticed that you put big.sparse.matrix on your wish list. I'm just wondering whether you have any plans to implement it, and if so, what timeline you expect?

Of course I could use the sparse matrix classes from the Eigen or Armadillo libraries, but those don't support memory mapping, which is the key feature I need.

privefl commented 7 years ago

I think the two main active developers are overbooked at the moment.

I think the difficulty of this implementation really depends on what you actually need. If you only need a C++ accessor (i, j), it should be relatively easy to do by defining another class. If you are interested, we could try it in a separate package and merge it back later if it works.

YaohuiZeng commented 7 years ago

@privefl I haven't thought about the design yet, but basically I need a memory-mapped version of, for example, Armadillo's SpMat. If that's what you have in mind, I'd be very glad to work with you to get this implemented, though I'm not sure how much work it would require.

cdeterman commented 7 years ago

Hi @YaohuiZeng, as @privefl said, Mike and I are quite busy. I would love to give more attention to this package, but I'm tied up with other work-related matters as well as maintaining my other packages. If you are interested in contributing, we have begun transitioning development to my fork of the package. Although I am not free enough to write much code, I can monitor merge requests and respond to questions.

I'm not sure of your familiarity with C++, so I'm not sure how deep you want to get into this. What I had in mind, ideally, is to have a child class inherit from the parent BigMatrix classes, but I don't know whether it will work that simply. I suspect it won't, and that it will require another distinct class such as SparseBigMatrix, mirroring the structure in the file here. In either case, we will want to look into the boost library for sparse support/functionality; that is ultimately where the heart of this package is rooted. If you can find the support within boost and get an idea of how it works, we will have a very solid starting point.

privefl commented 7 years ago

I was thinking more of a naive implementation:

Maybe too naive.

cdeterman commented 7 years ago

@privefl That may lead to some performance gains with only customized functions, but it won't provide any actual compression (i.e., a smaller memory footprint) or the ability to interface nicely with other libraries like Armadillo (I'm unsure how converting from a normal matrix to a compressed one would work). We can experiment, but I personally still think exploring boost further is likely the way to go. Of course, the other authors are welcome to add their opinions as well: @kaneplusplus @phaverty

YaohuiZeng commented 7 years ago

@privefl, I think your design is more like another implementation of Armadillo's SpMat, i.e., with no memory mapping involved, right? My best guess is that we may still have to go with boost, as @cdeterman said.

privefl commented 7 years ago

Okay, I was pointing to a lighter implementation, one that supports only a restricted set of features. I don't think @YaohuiZeng needs all the features available for a standard big.matrix.

What I had in mind was https://github.com/privefl/spBigMatrix. There you can see the results.

jaredhuling commented 7 years ago

@privefl A light implementation like this could really go a long way! Like @YaohuiZeng, I'd be quite interested in this support for bigmemory.

privefl commented 3 years ago

I've restarted my project of building an on-disk sparse matrix format; it has now become https://cran.r-project.org/web/packages/bigsparser/index.html

This is still a very light implementation but there are already some useful features that I've started using in my work.