privefl / bigsparser

Sparse matrix format with data on disk

A few questions about functionality #5

Closed by timothy-barry 3 years ago

timothy-barry commented 3 years ago

Very nice package. A few questions:

  1. Can I interact with the C/C++ code underlying this package directly using Rcpp?
  2. In CSC format, how are the row indices and data stored? Just as really big vectors?
  3. If so, can I freely index into the row index and data vectors?

Thanks; I might be interested in contributing.

privefl commented 3 years ago

Thanks

  1. You can, for the headers in https://github.com/privefl/bigsparser/tree/master/inst/include/bigsparser. If you need something else, I can probably move it from src/ to there.
  2. The standard sparse matrix format has @i, @x and @p, where i and x are just big vectors, yes. In this package's format, p is stored in memory (it is small), and each (i, x) pair is stored using 16 bytes in the backingfile.
  3. not sure what you mean exactly?
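
To make the layout described in point 2 concrete, here is a minimal C++ sketch. It is not the package's actual code: the `Entry` struct, the choice of two 8-byte doubles per pair, and the helpers `byte_offset` and `column` are assumptions made for illustration. The only facts taken from the thread are that p lives in memory and that each (i, x) pair takes 16 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record for one nonzero entry, 16 bytes in total.
// Storing both fields as 8-byte doubles is an assumption of this sketch.
struct Entry {
  double i;  // row index of the nonzero value
  double x;  // the nonzero value itself
};
static_assert(sizeof(Entry) == 16, "each (i, x) pair should occupy 16 bytes");

// Byte offset of entry number j in a file made of consecutive pairs.
inline std::size_t byte_offset(std::size_t j) { return j * sizeof(Entry); }

// Copy out column k: its entries sit at positions p[k], ..., p[k + 1] - 1.
// Here `entries` stands in for the on-disk data; p is the in-memory pointer vector.
inline std::vector<Entry> column(const std::vector<Entry>& entries,
                                 const std::vector<std::size_t>& p,
                                 std::size_t k) {
  assert(k + 1 < p.size());
  assert(p[k] <= p[k + 1] && p[k + 1] <= entries.size());
  const auto first = entries.begin() + static_cast<std::ptrdiff_t>(p[k]);
  const auto last = entries.begin() + static_cast<std::ptrdiff_t>(p[k + 1]);
  return std::vector<Entry>(first, last);
}
```

With a fixed record size, locating the pairs of a column only requires the two offsets `byte_offset(p[k])` and `byte_offset(p[k + 1])`, which is why keeping p in memory is enough.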

timothy-barry commented 3 years ago

  1. Got it. Could I instead store the data x as an unsigned int?
  2. Sorry, to clarify: When using a CSC matrix, if we want to extract the kth column, we index the tuple (i,x) from position p[k] to p[k+1] - 1. My question is this: is it possible to index into (i,x) at arbitrary position j, where j is not an element of p?

privefl commented 3 years ago

  1. We would need to extend the implementation to handle more types.

  2. By "index", do you mean "access"? What do you mean by "j is not an element of p"? That the value at X[i, j] is 0 and therefore x is not stored?

timothy-barry commented 3 years ago

  1. Got it
  2. Yes, I mean access. I'll try to ask differently. We have big vectors x and i. Let's suppose x and i are of length n_total. Can I access x[j] and i[j], where j is an integer in the range [0, n_total)?

I think I might have made this question more complicated than it needs to be.

privefl commented 3 years ago

Can you access the non-zero elements? Of course.

I mean the R accessors are not implemented yet, but it should not be too difficult to access these in C++.
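
As an illustration of accessing an arbitrary stored position j in [0, n_total), the sketch below recovers which column that position belongs to, using only the in-memory pointer vector p. The function `column_of` is hypothetical and not part of the package. It assumes p has one element per column plus one, with p[0] = 0 and the last element equal to n_total.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Column that contains the nonzero stored at position j.
// Assumes p is non-decreasing, p.front() == 0 and p.back() == n_total.
inline std::size_t column_of(const std::vector<std::size_t>& p, std::size_t j) {
  assert(p.size() >= 2 && j < p.back());
  // Find the first element of p strictly greater than j;
  // the column owning position j is the one just before it.
  const auto it = std::upper_bound(p.begin(), p.end(), j);
  return static_cast<std::size_t>(it - p.begin()) - 1;
}
```

Because the search uses `std::upper_bound`, empty columns (where p[k] equals p[k + 1]) are skipped automatically, and the lookup costs a binary search over p rather than a scan of the data on disk.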

timothy-barry commented 3 years ago

Ha, right. And presumably it's just as easy to write nonzero elements. I'll take a closer look at the repo; thanks.