konsumlamm / rrb-vector

An implementation of a Relaxed Radix Balanced Vector in Haskell.
BSD 3-Clause "New" or "Revised" License
20 stars 5 forks source link

(Semi-)Sparse Vector? #21

Open DaveBarton opened 2 months ago

DaveBarton commented 2 months ago

Your folks' great work has inspired me to think about a sparse vector/array type, that is usually (much) more space efficient than IntMap. I'm thinking of a Word64 of bits for present/missing, together with a SmallArray of the non-missing elements (for dimension <= 64) or sparse subvectors. Hopefully with bit-counting instructions for countLeadingZeros or countTrailingZeros, indexing would be pretty fast.

My application is sparse linear algebra, where vectors vary gradually in sparseness. For example, one very large case from integer factoring I believe starts with an integer (or integer mod p) matrix with around 100,000 rows, each of which averages 20-30 nonzero entries. As one does (clever) Gaussian elimination, the rows steadily get more dense. Thus I'm thinking that for most of the algorithm, this data structure would be much better than either extreme of IntMap (good for very sparse vectors), or fully dense arrays.

For linear algebra at least, the basic operations are singleton, zipWith/merge (e.g. adding sparse vectors), map, and fold. For high-performance (exact) work, it'd be nice eventually to also have these types over unboxed integers mod p where p < 2^8, 2^16, 2^32, or 2^64 respectively.

What do you think? Thanks in advance for any advice or interest, and sorry if this is somewhat off-topic for rrb-vector.

konsumlamm commented 1 month ago

I think sparse vectors (and especially linear algebra) are out of scope for this library. If I understand you correctly, you're suggesting a completely new type, not related to RRB-Vectors?

If you want to implement a sparse vector yourself, I recommend looking at bitvec for dimensions > 64. For performant integers mod p, take a look at mod. Another common sparse vector representation seems to be to to store the non-zero indices in an array (instead of a bitmask).