Closed mikera closed 10 years ago
Over in LensKit, we take a somewhat different design:
The big things that we need are performance-related:
long
as valid keys (including negative values)@elehack thanks for the comments - very interesting. I need to have a think about how much of this we can support in Vectorz. Initial thoughts:
Our vectors have 3 versions with 2 concrete implementations:
SparseVector
— abstract, provides the read-only API. Many components that act on sparse vectors take a SparseVector
. If you have a reference to a SparseVector
, the type system says you can't modify it, but does not prohibit another component from modifying it.MutableSparseVector
— concrete mutable implementation.ImmutableSparseVector
— concrete immutable implementation. If a component has one of these, not only is it prohibited from changing it, but it is guaranteed that no other code can change it.Got it. The immutable / mutable approach is consistent with what we do in Vectorz, we actually have two distinct tests:
MutableSparseVector
would not meet this criteria since the key domain is limited)Also Vectorz is getting some adoption as the main pure-JVM matrix library in Clojure, so immutability is an important priority.
The negative indexes I think can't be supported with the current Vectorz AVector
API - this is for a whole bunch of reasons: performance, the need to guarantee certain type invariants, the fact that Vectorz supports composite / concatenated vector views etc.
This still leaves open the option of a sparse vector implemented as a wrapper over a Vectorz dense vector. That might actually be a good solution: a lot of performance would come from the underlying dense vector, while the wrapper could take care of the mapping and indexing. Could be something like an enhanced variant of the current SparseIndexedVector
:
FWIW, I've now added immutable vectors and matrices to the latest development branch of Vectorz. Will be in the 0.25.0 release
OK we now have a SparseHashedVector
, which takes care of the fully mutable sparse vector case. Should be in 0.27.0
release.
Closing this specific issue as I think the original scope is now covered. Hopefully this solves the mutable sparse case for LensKit as well - if not then should probably be a new issue.
Various applications (machine learning, collaborative filtering, recommender systems) depend on large sparse vectors, which contain relatively few non-zero values relative to their size.
Proposal is to add sparse vectors with the following attributes:
Efficiency is probably best achieved with a shallow tree data structure, though a hashmap based sparse vector could also be an option.