Open GSGerritsen opened 3 years ago
Implementing a memory-mapped SparseVector
would be straightforward but tedious. You can duplicate code, much in the same way as RawVectorMapper
and IntVectorMapper
duplicate the in-memory RawVector
and IntVector
. Alternatively, you can use generics in BitVector
and SparseVector
to have them use either in-memory or memory-mapped backing structures, but the code will probably get quite ugly.
Appending to a SparseVector
is possible – SparseBuilder
already does that for in-memory structures. First you choose divisor d
, which is hardcoded to be a power of 2. The optimal choice depends on knowing the density of values in advance. Then you append the values in increasing order:
v / d
determines the bucket the value goes to.high
encodes the number of values in each bucket in unary, while integer vector low
stores the remainders.v
is the i
th value, you append it by setting bit high[i + v / d]
and appending v % d
to low
.However, due to the way the structures have been implemented, you would have to use two separate files, with a RawVectorWriter
for high
and IntVectorWriter
for low
, and concatenate the files afterwards. Also, queries require select
support structures for high
, and those structures are immutable. Mutable alternatives do exist, but their time/space trade-offs are worse. Finally, the current memory mapping implementation assumes read-only files, and even my plans for mutable files assume that the size of the file does not change.
The overall design of simple-sds reflects how I have been using SDSL data structures in bioinformatics applications. In-memory data structures are assumed to be gigabytes or tens of gigabytes in size and often shared by tens of threads. The writer structures are intended for writing temporary files, while memory-mapped structures are used for reading them.
Hello,
I'm hoping to do something like this:
Where
path
points to the location of a serializedSparseVector
. Is it possible to useMemoryMap
in some way withSparseVector
such that I can do something like:let sparse_vector = SparseVector::load(memory_mapped_sparse_vector)
?I'm hoping to be able to do things like
sparse_vector.rank(10)
etc without having to load in a potentially hugeSparseVector
in its entirety from disk.In addition, I'd also like to be able to append to the
SparseVector
- is this also possible to do with memory mapping where I can avoid loading the whole thing, writing to it, then serializing it back?Thanks in advance for any guidance!