tinygraph / tinygraph

Tiny graph abstractions
https://tinygraph.org
MIT License
24 stars 2 forks source link

Implement Rank/Select #14

Open daniel-j-h opened 3 years ago

daniel-j-h commented 3 years ago

We should look into implementing rank/select structures for bitset / elias-fano https://github.com/tinygraph/tinygraph/issues/13

For example have a look at

"Space-efficient, high-performance rank and select structures on uncompressed bit sequences" by Zhou, Andersen, and Kaminsky

which seems to be simple and practical.

cc @ucyo

daniel-j-h commented 3 years ago

I have added the building blocks for succinct rank/select data structures

https://github.com/tinygraph/tinygraph/blob/65770f0f8d9cf7acffcd1cb75f6dcc303bbdbd88/tinygraph-bits.h#L8-L26

uses the BMI2 instruction set on modern processors for uint64 rank/select.

From here we can implement e.g. the poppy paper above, or a different approach.

daniel-j-h commented 3 years ago

Alex Bowe has some amazing material from this domain; we should check out his writing

https://raw.githubusercontent.com/alexbowe/wavelet-paper/thesis/thesis.pdf

rank-select-1

daniel-j-h commented 3 years ago

See also this paper describing https://github.com/ot/succinct

Also this thesis going with it.

daniel-j-h commented 3 years ago

This paper https://arxiv.org/abs/1706.00990 describes the machine word select we are using for succinct data structures

https://github.com/tinygraph/tinygraph/blob/893de952a71ed765978693788f89ed6e66de9c68/tinygraph-bits.c#L27-L36

daniel-j-h commented 3 years ago

The paper http://www.cs.cmu.edu/~dga/papers/zhou-sea2013.pdf - termed "poppy" - from

"Space-efficient, high-performance rank and select structures on uncompressed bit sequences" by Zhou, Andersen, and Kaminsky

sounds very practical keeping in mind the realities of hardware architectures and cache hierarchies; they

and more importantly they also provide a simple basic rank sketch which is simple to implement, has 12.5% space overhead, and allows us then to iterate easily in the future. The first iteration builds up a single index; the second iteration improves on this to get down to 3.2% space overhead for rank.

Their main insights (for rank, since that's what I have read so far) are as follows

This means we create an index and store one 64bit uint for every 64 bytes (= 512 bits) in the original bit vector.

During rank query time we use the pre-computed index, plus the remaining bits in the last basic block.

daniel-j-h commented 3 years ago

Adding here

We already provide the machine word popcount functionality in the bits module

https://github.com/tinygraph/tinygraph/blob/893de952a71ed765978693788f89ed6e66de9c68/tinygraph-bits.c#L16-L25

and memory alignment to a 64 byte cache line we can simply do with posix_memalign(&p, 64, size).

daniel-j-h commented 4 months ago

Since opening this issue there has been research to improve on the cs-poppy solution we wanted to start with here.

An improvement on poppy, called "pasta" (see "pasta-flat" in the paper):

Engineering Compact Data Structures for Rank and Select Queries on Bit Vectors

In practice, the smallest (uncompressed) rank and select data structure cs-poppy has a space overhead of ≈ 3.51 % [Zhou et al., SEA ‘13]. Using the same overhead, we present a data structure that can answer queries up to 8 % (rank) and 16.5 % (select) faster compared with cs-poppy.

screen grabs ![pasta1](https://github.com/tinygraph/tinygraph/assets/527241/f609313f-960f-4c7f-8c58-8ad89c1008fe) ![pasta2](https://github.com/tinygraph/tinygraph/assets/527241/921cbadc-c1e9-4f05-9c3c-6ab6825db505)

An improvement on poppy and pasta altogether:

SPIDER: Improved Succinct Rank and Select Performance

However, rank and select query performance still incurs a tradeoff between query time and space. For example, Vigna [ 27 ] gives a data structure for rank queries using 25% space that is roughly 19% faster than pasta-flat, and a data structure for select queries using 12.2% space, which is roughly 65% faster than pasta-flat.

and a few tricks

screen grabs ![spider1](https://github.com/tinygraph/tinygraph/assets/527241/9d363f3e-4bd8-4c2a-a842-00ae4c3b1e55)
daniel-j-h commented 3 months ago

I have a first version ready in https://github.com/tinygraph/tinygraph/pull/41