xxsds / DYNAMIC

Dynamic succinct/compressed data structures
MIT License
111 stars 21 forks source link

DYNAMIC: a succinct and compressed fully-dynamic data structures library

Contributors

The main contributors to the library are:

The library has also received many contributions from other researchers: we wish to thank Mikhail Karasikov, Erik Garrison, and Chris Barber for adding many useful features. We are also very grateful to Adam Novak, Jan Studený, Uula Ulkuniemi, and Michael R. Crusoe for useful bug reports and suggestions.

Please cite this library as:

@inproceedings{prezza2017framework,
title={A Framework of Dynamic Data Structures for String Processing},
author={Prezza, Nicola},
booktitle={International Symposium on Experimental Algorithms},
year={2017},
organization={Leibniz International Proceedings in Informatics (LIPIcs)}
}

Data structures

NEW: Delete (Remove) operations are now available on most structures! (indel = insertions + deletions)

This library offers space- and time-efficient implementations of some basic succinct/compressed dynamic data structures. DYNAMIC features:

Algorithms

The SPSI structure is the building block on which all other structures are based. This structure is implemented with cache-efficient B-trees.

TODO:

Download

git clone https://github.com/nicolaprezza/dynamic

Compile

Thre library features some example executables. To compile them, firstly create and enter a bin/ directory

mkdir bin; cd bin

Then, launch cmake as (default build type is release):

cmake ..

Finally, build the executables:

make

The above command creates the executables in the bin directory.

Usage

The header include/dynamic/dynamic.hpp contains all type definitions and is all you need to include in your code. The folder algorithms/ contains some algorithms implemented with the library's structures. This is a snapshot of dynamic.hpp:

/*
 * a succinct searchable partial sum with inserts implemented with cache-efficient
 * B trees.
 */
typedef spsi<packed_vector,256,16> packed_spsi;

/*
 * dynamic gap-encoded bitvector
 */
typedef gap_bitvector<packed_spsi> gap_bv;

/*
 * dynamic succinct bitvector (about 1.2n bits)
 */
typedef succinct_bitvector<spsi<packed_vector,8192,16> > suc_bv;

/*
 * succinct/compressed dynamic string implemented with wavelet trees.
 * user can choose (at construction time) between fixed-length / gamma / Huffman encoding of characters.
 */
typedef wt_string<suc_bv> wt_str;

/*
 * run-length encoded (RLE) string. This string uses 1 sparse bitvector
 * for all runs, one dynamic string for run heads, and sigma sparse bitvectors (one per character)
 */
typedef rle_string<gap_bv, wt_str> rle_str;

/*
 * RLE string implemented with a run-length encoded wavelet tree. Each
 * WT node is run-length encoded. 
 */
typedef wt_string<rle_str> wtrle_str;

/*
 * succinct/compressed BWT (see description of wt_str)
 */
typedef bwt<wt_str,rle_str> wt_bwt;

/*
 * run-length encoded BWT
 */
typedef bwt<rle_str,rle_str> rle_bwt;

/*
 * dynamic sparse vector: <= m*k + O(m log n/m) bits of space, where k is the maximum
 * number of bits of any integer > 0 and n is the total number of integers.
 */
typedef sparse_vector<packed_spsi,gap_bv> sparse_vec;

/*
 * dynamic succinct/entropy compressed FM index. BWT positions are
 * marked with a succinct bitvector
 *
 * roughly n*H0 + n + (n/k)*log n bits of space, where k is the SA sample  rate
 *
 */
typedef fm_index<wt_bwt, suc_bv, packed_spsi> wt_fmi;

/*
 * dynamic run-length encoded FM index. BWT positions are
 * marked with a gap-encoded bitvector.
 *
 * roughly 2.4*R*log(n/R) + 2.4 log log R + 1.2*R*log(sigma) + (n/k)*log n bits of  space, where
 * k is the SA sample rate and R is the number of runs in the BWT
 *
 */
typedef fm_index<rle_bwt, gap_bv, packed_spsi> rle_fmi;