libCSD is a C++ library providing some different techniques for managing string dictionaries in compressed space. These approaches are inspired on the paper:
The current release is a beta version providing full functionality for string location and ID extraction. Besides, some techniques support queries by prefix and substrings. The set of techniques is described as follows:
The techniques support different parameterizations to achieve varied and competitive space/time tradeoffs.
1) For building the libcds library, type "make" within the libcds folder. 2) Once built libcds, type "make" within the root folder and libCSD is ready.
The library provides a simple command-line script for building dictionaries:
./Build
It is worth noting that "in" dictionary file must be lexicographically sorted and strings must be ended with the '\0' ASCII char.
./Build 1 h 10 geonames dicts/geo.10
Builds a HASH dictionary for "geonames" and stores it as "dicts/geo.10". The dictionary uses a hash table which hash 10% more cells than strings in "geonames" and compresses them using Huffman (h).
./Build 4 r 16 geonames dicts/geo.16
Builds a HTFC (Hu-Tucker Front-Coding) dictionary for "geonames" and stores it as "dicts/geo.16". The dictionary uses buckets of 16 strings and compresses them using Re-Pair.
The library also provides a command-line script for testing purposes:
./Test
This script supports four different
The parameter
The last
./Test r l dicts/geo.10 tests/geo.strings
Uses the testbed "tests/geo.strings" for analyzing how the dictionary stored at "dicts/geo.10" behaves for 'locate'.
./Test g 100000 dicts/geo.10 tests/geo
Uses the dictionary stored at "dicts/geo.10" for generating a basic testbed of 100,000 strings and IDs which are stored as "tests/geo".
./Test p 16 dicts/geo.10 tests/geo
Uses the dictionary stored at "dicts/geo.10" for generating a prefix testbed comprising patterns of lengths longer and shorter than 16 chars. The resulting pattern set are stored at "tests/geo".
If you find bugs or have any issue with library, please ask us. Enjoy the library and if you find it useful for your research, please cite our paper:
@article{MPBCCN16,
author = "M.A. Mart{\'{\i}}nez-Prieto and N. Brisaboa and R. C{\'a}novas
and F. Claude and G. Navarro",
title = "Practical Compressed String Dictionaries",
booktitle = "Information Systems",
year = "2016",
volume = {56},
pages = "73--108"
}