Table of Contents
This repo contains the implementation of the AnySketch data structure in Java. AnySketch is a generic data structure that can be configured to act as
AnySketch stores key-value pairs in a sparse format. AnySketch config is defined by:
When a key is added to a sketch it is mapped to a vector of indexes and vector of values. For each vector of indexes the vectors of values are aggregated with the aggregation functions defined by the config.
Let's say we want to create a counting Bloom Filter. Here is an example sketch config:
indexes {
name: "Index"
distribution {
uniform {
num_values: 10000000 # 10M
}
}
}
values {
name: "Frequency"
aggregator: SUM
distribution {
oracle {
key: "frequency"
}
}
}
Create the sketch and insert elements to it:
AnySketch anySketch = SketchProtos.toAnySketch(sketchConfig);
anySketch.insert(123456, ImmutableMap.of("frequency", 1L));
anySketch.insert(999999, ImmutableMap.of("frequency", 1L));
In this example, we did not store any values like demographics in the registers, thus the empty HashMap. AnySketch hashed the keys (123456 and 999999), distributed the hash uniformly to 10M values and put the value 1 in those two registers. In the off chance they collided to the same register, because we are using the UNIQUE aggregator, it marked that register with -1.
Here is an example where we are storing the demographics of each item:
indexes {
name: "Index"
distribution {
uniform {
num_values: 10000000 # 10M
}
}
}
values {
name: "SamplingIndicator"
aggregator: UNIQUE
distribution {
uniform {
num_values: 10000000 # 10M
}
}
}
values {
name: "Demographics"
aggregator: UNIQUE
distribution {
oracle {
key: "demographics"
}
}
Create the sketch and insert elements to it:
AnySketch anySketch = SketchProtos.toAnySketch(sketchConfig);
anySketch.insert(123456, ImmutableMap.of("demographics", 1L));
anySketch.insert(999999, ImmutableMap.of("demographics", 1L));