Open okayzed opened 4 years ago
evan suggests we can do this using bloom filters and hierarchical bloom filters. This will work for equality filters but not for regex filters, as far as I can tell.
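To make the bloom filter idea concrete, here is a minimal sketch in Python of a per-block filter used to rule out blocks for an equality filter. Everything named here (`BlockBloom`, `blocks_to_scan`, the bit and hash counts) is hypothetical and chosen for illustration; it is not the project's code. It also shows why regex does not fit: the filter can only answer "might this exact string be present?", so there is nothing to match a pattern against.

```python
import hashlib


class BlockBloom:
    """Tiny Bloom filter summarizing the distinct strings in one block.

    Hypothetical helper for illustration; sizes are arbitrary assumptions.
    """

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, value):
        # Derive num_hashes bit positions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means definitely absent; True means present OR a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def blocks_to_scan(block_filters, wanted):
    """Return the names of blocks that cannot be ruled out for `col == wanted`."""
    return [name for name, bloom in block_filters.items()
            if bloom.might_contain(wanted)]


# Usage: one filter per block, built from the strings stored in that block.
with_value = BlockBloom()
with_value.add("host-a")
with_value.add("host-b")
empty = BlockBloom()  # a block with no strings in this column

candidates = blocks_to_scan({"block1": with_value, "block2": empty}, "host-a")
```

A block that does contain the string is never dropped, since bloom filters have no false negatives. A block that does not contain it is usually dropped, but may occasionally be scanned anyway because of a false positive.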
An initial implementation now exists for simple equality on strings on a per-block basis, but it does not use bloom filters.
If a query filters to a particular string and a block doesn't contain that string, we can skip aggregating that block. This might help certain use cases for redbull.
Basically, we would prioritize unpacking that string column first and then check the filter against the string table.
This may or may not work well.
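The string-table check described above can be sketched roughly as follows. This is a Python illustration under assumptions, not the actual implementation: the block layout (a `string_tables` dict mapping a column name to its list of distinct strings, and a `columns` dict holding per-row indices into that list) and the function name `aggregate_count` are hypothetical.

```python
def aggregate_count(blocks, column, wanted):
    """Count rows where `column == wanted`, skipping blocks that cannot match.

    Assumed (hypothetical) block layout:
      block["string_tables"][column] -> list of distinct strings in the block
      block["columns"][column]       -> per-row indices into that list
    """
    total = 0
    skipped = 0
    for block in blocks:
        # Unpack the filtered column's string table before anything else.
        table = block["string_tables"][column]
        if wanted not in table:
            # The string never occurs in this block, so skip aggregating it.
            skipped += 1
            continue
        wanted_id = table.index(wanted)
        total += sum(1 for v in block["columns"][column] if v == wanted_id)
    return total, skipped


blocks = [
    {"string_tables": {"host": ["a", "b"]}, "columns": {"host": [0, 1, 0]}},
    {"string_tables": {"host": ["c"]}, "columns": {"host": [0, 0]}},
]
result = aggregate_count(blocks, "host", "a")
```

In this sketch the second block is skipped after reading only its string table. The saving depends on how cheap the string table is to unpack compared with the rest of the block, which is one reason this may or may not pay off.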