Term accelerated searches using bloomfilter

elliVM commented 5 months ago

Allow searches for string patterns to be accelerated without using a global bloom filter.

elliVM commented 5 months ago

Added a pattern table to bloomdb that can be used to select a specific filter. Each pattern is stored with its own bloom filter byte array. When a search term matches one of the saved patterns (checked with the bloommatch UDF), bloom search is activated using that pattern's filter.

For simplicity, each filter is assigned a single pattern when it is created, and the search term is matched against that pattern. Later this can be changed to allow multiple patterns per filter, or multiple filters per pattern.
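The pattern-to-filter selection described above can be sketched as follows: each stored pattern owns its own bloom filter, tokens are added only to the filters whose pattern they match, and at search time the filter is consulted only if the term matches a stored pattern. `SimpleBloom`, `PatternFilterIndex`, the double-hashing scheme and the filter sizes are hypothetical illustrations, not the bloomdb storage format or the bloommatch UDF.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

/** Minimal bloom filter over a BitSet (illustrative, not the bloomdb byte-array format). */
final class SimpleBloom {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloom(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Double hashing: position_i = h1 + i * h2 (mod numBits).
    private int position(byte[] data, int i) {
        int h1 = Arrays.hashCode(data);
        int h2 = fnv1a(data);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    private static int fnv1a(byte[] data) {
        int h = 0x811C9DC5;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    void put(String token) {
        byte[] data = token.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) bits.set(position(data, i));
    }

    boolean mightContain(String token) {
        byte[] data = token.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(data, i))) return false;
        }
        return true;
    }
}

/** Hypothetical per-logfile index: one bloom filter per stored pattern. */
final class PatternFilterIndex {
    private final List<Pattern> patterns = new ArrayList<>();
    private final List<SimpleBloom> filters = new ArrayList<>();

    void addPattern(String regex, int numBits, int numHashes) {
        patterns.add(Pattern.compile(regex));
        filters.add(new SimpleBloom(numBits, numHashes));
    }

    /** Adds the token to every pattern filter whose regex fully matches it. */
    void indexToken(String token) {
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matcher(token).matches()) filters.get(i).put(token);
        }
    }

    /**
     * Empty if no stored pattern matches the term (bloom cannot be used, so the file must
     * be scanned); otherwise whether the matching pattern's filter might contain the term.
     * Any matching filter can answer, because indexToken added the token to all of them.
     */
    Optional<Boolean> mightContain(String term) {
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matcher(term).matches()) {
                return Optional.of(filters.get(i).mightContain(term));
            }
        }
        return Optional.empty();
    }
}
```

A result of `Optional.of(false)` means the logfile can be skipped; `Optional.empty()` means the term is not covered by any pattern and bloom acceleration does not apply.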

elliVM commented 5 months ago

Changing the design to support multiple patterns per filter.

elliVM commented 5 months ago

Implemented a schema with a pattern table and a junction table between patterns and filters. The condition walker selects only the filters whose patterns match the search term and runs the bloommatch UDF against temp table filters generated from the filter types and the search term.

Add testing
Disable bloom if no filters found

elliVM commented 5 months ago

Changes to be made:

[x] Only one pattern per filter is needed: remove the junction table and move the pattern to filtertype.
[x] Use regex for matching instead of the UDF; start without tokenization.
[x] Update the schema: move the pattern to the filtertype table as a datatype that can be used with regex.
[x] Later, tokenize the search term before matching.
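A minimal sketch of the regex step in the list above: the search term is split into tokens, and each token is matched against the single pattern stored on each filtertype row. `FilterType`, `FilterTypeSelector`, the record fields and the delimiter set are hypothetical; the real tokenizer and schema may differ. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical row of the filtertype table: exactly one regex pattern per filter type. */
record FilterType(long id, long expectedElements, double targetFpp, Pattern pattern) {}

final class FilterTypeSelector {
    // Assumption: whitespace, quotes, brackets and a few separators split tokens.
    private static final Pattern DELIMITERS = Pattern.compile("[\\s,;\"'()\\[\\]{}]+");

    static List<String> tokenize(String searchTerm) {
        List<String> tokens = new ArrayList<>();
        for (String t : DELIMITERS.split(searchTerm)) {
            if (!t.isEmpty()) tokens.add(t); // split() yields a leading "" for a leading delimiter
        }
        return tokens;
    }

    /** Ids of filter types whose pattern fully matches at least one token of the search term. */
    static Set<Long> matchingFilterTypes(String searchTerm, List<FilterType> types) {
        Set<Long> ids = new LinkedHashSet<>();
        for (String token : tokenize(searchTerm)) {
            for (FilterType type : types) {
                if (type.pattern().matcher(token).matches()) ids.add(type.id());
            }
        }
        return ids;
    }
}
```

An empty result corresponds to the case where no filter applies and bloom is disabled for the search.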

elliVM commented 5 months ago

Testing version with pattern matching against tokenized search terms

elliVM commented 4 months ago

New changes to be made

[x] Create a database table for each stored regex pattern with a bloom filter
[x] Join all filter tables whose regex pattern matches the incoming archive search term
[x] Run bloommatch UDF for each logfile and select those that match any of the joined filters
[x] To run bloommatch, create a temp table for each bloom filter table that has a pattern match
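The four steps above can be sketched as SQL generation: one temp table per matching pattern table holds the search-term filter, every matching table is joined to the logfile rows, and a logfile is kept if any joined filter reports a possible match. The table names (`logfile`, `term_*`), the columns (`id`, `partition_id`, `filter`) and the argument order `bloommatch(termFilter, storedFilter)` are all assumptions for illustration; the actual schema and UDF signature are not given here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical SQL generator; table, column and UDF argument conventions are assumed. */
final class BloomQueryBuilder {

    /** Name of the temp table holding the search-term filter for one pattern table. */
    static String tempTableName(String patternTable) {
        return "term_" + patternTable;
    }

    /** DDL for that temp table; the filter bytes would be inserted with a prepared statement. */
    static String tempTableDdl(String patternTable) {
        return "CREATE TEMPORARY TABLE " + tempTableName(patternTable) + " (filter LONGBLOB NOT NULL)";
    }

    /**
     * Joins every pattern table that matched the search term and keeps logfiles for which
     * any joined filter reports a possible match. Table names must come from trusted
     * metadata, never from user input, because they are concatenated into the SQL.
     */
    static String selectCandidates(List<String> matchingPatternTables) {
        if (matchingPatternTables.isEmpty()) {
            throw new IllegalArgumentException("no pattern match: run the search without bloom");
        }
        StringBuilder sql = new StringBuilder("SELECT logfile.id FROM logfile");
        List<String> anyMatch = new ArrayList<>();
        for (String table : matchingPatternTables) {
            sql.append(" LEFT JOIN ").append(table)
               .append(" ON ").append(table).append(".partition_id = logfile.id");
            anyMatch.add("bloommatch((SELECT filter FROM " + tempTableName(table) + "), "
                    + table + ".filter)");
        }
        sql.append(" WHERE ").append(String.join(" OR ", anyMatch));
        return sql.toString();
    }
}
```

With `LEFT JOIN`, a logfile that has no row in a pattern table gets a NULL filter for it; whether such logfiles should be excluded or scanned anyway is a design decision this sketch does not settle.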

elliVM commented 4 months ago

Created a new walker that finds all dynamic bloom filter tables whose pattern matches the tokenized search term. It will be used to select the tables to join with the main query (combined with the Condition Walker).
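A rough sketch of such a walker: it traverses a condition tree, tokenizes each search-term leaf and collects the dynamic tables whose stored pattern matches a token. The `Condition` records, `DynamicTableWalker` and the whitespace tokenization are hypothetical stand-ins for the real query condition format and walker classes. It uses records and pattern-matching `instanceof`, so it needs Java 16 or later.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical condition tree; the real walker operates on the query's own condition format. */
interface Condition {}
record And(List<Condition> children) implements Condition {}
record Or(List<Condition> children) implements Condition {}
record IndexEquals(String index) implements Condition {}
record SearchTerm(String value) implements Condition {}

final class DynamicTableWalker {
    private final Map<String, Pattern> tablePatterns; // dynamic table name -> stored regex

    DynamicTableWalker(Map<String, Pattern> tablePatterns) {
        this.tablePatterns = tablePatterns;
    }

    /** All dynamic tables whose pattern matches at least one search-term token in the tree. */
    Set<String> matchingTables(Condition root) {
        Set<String> tables = new LinkedHashSet<>();
        walk(root, tables);
        return tables;
    }

    private void walk(Condition c, Set<String> out) {
        if (c instanceof And a) {
            a.children().forEach(child -> walk(child, out));
        } else if (c instanceof Or o) {
            o.children().forEach(child -> walk(child, out));
        } else if (c instanceof SearchTerm t) {
            for (String token : t.value().trim().split("\\s+")) {
                tablePatterns.forEach((table, p) -> {
                    if (p.matcher(token).matches()) out.add(table);
                });
            }
        }
        // Other leaves (index, time range, ...) do not select bloom tables.
    }
}
```

This only collects candidate tables. Deciding how an `OR` branch without any bloom-capable term affects pruning (it would make skipping logfiles unsafe) is left out of the sketch.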

elliVM commented 4 months ago

Created classes to hold dynamic tables and temp tables

elliVM commented 4 months ago

Internal PR

elliVM commented 3 months ago

Updates to the filtertype table: the pattern varchar length was increased to 2048, and pattern was added to the unique composite index.

elliVM commented 2 months ago

Fixed issues with filter size selection in the temp tables generated for the bloommatch condition. Limited the tokenizers to use only major tokens when matching against dpf_03.
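Why filter size matters for the temp tables: a filter built from the search term can only be compared bit-by-bit with a stored filter if both use the same number of bits and hash functions. The sketch below sizes a filter with the standard formulas m = ceil(-n ln p / (ln 2)^2) and k = round(m / n * ln 2), builds the query filter as an empty copy with identical parameters, and checks that all query bits are set in the stored filter. That subset check is an assumption about what bloommatch does, and `SizedBloom`, `MajorTokenizer` and the choice of major breakers are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.regex.Pattern;

final class SizedBloom {
    final int numBits;
    final int numHashes;
    private final BitSet bits;

    SizedBloom(int numBits, int numHashes) {
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.bits = new BitSet(numBits);
    }

    /** Standard sizing for n expected elements at false-positive probability p. */
    static SizedBloom forExpected(long n, double p) {
        int m = (int) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
        int k = Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
        return new SizedBloom(m, k);
    }

    /** Empty filter with exactly the same parameters, so bit positions are comparable. */
    SizedBloom emptyCopy() {
        return new SizedBloom(numBits, numHashes);
    }

    void put(String token) {
        byte[] d = token.getBytes(StandardCharsets.UTF_8);
        int h1 = Arrays.hashCode(d);
        int h2 = Integer.rotateLeft(h1, 16) * 0x9E3779B1; // second hash derived from the first
        for (int i = 0; i < numHashes; i++) bits.set(Math.floorMod(h1 + i * h2, numBits));
    }

    /** Assumed bloommatch semantics: every bit of the query filter is set in this filter. */
    boolean containsAllBitsOf(SizedBloom query) {
        if (query.numBits != numBits || query.numHashes != numHashes) {
            throw new IllegalArgumentException("filters of different sizes are not comparable");
        }
        BitSet missing = (BitSet) query.bits.clone();
        missing.andNot(bits);
        return missing.isEmpty();
    }
}

final class MajorTokenizer {
    // Assumption: these are the "major" breakers; '-', '.', '/', ':' and '=' stay inside tokens.
    private static final Pattern MAJOR = Pattern.compile("[\\s,;\"'()\\[\\]{}]+");

    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : MAJOR.split(text)) if (!t.isEmpty()) out.add(t);
        return out;
    }
}
```

Keeping only major tokens means a value such as a UUID is indexed and searched as one token, so the term filter and the stored filter hash the same string.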

Working in QA with filtering functioning correctly (pth-07 5.3.0-22-gbd5da88a).

Test example: index=alert_examples earliest=-999d "c3468f80-4273-4867-9b66-3f470787c365"

Without bloom the query took 16-18 s; with bloom it took 3-6 s.