Closed woodbri closed 8 years ago
I've added some timing stats in src/tester/t2.cpp
and at the moment it looks like the performance bottleneck is in the search algorithm, so this is probably a lower priority.
Closing this with push 35f54f0..3c649e8 to develop. Regexes are now optimized and the Tokenizer runs about 5 times faster.
Currently, Lexicon::regex() returns one HUGE regex string. This might be too large for a large lexicon. There are two potential ways this might be improved:
Tokenizer would need changes for item 2, but my thought is that a smaller regex will be more memory-efficient and will be evaluated faster.