kerighan / eldar

Boolean text search in Python
MIT License
44 stars 8 forks source link

Support for Boyer-Moore matching algorithm (same as what grep uses) #19

Closed distbit0 closed 1 year ago

distbit0 commented 2 years ago

Hi thanks so much for making this. It is super useful. I was wondering if you could consider supporting the Boyer-Moore search algorithm, as it is apparently what makes grep extremely fast. I have noticed grep is much faster than eldar, and found that it is probably primarily due to Boyer-Moore.

Here is a video which explains the basics of Boyer-Moore in case you are not already familiar: https://www.youtube.com/watch?v=4Xyhb72LCX4

Many thx! :)

distbit0 commented 2 years ago

https://github.com/amenezes/pybmoore This python library implementing boyer-moore may be of use... Can not guarantee it is compatible with the type of search queries eldar uses, though.

kerighan commented 1 year ago

The bottleneck when not using an Index is Python really, not so much the string search. Python's "substring in string" is actually really really fast (I tried to beat it using C++ and it's not easy). I could reimplement eldar in C++ and create Python bindings, but it seems a bit overkill for now.