scoder / acora

Fast multi-keyword search engine for text strings
http://pypi.python.org/pypi/acora
BSD 3-Clause "New" or "Revised" License

Building mildly deep automatons takes a long time #10

Open pombredanne opened 8 years ago

pombredanne commented 8 years ago

With the latest 2.0 and the snippet below, which creates an automaton from 1000 strings of 2000 characters each, build() takes forever to complete. I eventually killed it:

>>> from array import array
>>> from acora import AcoraBuilder
>>> tks = [array('h', range(x, x+1000)).tostring() for x in range(1000)]
>>> builder = AcoraBuilder(*tks)
>>> ac = builder.build()
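
To put numbers on "takes forever", the build could be timed on growing subsets of the same keywords. The following is only a sketch: `make_keywords` and `time_builds` are hypothetical helper names, not part of acora, and `build` is any callable you pass in. It uses `tobytes()` because `tostring()` was removed from `array` in Python 3.9.

```python
import time
from array import array


def make_keywords(count, length=1000):
    """Build `count` byte strings shaped like the ones in the report."""
    # 'h' is a signed short (2 bytes on common platforms), so 1000 items
    # give a 2000-byte keyword. tobytes() stands in for tostring().
    return [array('h', range(x, x + length)).tobytes() for x in range(count)]


def time_builds(build, sizes=(10, 50, 100), length=1000):
    """Return (keyword_count, seconds) pairs for build(keywords)."""
    timings = []
    for count in sizes:
        keywords = make_keywords(count, length)
        start = time.perf_counter()
        build(keywords)  # hypothetical: any callable taking a keyword list
        timings.append((count, time.perf_counter() - start))
    return timings
```

With acora installed, something like `time_builds(lambda kws: AcoraBuilder(*kws).build())` should show how the build time grows with the number of keywords, assuming the builder accepts byte strings as it does in the snippet above.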

pombredanne commented 8 years ago

Note: this is a follow-up to #6.

pombredanne commented 8 years ago

FWIW, building an automaton with @WojciechMula's pure-Python implementation at https://github.com/WojciechMula/pyahocorasick/blob/master/py/pyahocorasick.py (not even the C implementation) is much, much faster.
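
For comparison, here is a minimal sketch of a textbook Aho-Corasick build in plain Python: insert the keywords into a trie, then set failure links breadth-first. It is not taken from pyahocorasick or acora, and the names `build_automaton` and `find_all` are made up for illustration. With dict-based transitions the build work is roughly proportional to the total keyword length, aside from copying output lists.

```python
from collections import deque


def build_automaton(keywords):
    """Return (goto, fail, out) lists indexed by state number."""
    goto, fail, out = [{}], [0], [[]]
    # Phase 1: insert every keyword into a trie rooted at state 0.
    for kw in keywords:
        state = 0
        for sym in kw:
            nxt = goto[state].get(sym)
            if nxt is None:
                nxt = len(goto)
                goto[state][sym] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(kw)
    # Phase 2: breadth-first pass; depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for sym, child in goto[state].items():
            queue.append(child)
            f = fail[state]
            while f and sym not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(sym, 0)
            # Inherit the matches that end at the failure state.
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out


def find_all(automaton, text):
    """Return (start_index, keyword) for every match in text."""
    goto, fail, out = automaton
    state, hits = 0, []
    for pos, sym in enumerate(text):
        while state and sym not in goto[state]:
            state = fail[state]
        state = goto[state].get(sym, 0)
        for kw in out[state]:
            hits.append((pos - len(kw) + 1, kw))
    return hits
```

Both functions accept `str` or `bytes` input, since they only iterate over the symbols and use them as dict keys.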