Closed dakinggg closed 4 years ago
I don't think there will be any perceivable difference unless some of the regexes are in the loop.
Quote from the Python 3 docs for your perusal:
Should you use these module-level functions, or should you get the pattern and call its methods yourself? If you’re accessing a regex within a loop, pre-compiling it will save a few function calls. Outside of loops, there’s not much difference thanks to the internal cache.
I think that if there are more than 100 regexs (including any that are in loops like here: https://github.com/nipunsadvilkar/pySBD/blob/a2bb4510e65b1fd6c37e2162005076db0e984fb0/pysbd/abbreviation_replacer.py#L62) the cache will cycle, and regexs will have to be recompiled. Given that the number of ABBREVIATIONS alone is 188, I suspect that the cache is cycling.
Here is another regex in a loop: https://github.com/nipunsadvilkar/pySBD/blob/a2bb4510e65b1fd6c37e2162005076db0e984fb0/pysbd/lists_item_replacer.py#L115
Yes, I agree there will be such a regex within a loop. In that case, would you mind tweaking it with precompiled ones and assess the performance? I would love to assist with that.
Not sure when I will have time to do it, but I can try at some point
Cool, same here! will let you know if I happen to do this performance exercise.
Fixed #71
I am not certain of this, but I suspect there might be room for performance improvement by using
re.compile
to precompile all of the needed regexs. Otherwise they will have to be compiled regularly (once there
cache of 100 has been exceeded)