ixc / python-edtf

MIT License
53 stars 19 forks source link

Performance enhancements #73

Open ahankinson opened 3 months ago

ahankinson commented 3 months ago

It's great to see that this project has some life in it now. I have been using it for a while and maintaining my own fork with some performance enhancements.

Primarily, the main enhancements have been to enable packratting (which seems to now be enabled) and pre-declaring and compiling the regexes in the natural language processor.

If the current benchmarks are to be believed, this leads to a significant speedup. All the tests pass, so I guess it's OK?

https://rism-digital.github.io/python-edtf/dev/bench/

I've gone through and tried to integrate the latest changes, but that was a big PR so another set of eyes would be helpful.

ahankinson commented 3 months ago

Not sure why it's failing -- the tests all pass and work on my end.

ahankinson commented 3 months ago

It occurs to me that a big part of the reason for the speed-up is the use of the functools lru_cache, which will cache the result of the same input string and skip the computation of that value, instead returning only the cached value. While this is useful in real-world contexts, it's probably not so useful in benchmark runs!

ahankinson commented 2 months ago

@aweakley does this PR look OK?

ahankinson commented 2 months ago

I've done a bit of cleanup, but the parser_classes file is still quite buggy. Unfortunately I don't quite know how to trigger the bugs, but I'm at least confident that I didn't make it much worse.

aweakley commented 2 months ago

Thank you. I'll check it out.

aweakley commented 2 months ago

@ahankinson The tests are passing for me, are there particular bugs you were worried about in your comment above?

ahankinson commented 2 months ago

Yeah the tests pass but I don’t think they exercise the parser classes that much. I have the ruff checker running on that and it still highlights a bunch of stuff