paucablop / chemotools

Integrate your chemometric tools with the scikit-learn API 🧪 🤖
https://paucablop.github.io/chemotools/
MIT License
45 stars, 6 forks

44 Improve `WhittakerSmooth`, `AirPls`, and `ArPls` performance #120

Open MothNik opened 3 months ago

MothNik commented 3 months ago

This pull request primarily tackles issue #44, but it does not fully close it (see the second point of 🚶 Next Steps). It should be squashed before merging because it contains more than 100 commits.

🏗️ Main Feature Changes

🧑‍💻 Implementations

⏱️ Timings

In summary, the speedup with the minimum set of dependencies is ~5x for all algorithms. When pentapy is used, however, the speedup can reach up to 15x. Since pentapy is used for difference order 2, which is the standard use case, this is a substantial gain. Yet Rust-based implementations seem to be even faster, so we have definitely not reached the limit here.
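To illustrate why a pentadiagonal solver fits difference order 2, here is a minimal, dependency-free sketch of Whittaker smoothing as a banded linear system `(W + lam * D^T D) z = W y`. It is not the implementation from this pull request. The function names `penalty_bands`, `solve_banded_spd`, and `whittaker_smooth` are hypothetical, and a plain banded Cholesky factorisation stands in for pentapy or a LAPACK banded solver. For difference order 2 the penalty `D^T D` has five non-zero diagonals, which is exactly the structure a pentadiagonal solver exploits.

```python
from math import comb, sqrt


def penalty_bands(n, diff_order):
    """Upper bands of D^T D for the finite-difference matrix D of the given
    order; bands[k][i] holds entry (i, i + k)."""
    # Difference stencil, e.g. [1, -2, 1] for order 2.
    stencil = [(-1) ** (diff_order - j) * comb(diff_order, j)
               for j in range(diff_order + 1)]
    bands = [[0.0] * (n - k) for k in range(diff_order + 1)]
    for row in range(n - diff_order):  # one row of D per stencil position
        for a in range(diff_order + 1):
            for b in range(a, diff_order + 1):
                bands[b - a][row + a] += stencil[a] * stencil[b]
    return bands


def solve_banded_spd(bands, rhs):
    """Solve A x = rhs for a symmetric positive-definite banded A given by its
    upper bands, using a banded Cholesky factorisation A = L L^T."""
    n, p = len(rhs), len(bands) - 1
    # low[i][m] stores L[i][i - m] for m = 0..p.
    low = [[0.0] * (p + 1) for _ in range(n)]
    for i in range(n):
        for m in range(min(p, i), -1, -1):  # columns j = i - m, left to right
            j = i - m
            acc = bands[m][j]  # A[i][j] == A[j][i]
            for k in range(max(0, i - p), j):
                acc -= low[i][i - k] * low[j][j - k]
            low[i][m] = sqrt(acc) if m == 0 else acc / low[j][0]
    # Forward substitution: L u = rhs.
    u = [0.0] * n
    for i in range(n):
        acc = rhs[i]
        for k in range(max(0, i - p), i):
            acc -= low[i][i - k] * u[k]
        u[i] = acc / low[i][0]
    # Back substitution: L^T x = u.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = u[i]
        for k in range(i + 1, min(n, i + p + 1)):
            acc -= low[k][k - i] * x[k]
        x[i] = acc / low[i][0]
    return x


def whittaker_smooth(y, lam=100.0, diff_order=2, weights=None):
    """Smooth y by solving (W + lam * D^T D) z = W y.
    Assumes strictly positive weights so that the system stays positive definite."""
    n = len(y)
    w = [1.0] * n if weights is None else list(weights)
    system = [[lam * v for v in band] for band in penalty_bands(n, diff_order)]
    for i in range(n):
        system[0][i] += w[i]  # the weights only touch the main diagonal
    return solve_banded_spd(system, [wi * yi for wi, yi in zip(w, y)])
```

Because only `diff_order + 1` upper bands are stored and touched, the cost grows linearly with the signal length rather than cubically as with a dense solve. A call such as `whittaker_smooth(signal, lam=100.0, diff_order=2)` works on any list of floats that is longer than the difference order.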

🌊 WhittakerSmooth with difference order 1

Speedup of ~5 to 6 times [Figure: lam_1-00e+02_diff_1_pentapy_False]

🌊 WhittakerSmooth with difference order 2

Without pentapy - Speedup of ~5 times [Figure: lam_1-00e+02_diff_2_pentapy_False]

With pentapy - Speedup of ~5 to 15 times [Figure: lam_1-00e+02_diff_2_pentapy_True]

🏴‍☠️ ArPls

Without pentapy - Speedup of ~4 times [Figure: lam_1-00e+10_diff_2_pentapy_False]

With pentapy - Speedup of ~5 to 15 times [Figure: lam_1-00e+10_diff_2_pentapy_True]

🛩️ AirPls with polynomial order 1

Speedup of ~12 to 5 times [Figure: lam_1-00e+03_poly_1_pentapy_False]

🛩️ AirPls with polynomial order 2

Without pentapy - Speedup of ~12 to 5 times [Figure: lam_1-00e+03_poly_2_pentapy_False]

With pentapy - Speedup of ~10 to 15 times [Figure: lam_1-00e+03_poly_2_pentapy_True]

🚶 Next Steps

🎁 Additional features

Given that this was a lot of refactoring, the chance was used to enrich the WhittakerSmooth by

📦📂 Package structure

✅❌ Tests

🪤 Miscellaneous

paucablop commented 3 months ago

@MothNik FANTASTIC - I have been waiting for this day with a lot of enthusiasm 🤓🤓!!

I am starting to review it right now, and it is a long review, but I hope I can have it done in about a month from now! It is a very exciting contribution. During the review process, we could also start considering how to add the different improvements to the documentation pages 😄

The restructuring of the package is a good idea. It goes perfectly in line with #53, which I need to get done during the summer. Having the dev dependencies separated is a great starting point! It is also nice to hear you have been using Ruff for linting; it was on my to-do list to transition from Black. I did not know about pytest-xdist, but I have started testing it and... it is pretty cool, I like it a lot! 😎

I think that now it is my turn, and I have some work to do 🥳🥳

MothNik commented 3 months ago

@paucablop You are highly welcome 😸

Yes, it's a lot of files. I'm sorry it turned out so big 😅 Take all the time you need and just ping me for the documentation pages ✍️

I usually would not do package restructuring in a feature branch, but the branch required some setup for the development environment, especially for the tests ✅❌ I hope this will help with #53 and also #61 and make the installation easier 💾

As I said, take your time 😸

MothNik commented 3 months ago

I want to give special credit and thanks to Guillaume Biessy for the support. Guillaume is the author of Revisiting Whittaker-Henderson Smoothing, which is, as far as I'm aware, the best review of Whittaker-Henderson smoothing out there because it is illustrated very well and focuses on the key points 🙏

MothNik commented 2 months ago

@paucablop I'm done with the renaming of all the variables and functions to make them more readable. Besides, I also added a tiny cheatsheet for testing with pytest as a README.