Open MothNik opened 3 months ago
@MothNik FANTASTIC - I have been waiting for this day with a lot of enthusiasm ๐ค๐ค!!
I am starting to review it right now, and it is a long review, but I hope I can have it done in about a month from now! It is a very exciting contribution. During the review process, we could also start considering how to add the different improvements to the documentation pages :smile:
The restructure of the package is a good idea, it goes perfectly in line with #53, and I need to get done during the summer, Having the dev dependencies separated is a great starting point! Also nice to hear you have been using Ruff
for linting, it was also on my todo list to trancition from black
. I did not know about pytest-xdist
, but I have started testing it and... it is pretty cool, I like it a lot!๐
I think that now it is my turn, and I have some work to do ๐ฅณ๐ฅณ
@paucablop You are highly welcome ๐ธ
Yes, it's a lot of files. I'm sorry it turned so big ๐ Take all the time you need and just ping me for the documentation pages โ๏ธ
I usually would not do package restructuring in a feature branch, but the branch required some setup for the development environment, especially for the tests โ โ I hope this will help for #53 and also #61 and make the installation easier ๐พ
As I said, take your time ๐ธ
I want to give special credits and thanks for the support by Guillaume Biessy, the author of Revisiting Whittaker-Henderson Smoothing which is - as far as I'm aware - the best review of the Whittaker-Henderson Smoothing out there because it is illustrated very well and focuses on the key points ๐
@paucablop
I'm done with the renaming of all the variables and functions to make them more readable.
Besides, I also added a tiny cheatsheet for testing with pytest
as a README.
This pull request primariliy tackles issue #44, but it does not fully close it (see the second point of ๐ถ Next Steps). It should be squashed before merging because it's more than 100 commits.
๐๏ธ Main Feature Changes
๐งโ๐ป Implementations
WhittakerSmooth
,AirPls
, andArPls
and transitioning to the more appropriate LAPACK banded storage format. Why? BecauseD.T @ D
than what can be achieved via a sparse matrix, even though the logic becomes a bit more elaborate because symmetry has to be exploited while the redundant computations that make up most of the computation time have to be avoided.WhittakerSmooth
,AirPls
, andArPls
since they all rely on the same base algorithm and only differ in weighting strategies. Now, they all inherit from the classchemotools.utils._whittaker_base.main.WhittakerLikeSolver
that handles all the underlying math once in a centralized place. The only thing the 3 transformer classes add now is a customized weighting strategy. Each of the classes uses different access points to the solver depending on the checks/preprocessing they need to conduct.pentapy
-support that will check for availability ofpentapy
at runtime and use its high-performance solver for all scenarios where the differences order is 2 (see โฑ๏ธ Timings).โฑ๏ธ Timings
In summary, the speedup with the minimum set of dependencies is ~5x for all algorithms. However, when
pentapy
is used, the speedup can be up to 15x. Since it is used for difference order 2 and this is the standard use case, this is quite some gain. Yet, rust-based implementations seem to be even faster, so we definitely did not reach the limit here.๐
WhittakerSmooth
with difference order 1Speedup of ~5 to 6 times
๐
WhittakerSmooth
with difference order 2Without
pentapy
- Speedup of ~5 timesWith
pentapy
- Speedup of ~5 to 15 times๐ดโโ ๏ธ
ArPls
Without
pentapy
- Speedup of ~4 timesWith
pentapy
- Speedup of ~5 to 15 times๐ฉ๏ธ
AirPls
with polynomial order 1Speedup of ~12 to 5 times
๐ฉ๏ธ
AirPls
with polynomial order 2Speedup of ~12 to 5 times
With
pentapy
- Speedup of ~10 to 15 times๐ถ Next Steps
lam
-values are no longer possible. Many Whittaker smoother implementations out there suffer from this, but this is something that should be tackled by, e.g., also invoking banded QR-decomposition.๐ Additional features
Given that this was a lot of refactoring, the chance was used to enrich the
WhittakerSmooth
bysample_weight
keyword argument totransform
andfit_transform
to allow for locally weighting datapoints depending on their noise level. Basically, this makes theWhittakerSmooth
en par withArPls
andAirPls
which were already able to pass weights.lam
via maximization of the log marginal likelihood (same approach as for sklearn's GaussianProcessRegressor).chemotools.smooth.estimate_noise_stddev
orchemotools.utils.estimate_noise_stddev
).lam
viachemotools.smooth.WhittakerSmoothLambda
similar to SciPy's Bounds for specifying the bounds of parameters during optimizationschemotools.utils._banded_linalg
).๐ฆ๐ Package structure
settings.json
in the.vscode
-folder was removed from the GIT version control. Having a file that can overwrite the user's local settings (which might contain much more than just formatting and linting settings) can be quite destructive. It was replaced by asettings_template.json
that can provide the basic setup for the user.requirements.txt
were split into arequirements.txt
for the main package capabilities and arequirements-dev.txt
for the dependencies needed during development. #53 will profit from this, since then one can simply point to therequirements.txt
frompyproject.toml
without having to worry about the user accidently installingpytest
,matplotlib
, etc.Ruff
was configured in thepyproject.toml
(requiressettings.json
to haveRuff
configured). It reveals some unused imports, wrong type hints, and non-pythonic statements that should be tackled in the future.โ โ Tests
pytest-xdist
, the tests can now be run in parallel which saves quite some time. The command I always used for running the tests ispytest.mark.parametrize
, the tests were extended by running the same test on multiple input combinations. This was especially useful for transformer functionality and numerics tests.๐ชค Miscellaneous