pegasystems / pega-datascientist-tools

Pega Data Scientist Tools
https://github.com/pegasystems/pega-datascientist-tools/wiki
Apache License 2.0

[WIP] Compatibility with Polars V1 #233

Open StijnKas opened 2 months ago

StijnKas commented 2 months ago

Polars V1 brings quite a few breaking changes. This PR gets us ready for when it releases, and ties our Polars version compatibility to Polars V1.0.x. Because Polars is so core to the functionality of pdstools, we can't always use the latest version; but since we'd like to stay up to date, we'll tie ourselves to each minor version. Once a new minor version becomes available, we'll retest our changes on the increment and see if we can relax this requirement.
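To illustrate what tying ourselves to a minor version means in practice, here is a minimal sketch of a version check. It is not how this PR expresses the pin (that would normally live in the package's dependency metadata, for example as a specifier like `polars>=1.0,<1.1`); the helper names `parse_version` and `is_supported` are hypothetical and exist only for this example.

```python
import re


def parse_version(version: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from strings such as '1.0.0' or '1.0.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    # A missing patch component (e.g. '1.0') is treated as patch 0.
    return int(major), int(minor), int(patch or 0)


def is_supported(version: str, pinned: tuple[int, int] = (1, 0)) -> bool:
    """Return True when the version shares the pinned (major, minor) pair.

    The default pin of (1, 0) mirrors the 'Polars V1.0.x' constraint
    described above; any patch release within that minor version passes.
    """
    return parse_version(version)[:2] == pinned
```

The installed version could be passed in via `importlib.metadata.version("polars")`. Once a new minor version has been retested, only the `pinned` pair would need to change.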

As part of this PR, the main changes I had to cover were:

Also added some QOL features such as checking Pandoc version & adding TestBook as an optional dependency.

StijnKas commented 2 months ago

@operdeck on my end, the only tests not passing are the BinAggregator tests (the second and the last). Would you mind having a look at these and seeing if you can figure something out? test_overall_sym_rollup gives polars.exceptions.ComputeError: get index is out of bounds. I don't expect Polars V1 to release before EOW next week, but I'd like to have this one ready to go for when it does. To test, you can just install the latest pdstools, then run pip install --upgrade polars --pre.

StijnKas commented 1 month ago

@yusufuyanik1 @operdeck could you help me validate this branch on your workflows and fix any issues you come across? We should probably get to Polars V1 sooner rather than later, given that we're only going to fall further behind and run into more issues down the line.

StijnKas commented 2 weeks ago

@yusufuyanik1 @operdeck Could both of you look at this tomorrow? I will too. Let's get it merged so we are semi-up-to-date again.

codecov[bot] commented 1 week ago

Codecov Report

Attention: Patch coverage is 46.81529% with 167 lines in your changes missing coverage. Please review.

Project coverage is 59.68%. Comparing base (d6cc0c5) to head (37f761e). Report is 47 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| python/pdstools/adm/ADMDatamart.py | 14.92% | 114 Missing :warning: |
| python/pdstools/utils/cdh_utils.py | 34.92% | 41 Missing :warning: |
| python/pdstools/utils/streamlit_utils.py | 0.00% | 9 Missing :warning: |
| python/pdstools/decision_analyzer/utils.py | 50.00% | 1 Missing :warning: |
| python/pdstools/plots/plot_base.py | 50.00% | 1 Missing :warning: |
| python/pdstools/prediction/Prediction.py | 93.33% | 1 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #233      +/-   ##
==========================================
+ Coverage   59.59%   59.68%   +0.09%
==========================================
  Files          29       29
  Lines        3789     3793       +4
==========================================
+ Hits         2258     2264       +6
+ Misses       1531     1529       -2
```

:umbrella: View full report in Codecov by Sentry.

StijnKas commented 3 days ago

FYI @yusufuyanik1, we're also getting a ton of warnings in the health check execution, even for things like 'where' being used instead of 'filter', which I think is a relatively old change. Let's get those out of the way as well.
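One way to track these down is to record the warnings a step emits and list them, as in the rough sketch below. It assumes the warnings are raised as `DeprecationWarning` (or a subclass); `collect_deprecations` and `legacy_step` are hypothetical names, and `legacy_step` only simulates a deprecated call rather than running the actual health check.

```python
import warnings


def collect_deprecations(func, *args, **kwargs):
    """Run func and return (result, list of deprecation messages it emitted)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones normally shown only once.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    messages = [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]
    return result, messages


def legacy_step():
    """Hypothetical stand-in for a code path that still uses a deprecated call."""
    warnings.warn(
        "`where` is deprecated; use `filter` instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return 42
```

Calling `collect_deprecations(legacy_step)` returns the step's result together with the messages to clean up. In the test suite, running pytest with `-W error::DeprecationWarning` would similarly turn such warnings into failures.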