sassoftware / python-sasctl

Python package and CLI for user-friendly integration with SAS Viya
https://sassoftware.github.io/python-sasctl
Apache License 2.0

some model's lift chart may not be correct #169

Closed wuyb28 closed 1 year ago

wuyb28 commented 1 year ago

Describe the issue A clear and concise description of the issue you're experiencing.

To Reproduce I used the sasctl sample notebook to register a Python model in the monthly environment and found that all three dmcas_*.json files have the same size. https://github.com/sassoftware/python-sasctl/blob/master/examples/pzmm_binary_classification_model_import.ipynb

Expected behavior A clear and concise description of what you expected to happen. image

Stack Trace Now that I am using the updated sasctl source, the JSON files have different sizes, but the lift charts for some models may still be incorrect. image

Version What version of sasctl are you using? 1.9.2
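A quick way to confirm the "same size" symptom described above is to list each generated file's size and a content hash. The following is a minimal sketch, not part of sasctl: the function names and the `model_dir` argument are hypothetical, and the only detail taken from this report is the `dmcas_*.json` file pattern.

```python
import hashlib
from pathlib import Path


def summarize_json_files(model_dir, pattern="dmcas_*.json"):
    """Return {file name: (size in bytes, SHA-256 hex digest)} for matching files."""
    summary = {}
    for path in sorted(Path(model_dir).glob(pattern)):
        data = path.read_bytes()
        summary[path.name] = (len(data), hashlib.sha256(data).hexdigest())
    return summary


def find_identical_files(summary):
    """Group file names whose contents hash to the same digest."""
    groups = {}
    for name, (_size, digest) in summary.items():
        groups.setdefault(digest, []).append(name)
    # Only groups with more than one file indicate duplicated content.
    return [names for names in groups.values() if len(names) > 1]
```

Equal sizes alone do not prove the files are identical, which is why the sketch compares digests as well. Any group returned by `find_identical_files` points to files that were written with the same content.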

smlindauer commented 1 year ago

This is a known testing gap for python-sasctl, due to the difficulty of replicating or verifying the statistical plots created by SAS Viya without already having a connection to SAS Viya. At a minimum, I will see if I can fix these results for the current examples. Then I will see if I can come up with a reasonable methodology that can test these chart generation cases.
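One possible way to check a lift chart without a SAS Viya connection is to recompute cumulative lift locally from the actual labels and the predicted event probabilities, then compare the shape against the generated chart. The sketch below is an assumption-laden illustration in pure Python: `cumulative_lift` is a hypothetical helper, and it is not how sasctl or SAS Viya computes its statistics.

```python
def cumulative_lift(actual, probability, bins=10):
    """Cumulative lift per depth bin for a binary target.

    actual: iterable of 0/1 labels.
    probability: predicted probability of the event (label 1) for each row.
    Returns a list of (depth, lift) tuples, depth running from 1/bins to 1.0.
    """
    # Rank rows from the highest to the lowest predicted probability.
    pairs = sorted(zip(probability, actual), key=lambda p: p[0], reverse=True)
    n = len(pairs)
    total_events = sum(label for _, label in pairs)
    if n == 0 or total_events == 0:
        raise ValueError("need at least one row and one event")
    base_rate = total_events / n
    lifts = []
    for b in range(1, bins + 1):
        cutoff = round(n * b / bins)  # number of rows in the top b/bins of scores
        if cutoff == 0:
            continue
        events = sum(label for _, label in pairs[:cutoff])
        # Event rate among the top-scored rows relative to the overall rate.
        lifts.append((b / bins, (events / cutoff) / base_rate))
    return lifts
```

Two properties make this useful as a sanity check: cumulative lift at full depth is always exactly 1.0, and a model that ranks events ahead of non-events should show lift above 1.0 in the first bins. A curve that violates either one suggests the wrong columns were supplied as labels or probabilities.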

wuyb28 commented 1 year ago

Thanks for Scott's quick fix. I see some differences from before, but some results still do not make much sense, especially for the GBDT model from the Python samples. The test chart for the forest model is also missing. We will have an onsite POC on this next week. image image image image image

smlindauer commented 1 year ago

Hey @wuyb28,

I only pushed the code to the master branch; I haven't published a new release yet. If you would like to use the newest code, you will need to clone the repository and install sasctl from source (go to the root directory of the cloned project and run pip install .). There are a couple of other open issues that we are working on, so we are aiming for another release by the end of this week.

wuyb28 commented 1 year ago

Hi Scott, Yes, I am using the latest source version. The results from the old version look like the following. I understand you are focused on the new release. Could you look at this later? Thanks! image image

smlindauer commented 1 year ago

Was this from 1.9.3? If so, could I see the ROC and Lift json files that are generating these plots?

wuyb28 commented 1 year ago

GradientBoosting.zip DecisionTreeClassifier.zip RandomForest.zip

wuyb28 commented 1 year ago

Yes, I am using 1.9.3 on Anaconda 2020.07 for Windows. Could you provide the versions of your key Python packages, such as Anaconda, pandas, and scikit-learn? Today I used the Linux version of Anaconda and got a different result, like this. So I think it may depend on the Python package versions. image
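When comparing results across environments like this, it helps for both sides to post the same version report. The snippet below is a small sketch using only the standard library; the `report_versions` name and the default package list are illustrative choices, not something defined by sasctl.

```python
import platform
from importlib import metadata


def report_versions(packages=("sasctl", "pandas", "numpy", "scikit-learn")):
    """Return the Python/platform details and installed versions of the given packages."""
    report = {"python": platform.python_version(), "platform": platform.system()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version}")
```

Running this on both the Windows and Linux installations and diffing the output would show whether a package version difference is a plausible cause.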

smlindauer commented 1 year ago

@wuyb28:

The calculate_model_statistics function is working as expected, but I discovered an issue within the example notebook itself. I am working on uploading a commit which will fix the issues in notebooks and clarify how to pass data properly to calculate_model_statistics.