shakedzy / dython

A set of data tools in Python
http://shakedzy.xyz/dython/
MIT License
496 stars 102 forks source link

ks_abc when run with plot=False still plots the graph #147

Closed RNarayan73 closed 1 year ago

RNarayan73 commented 1 year ago

Version check:

Run and copy the output:

import sys, dython
print(sys.version_info)
print(dython.__version__)
sys.version_info(major=3, minor=10, micro=9, releaselevel='final', serial=0)
0.6.7

this is strange, as I checked conda list to confirm that I have 0.7.3 installed

conda list dython
# packages in environment at C:\Users\naray\Miniconda3\envs\skl_310:
#
# Name                    Version                   Build  Channel
dython                    0.7.3              pyhd8ed1ab_0    conda-forge

Describe the bug:

I would like to use the ks_abc function as a metric via make_scorer(). All I want is a scalar output for the abc value.

However, when called as a standalone function with plot=False as shown below, it still shows the plot (in addition to the abc value) in jupyter.

When called as a standalone function without plot=False argument, it shows the plot, but doesn't show the requested value from the dict.

Code to reproduce:

import dython

ks_abc(y_true, y_pred[:, CLASS_POSITIVE], plot=False)['abc']

Error message:

There is no error message. It simply draws the plot in addition to the scalar value from the output dict even when plot=False.

Input data:

shakedzy commented 1 year ago

I think I understand why this happens (and I assume you run this on a Jupyter notebook too). I'll work on a fix for this soon.

shakedzy commented 1 year ago

Ok, so I'm taking back what I wrote, this isn't a bug. This behavior occurs when you run this on Jupyter notebook, and is the result of Jupyter's behavior of displaying ax when they're provided as the final output of a block (which is the case here, as the ax is part of the output dict).

If you wish to force Jupyter not to display it, simply add a None as the last line of your cell.

If this happens outside of Jupyter notebook, let me know.

RNarayan73 commented 1 year ago

@shakedzy, Thanks for your reply.

As I mentioned in my original post, I am trying to use it as a function to return a scalar for ks_abc which in turn I intend to package as a metric with make_scorer(). I will then use the metric to score and evaluate models and in hp tuning.

I can't imagine where I should add a None to prevent the ax from rendering on Jupyter.

Would it not make sense to not output ax as part of the dict at all within the function at all if plot=False? Taking it a step further, would it not be possible for the function to not even generate ax at all if it doesn't need to plot? That would make it more efficient when used to simply return the ks_abc value, wouldn't it?

I hope I have clarified my need. In summary, while you had developed the function primarily as a plotting function, I find value in using it as a metric as per this blog https://towardsdatascience.com/the-metric-system-how-to-correctly-measure-your-model-17d3feaed6ab

I trust you will agree and advise how I can achieve this.

Regards Narayan

shakedzy commented 1 year ago

Hey Narayan, it will be great if you can post the function you developed here, so I can try to figure out what the issue is

RNarayan73 commented 1 year ago

Hi @shakedzy, here you go!

import dython

def ks_abc_score(y_true, y_prob):
    return ks_abc(y_true, y_prob, plot=False)['abc']

Later, I wrap it as below:

make_scorer(ks_abc_score, needs_proba=True)

and use it as a metric in HP tuning.

Hope this helps.

Regards Narayan

shakedzy commented 1 year ago

Please provide a full script I can run

Socvest commented 1 year ago

This seemed to work for me:

import matplotlib.pyplot as plt
dict_Results = nominal.associations(dataframe, plot=False)
plt.close()

When plot=True the last line does not prevent the chart from displaying

Original source for solution.

Great package btw! Thanks!

RNarayan73 commented 1 year ago

Thanks for the suggestion, @Socvest. Your workaround does seem to arrest the problem and reduce run time significantly. I've included it below in the code for comparison but commented it out to highlight the initial problem.

Here you go @shakedzy !

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from dython.model_utils import ks_abc

def ks_abc_score(y_true, y_prob):
    return ks_abc(y_true, y_prob, plot=False)['abc']
#     res = ks_abc(y_true, y_prob, plot=False)
#     plt.close()
#     return res['abc']

X, y = make_classification(n_samples=100, n_features=4)

grid_search_pipe = GridSearchCV(Pipeline([('clf', SGDClassifier(loss='log_loss'))]), 
                                param_grid={'clf__alpha': [10e-02, 10e-01]}, 
                                scoring=make_scorer(ks_abc_score, needs_proba=True), 
                                verbose=1
                               )
grid_search_pipe.fit(X, y)

y_prb = grid_search_pipe.predict_proba(X)
ks_abc_score(y, y_prb[:, 1])

Note that for each of the 10 fits in the example above, it generates a ks_abc chart in jupyter, which takes up significant memory and processing time. The call to ks_abc_score function adds another chart! @Socvest's workaround does cut down on the run time and presumably memory, but makes the function to calculate a score look very awkward.

Hope this helps in getting to a fix.

Regards Narayan

shakedzy commented 1 year ago

hey @RNarayan73 - thanks for the script. I just ran it on Jupyter Lab, and indeed got several plots. I ran the exact same thing from a Python console and a as Python script, and got no plots at all. This again concludes that the issue here is not ks_abc or dython, but how Jupyter works with matplotlib plots under the hood. So unfortunately, I will not make a fix for this, as there's nothing broken, and I too recommend using plt.close() to fix this in Jupyter.

RNarayan73 commented 1 year ago

Hi @shakedzy thanks for looking into this. My suggestion is, Would it not make sense to only generate and output ax if plot=True? Would it not be possible for the function to not even generate ax at all if it doesn't need to plot? That would make it more efficient when used to simply return the ks_abc value, wouldn't it?

shakedzy commented 1 year ago

hey @RNarayan73 - the point of plot=False is to not plot as a side-effect, but to still create the ax and return it, either to manipulate it or to plot it later. So plot=False does what it should.

The point here is that the issue is due to the framework used, not he library. BUT. - I did thought of a way to fix this. Keep following!

shakedzy commented 1 year ago

hey @RNarayan73 - install from source from this PR and let me know if it works.

verify you installed correctly by running:

import dython
print(dython.__version__)

This should yield 0.7.4.dev

RNarayan73 commented 1 year ago

Excellent, @shakedzy. It works fine now.

However, installing it via pip forced an update to numpy 1.24. I had originally installed the conda version which didn't require a numpy udpate and I hope the new version through conda won't force one either.

Regards, Narayan