Closed RNarayan73 closed 1 year ago
I think I understand why this happens (and I assume you run this on a Jupyter notebook too). I'll work on a fix for this soon.
Ok, so I'm taking back what I wrote, this isn't a bug. This behavior occurs when you run this on Jupyter notebook, and is the result of Jupyter's behavior of displaying ax
when they're provided as the final output of a block (which is the case here, as the ax
is part of the output dict
).
If you wish to force Jupyter not to display it, simply add a None
as the last line of your cell.
If this happens outside of Jupyter notebook, let me know.
@shakedzy, Thanks for your reply.
As I mentioned in my original post, I am trying to use it as a function to return a scalar for ks_abc which in turn I intend to package as a metric with make_scorer(). I will then use the metric to score and evaluate models and in hp tuning.
I can't imagine where I should add a None
to prevent the ax from rendering on Jupyter.
Would it not make sense to not output ax as part of the dict at all within the function at all if plot=False? Taking it a step further, would it not be possible for the function to not even generate ax at all if it doesn't need to plot? That would make it more efficient when used to simply return the ks_abc value, wouldn't it?
I hope I have clarified my need. In summary, while you had developed the function primarily as a plotting function, I find value in using it as a metric as per this blog https://towardsdatascience.com/the-metric-system-how-to-correctly-measure-your-model-17d3feaed6ab
I trust you will agree and advise how I can achieve this.
Regards Narayan
Hey Narayan, it will be great if you can post the function you developed here, so I can try to figure out what the issue is
Hi @shakedzy, here you go!
import dython
def ks_abc_score(y_true, y_prob):
return ks_abc(y_true, y_prob, plot=False)['abc']
Later, I wrap it as below:
make_scorer(ks_abc_score, needs_proba=True)
and use it as a metric in HP tuning.
Hope this helps.
Regards Narayan
Please provide a full script I can run
This seemed to work for me:
import matplotlib.pyplot as plt
dict_Results = nominal.associations(dataframe, plot=False)
plt.close()
When plot=True
the last line does not prevent the chart from displaying
Original source for solution.
Great package btw! Thanks!
Thanks for the suggestion, @Socvest. Your workaround does seem to arrest the problem and reduce run time significantly. I've included it below in the code for comparison but commented it out to highlight the initial problem.
Here you go @shakedzy !
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from dython.model_utils import ks_abc
def ks_abc_score(y_true, y_prob):
return ks_abc(y_true, y_prob, plot=False)['abc']
# res = ks_abc(y_true, y_prob, plot=False)
# plt.close()
# return res['abc']
X, y = make_classification(n_samples=100, n_features=4)
grid_search_pipe = GridSearchCV(Pipeline([('clf', SGDClassifier(loss='log_loss'))]),
param_grid={'clf__alpha': [10e-02, 10e-01]},
scoring=make_scorer(ks_abc_score, needs_proba=True),
verbose=1
)
grid_search_pipe.fit(X, y)
y_prb = grid_search_pipe.predict_proba(X)
ks_abc_score(y, y_prb[:, 1])
Note that for each of the 10 fits in the example above, it generates a ks_abc chart in jupyter, which takes up significant memory and processing time. The call to ks_abc_score function adds another chart! @Socvest's workaround does cut down on the run time and presumably memory, but makes the function to calculate a score look very awkward.
Hope this helps in getting to a fix.
Regards Narayan
hey @RNarayan73 - thanks for the script. I just ran it on Jupyter Lab, and indeed got several plots. I ran the exact same thing from a Python console and a as Python script, and got no plots at all. This again concludes that the issue here is not ks_abc
or dython
, but how Jupyter works with matplotlib
plots under the hood. So unfortunately, I will not make a fix for this, as there's nothing broken, and I too recommend using plt.close()
to fix this in Jupyter.
Hi @shakedzy thanks for looking into this. My suggestion is, Would it not make sense to only generate and output ax if plot=True? Would it not be possible for the function to not even generate ax at all if it doesn't need to plot? That would make it more efficient when used to simply return the ks_abc value, wouldn't it?
hey @RNarayan73 - the point of plot=False
is to not plot as a side-effect, but to still create the ax
and return it, either to manipulate it or to plot it later. So plot=False
does what it should.
The point here is that the issue is due to the framework used, not he library. BUT. - I did thought of a way to fix this. Keep following!
hey @RNarayan73 - install from source from this PR and let me know if it works.
verify you installed correctly by running:
import dython
print(dython.__version__)
This should yield 0.7.4.dev
Excellent, @shakedzy. It works fine now.
However, installing it via pip forced an update to numpy 1.24. I had originally installed the conda version which didn't require a numpy udpate and I hope the new version through conda won't force one either.
Regards, Narayan
Version check:
Run and copy the output:
this is strange, as I checked conda list to confirm that I have 0.7.3 installed
Describe the bug:
I would like to use the ks_abc function as a metric via make_scorer(). All I want is a scalar output for the abc value.
However, when called as a standalone function with plot=False as shown below, it still shows the plot (in addition to the abc value) in jupyter.
When called as a standalone function without plot=False argument, it shows the plot, but doesn't show the requested value from the dict.
Code to reproduce:
Error message:
There is no error message. It simply draws the plot in addition to the scalar value from the output dict even when plot=False.
Input data: