pulearn / pulearn

Positive-unlabeled learning with Python.
https://pulearn.github.io/pulearn/
BSD 3-Clause "New" or "Revised" License
218 stars 33 forks

predict_proba returns values > 1 #15

Open sanderland opened 3 years ago

sanderland commented 3 years ago

This is strange if it's a probability, but the implementation seems to be correct looking at the paper. Maybe there should be a warning in the documentation?
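
For readers unfamiliar with the paper, here is a minimal sketch of why the Elkan and Noto adjustment can exceed 1. It assumes the usual formulation: a classifier g(x) estimates P(s=1 | x) (labeled vs. unlabeled), c = P(s=1 | y=1) is estimated as the mean of g over held-out known positives, and the adjusted score is g(x) / c. The numbers and the helper name `adjusted_proba` are hypothetical, and this is not the package's code.

```python
from statistics import mean

# Hypothetical scores g(x) = P(s=1 | x) for held-out known positives.
held_out_positive_scores = [0.55, 0.60, 0.70, 0.75]

# Estimate c = P(s=1 | y=1) as the mean score over the held-out positives.
c = mean(held_out_positive_scores)


def adjusted_proba(g_x, c):
    """Elkan-Noto style adjustment: estimate P(y=1 | x) as g(x) / c."""
    return g_x / c


# Hypothetical scores for new samples. Because c is an average, any sample
# whose score is above c ends up with an adjusted value above 1.
new_scores = [0.20, 0.65, 0.90]
adjusted = [adjusted_proba(g, c) for g in new_scores]

for g, p in zip(new_scores, adjusted):
    print(f"g(x) = {g:.2f} -> g(x) / c = {p:.3f}")
```

Since c is itself an average of classifier scores, the ratio is only bounded by 1 when no sample scores higher than c, which is not guaranteed in practice.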

shaypal5 commented 3 years ago

Hmmm. Not sure if this should be smoothed out or not.

shaypal5 commented 3 years ago

Is this true for both classifiers?

sanderland commented 3 years ago

I've only tried ElkanotoPuClassifier for now, will keep you in the loop.

sanderland commented 3 years ago

The weighted classifier can also give this, but seems to do so less often. Bagging seems fine in this respect, but its predict_proba returns 2D arrays where the other classifiers return 1D arrays for the same input.
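
Until the output shapes are made consistent, one possible workaround is a small helper that accepts either shape. This is only a sketch: `positive_class_scores` is a hypothetical name, and it assumes that in the 2D case the positive class is the last column (the scikit-learn convention when labels are sorted), which should be checked against the fitted estimator.

```python
def positive_class_scores(proba):
    """Return a flat list of positive-class scores from 1D or 2D output.

    Assumption: for 2D input, each row holds per-class scores and the
    positive class is the last column.
    """
    flat = []
    for row in proba:
        try:
            # 2D case: the row is a sequence of per-class scores.
            flat.append(float(row[-1]))
        except (TypeError, IndexError):
            # 1D case: the "row" is already a scalar score.
            flat.append(float(row))
    return flat


# Hypothetical outputs in the two shapes described above.
one_d = [0.2, 0.9, 1.3]
two_d = [[0.8, 0.2], [0.1, 0.9]]

print(positive_class_scores(one_d))
print(positive_class_scores(two_d))
```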

harc007 commented 2 years ago

I have seen this recently. The threshold for separation of the classes seems to be 0.5 though.

mepland commented 1 year ago

I have also observed in practice that c < 1 leads to proba > 1 with Elkanoto, but have not seen similar issues with bagging.

Perhaps the package should warn the user if it is returning a "probability" of > 1 and suggest smoothing? Or as @shaypal5 said, smoothing with something like a sigmoid could be applied by default.
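
A rough sketch of both options, written as user-side post-processing rather than as a patch to the package. The function names and the `midpoint` and `steepness` parameters are made up for illustration. Clipping keeps values at or below 1 unchanged, while the sigmoid remaps every value, so its outputs are rank-preserving scores rather than calibrated probabilities.

```python
import math
import warnings


def clip_with_warning(proba):
    """Clip adjusted scores into [0, 1], warning if any exceeded 1."""
    n_over = sum(1 for p in proba if p > 1.0)
    if n_over:
        warnings.warn(
            f"{n_over} adjusted score(s) exceeded 1 and were clipped.",
            RuntimeWarning,
        )
    return [min(max(p, 0.0), 1.0) for p in proba]


def sigmoid_squash(proba, midpoint=0.5, steepness=10.0):
    """Map scores into (0, 1) with a logistic curve.

    midpoint and steepness are illustrative choices, not tuned values.
    With midpoint=0.5, a score of 0.5 still maps to 0.5, so a 0.5
    decision threshold selects the same samples before and after.
    """
    return [1.0 / (1.0 + math.exp(-steepness * (p - midpoint))) for p in proba]


# Hypothetical adjusted scores, one of them above 1.
scores = [0.30, 0.50, 1.38]
squashed = sigmoid_squash(scores)
print([round(v, 4) for v in squashed])
# clip_with_warning(scores) would return [0.3, 0.5, 1.0] and emit a RuntimeWarning.
```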