Closed imoscovitz closed 1 year ago
Hi @imoscovitz
Just wanted to create a central checklist with the last remaining things since we're very close to finishing V0.2.
Good idea.
catnap speed optimization #4:
- Done
- (In the future, may wish to extend to prediction)
Yep, predict speed could be improved. That explains why prediction is slower than training then ;-)
BTW, I've tested a third-party binning library (the one built into sklearn's HistGradientBoostingClassifier) with IREP.
I'll open a pull request in a couple of days demonstrating it in a unittest file. I already have these tests, based on the sklearn dataset I used last time:
Most of them work like a charm with IREP, but not with RIPPER. I guess you are aware of that, since recent changes were focused on IREP only (n_discretize_bins not in __init__, for instance). But I spotted several cases where those tests break with IREP. They are probably related to the dataset I chose, which highlights the need for more warnings, I guess.
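For readers unfamiliar with what binning does ahead of rule induction, here is a generic quantile-binning sketch using only the standard library. It is not the sklearn binner mentioned above and not wittgenstein's own discretization; the function names and the `ages` data are made up for illustration.

```python
import bisect
import statistics


def fit_quantile_cuts(values, n_bins):
    """Return n_bins - 1 cut points splitting values into roughly equal-sized bins."""
    return statistics.quantiles(values, n=n_bins)


def assign_bin(value, cuts):
    """Map a numeric value to a bin index in range(len(cuts) + 1)."""
    return bisect.bisect_right(cuts, value)


# Hypothetical continuous feature
ages = [22, 25, 31, 38, 40, 47, 52, 58, 63, 70]
cuts = fit_quantile_cuts(ages, n_bins=4)
binned = [assign_bin(a, cuts) for a in ages]
```

Each value becomes a small integer category, which a rule learner can then treat like any other discrete feature.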
Provide np and iterable support for train/predict/score/recalibrate
- Done
- Perhaps more testing
Yes, definitely more testing: the unit tests I'm currently writing reveal lots of small bugs as soon as you change default parameters here and there.
Flexible predict dataset features
- Done: Try infer feature names if they differ from model
- "Works" even if dataset differs so long as all selected features are present, but gives lots of False predictions -- this is probably a bug
That's interesting. So far, I have only observed better accuracy using what we can indeed now call flexible predict dataset features. Maybe that's just your bug ;-)
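To make the behaviour under discussion concrete, here is a minimal sketch of aligning a predict dataset with the features a model selected during training. It uses plain lists rather than DataFrames, and `select_model_features` is a hypothetical helper, not a wittgenstein function.

```python
def select_model_features(rows, columns, selected_features):
    """Return rows restricted to selected_features, in the model's own order.

    Raises ValueError if the dataset lacks any feature the model relies on.
    """
    missing = [f for f in selected_features if f not in columns]
    if missing:
        raise ValueError(f"Dataset is missing features used by the model: {missing}")
    positions = [columns.index(f) for f in selected_features]
    return [[row[i] for i in positions] for row in rows]


# The predict dataset has an extra column and a different column order
columns = ["x", "extra", "y"]
rows = [[1, "a", 3.0], [2, "b", 4.0]]
aligned = select_model_features(rows, columns, selected_features=["y", "x"])
```

Matching by name rather than by position is what keeps predictions meaningful when the column layout differs from the training data.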
Pos class naming
- Done: If none provided, and classes are 0,1 or False,True, use 1 or True; otherwise throw error
Thanks!
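A rough sketch of the inference rule described in this item, assuming the class labels arrive as a plain iterable. `infer_pos_class` is a hypothetical name and the real implementation may differ.

```python
def infer_pos_class(class_values, pos_class=None):
    """Pick the positive class: the explicit one if given, else 1 or True."""
    if pos_class is not None:
        return pos_class
    classes = set(class_values)
    # bool is a subclass of int in Python, so check bools first
    if classes == {False, True} and all(isinstance(c, bool) for c in classes):
        return True
    if classes == {0, 1} and not any(isinstance(c, bool) for c in classes):
        return 1
    raise ValueError(
        f"Cannot infer pos_class from classes {sorted(classes, key=repr)}; "
        "please provide it explicitly."
    )
```

With labels such as "yes"/"no" the function raises, which forces the caller to name the positive class instead of guessing.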
Predict proba, recalibrate proba #2:
- Done
- Fixed NaN issue
- Reversed order to neg, pos
- Perhaps more testing
Yep, testing, definitely ;-)
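As a small illustration of the "neg, pos" column order, here is a sketch that assumes the model produces one positive-class probability per example. `to_proba_rows` is a hypothetical helper, not part of the library.

```python
def to_proba_rows(pos_probas):
    """Turn positive-class probabilities into [neg, pos] rows."""
    return [[1.0 - p, p] for p in pos_probas]


NEG, POS = 0, 1  # column indices under the neg, pos ordering
proba = to_proba_rows([0.25, 0.75, 1.0])
pos_column = [row[POS] for row in proba]
```

Putting the negative class first means the positive-class probability sits in column 1, the same position sklearn uses for 0/1 labels.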
Metaclassifier compatibility #7:
- Get/set param methods done
- GridSearchCV done
- Moved hyperparameters to init
- Question: Should pos_class and class_feat go in init too?
I'll give VotingClassifier another try and keep you updated.
- Question: Perhaps keep the previously allowable params in fit, with a deprecation warning, or move them completely to init (a cleaner break with the past, but it could break some people's code)
I would vote for moving them completely. It's not a big deal to follow the new conventions.
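For comparison, the softer of the two options could look roughly like the sketch below: hyperparameters live in `__init__`, and `fit` still accepts them for a while but warns. The class name, `_HYPERPARAMS`, and the chosen parameters are illustrative assumptions, not the library's actual code.

```python
import warnings


class RuleClassifierSketch:
    """Hypothetical estimator showing a deprecation path for fit-time params."""

    _HYPERPARAMS = ("prune_size", "k")

    def __init__(self, prune_size=0.33, k=2):
        self.prune_size = prune_size
        self.k = k

    def fit(self, X, y, **deprecated_params):
        for name, value in deprecated_params.items():
            if name not in self._HYPERPARAMS:
                raise TypeError(f"fit() got an unexpected keyword argument {name!r}")
            warnings.warn(
                f"Passing {name!r} to fit() is deprecated; pass it to __init__ instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            setattr(self, name, value)
        # Rule learning itself is omitted in this sketch
        return self
```

Moving the params completely, as voted for above, amounts to dropping `**deprecated_params` from `fit` so that old call sites fail with a TypeError.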
Other outstanding strangeness:
- random_state is reproducible in a single notebook, but changes results when kernel is reset.
- predict on data with just selected features produces some different predictions
I'm glad you noticed too ;-) That has puzzled me for a while.
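The thread does not identify the cause of the random_state behaviour. One thing that can produce this symptom in Python in general (an assumption, not a diagnosis of this library) is shuffling or sampling from a set of strings: string hashing is randomized per interpreter process unless PYTHONHASHSEED is set, so set iteration order can change after a kernel restart while staying stable within one session. A sketch of a seed-stable shuffle, with a hypothetical helper name:

```python
import random


def reproducible_shuffle(items, random_state):
    """Shuffle items so the result depends only on their values and the seed."""
    ordered = sorted(items)            # fixed starting order, independent of hashing
    rng = random.Random(random_state)  # private generator, leaves global state alone
    rng.shuffle(ordered)
    return ordered


from_set = reproducible_shuffle({"a", "b", "c", "d"}, random_state=42)
from_list = reproducible_shuffle(["d", "c", "b", "a"], random_state=42)
```

Sorting before shuffling removes any dependence on the container's iteration order, so the same seed gives the same result across processes.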
Update readme/description/docstrings
- Todo
Yes, a wittgenstein.readthedocs.io site would be great, and some refreshed notebooks. I could help on that.
Hi @imoscovitz, first, thanks for your Wittgenstein lib. Do you know when you will release the new version?
@flamby,
Just wanted to create a central checklist with the last remaining things since we're very close to finishing V0.2.
Let me know what you think!
catnap speed optimization https://github.com/imoscovitz/wittgenstein/issues/4:
Provide np and iterable support for train/predict/score/recalibrate
Flexible predict dataset features
Pos class naming
Predict proba, recalibrate proba https://github.com/imoscovitz/wittgenstein/issues/2:
Metaclassifier compatibility https://github.com/imoscovitz/wittgenstein/issues/7:
Should pos_class and class_feat go in init too?
Other outstanding strangeness:
Update readme/description/docstrings