Version 0.2 ToDos - Githubissues

imoscovitz commented 5 years ago

@flamby,

Just wanted to create a central checklist with the last remaining things since we're very close to finishing V0.2.

Let me know what you think!

catnap speed optimization https://github.com/imoscovitz/wittgenstein/issues/4:

Done
(In the future, may wish to extend to prediction)

Provide np and iterable support for train/predict/score/recalibrate

Done
Perhaps more testing

Flexible predict dataset features

Done: Try infer feature names if they differ from model
"Works" even if dataset differs so long as all selected features are present, but gives lots of False predictions -- this is probably a bug

Pos class naming

Done: If none provided, and classes are 0,1 or False,True, use 1 or True; otherwise throw error

Predict proba, recalibrate proba https://github.com/imoscovitz/wittgenstein/issues/2:

Done
Fixed NaN issue
Reversed order to neg, pos
Perhaps more testing

Metaclassifier compatibility https://github.com/imoscovitz/wittgenstein/issues/7:

Get/set param methods done
GridSearchCV done
Moved hyperparameters to init
Question: Should pos_class and class_feat go in init too?
Question: Perhaps previously allowable params in fit with deprecation warning, or move them completely to init (cleaner break with the past, but could break some people's code)

Other outstanding strangeness:

random_state is reproducible in a single notebook, but changes results when kernel is reset.
predict on data with just selected features produces some different predictions

Update readme/description/docstrings

Todo

flamby commented 5 years ago

Hi @imoscovitz

Just wanted to create a central checklist with the last remaining things since we're very close to finishing V0.2.

Good idea.

catnap speed optimization #4:

Done

(In the future, may wish to extend to prediction)

Yep, predict speed could be improved. That explains why prediction is slower than training then ;-) BTW I've tested a third-party binning library (the one built-in sklearn HistGradientBoostingClassifier) with IREP.

I'll do in a couple of days a pull request demonstrating it in an unittest file. I've already these tests based on the sklearn dataset I used last time :

test_predict
test_predict_with_numpy_array
test_predict_with_third_party_binning_lib
test_predict_without_catnap
test_grid_search
test_predict_proba
test_recalibrate_proba
test_flexible_predict_features
test_explanation

Most of them works like a charm with IREP, but not RIPPER, but I guess you are aware of that since recent changes were focused on IREP only (n_discretize_bins not in __init__ for instance). But I spotted several cases where those tests break with IREP, and they are probably related to the dataset I chose, which highlights the need of more warnings I guess.

Provide np and iterable support for train/predict/score/recalibrate

Done

Perhaps more testing

Yes, definitely more testing, as my current unittest creation demonstrate lots of small bugs if you somehow change default parameters here and there.

Flexible predict dataset features

Done: Try infer feature names if they differ from model

"Works" even if dataset differs so long as all selected features are present, but gives lots of False predictions -- this is probably a bug

That's interesting. For now, I observed only better accuracy using what we can now call indeed flexible predict dataset features. Maybe that's just your bug ;-)

Pos class naming

Done: If none provided, and classes are 0,1 or False,True, use 1 or True; otherwise throw error

Thanks!

Predict proba, recalibrate proba #2:

Done

Fixed NaN issue

Reversed order to neg, pos

Perhaps more testing

Yep, testing, definitely ;-)

Metaclassifier compatibility #7:

Get/set param methods done

GridSearchCV done

Moved hyperparameters to init

Question: Should pos_class and class_feat go in init too?

I'll give VotingClassifier another try and keep you updated.

Question: Perhaps previously allowable params in fit with deprecation warning, or move them completely to init (cleaner break with the past, but could break some people's code)

I would vote for moving them completely. It's not a big deal to follow the new conventions.

Other outstanding strangeness:

random_state is reproducible in a single notebook, but changes results when kernel is reset.

predict on data with just selected features produces some different predictions

I'm glad you noticed too ;-) That puzzle me since a while.

Update readme/description/docstrings

Todo

Yes, a wittgenstein.readthedocs.io site would be great, and some refreshed notebooks. I could help on that.

AnEb0711 commented 4 years ago

Hi @imoscovitz first, thanks for your Wittgenstein Lib. Do you know when you will release the new version?

imoscovitz / wittgenstein

Version 0.2 ToDos #9