imoscovitz / wittgenstein

Ruleset covering algorithms for transparent machine learning
MIT License

Version 0.2 ToDos #9

Closed imoscovitz closed 1 year ago

imoscovitz commented 5 years ago

@flamby,

Just wanted to create a central checklist with the last remaining things since we're very close to finishing V0.2.

Let me know what you think!

catnap speed optimization https://github.com/imoscovitz/wittgenstein/issues/4:

Provide np and iterable support for train/predict/score/recalibrate

Flexible predict dataset features

Pos class naming

Predict proba, recalibrate proba https://github.com/imoscovitz/wittgenstein/issues/2:

Metaclassifier compatibility https://github.com/imoscovitz/wittgenstein/issues/7:

Other outstanding strangeness:

Update readme/description/docstrings

flamby commented 5 years ago

Hi @imoscovitz

Just wanted to create a central checklist with the last remaining things since we're very close to finishing V0.2.

Good idea.

catnap speed optimization #4:

  • Done
  • (In the future, may wish to extend to prediction)

Yep, predict speed could be improved. That explains why prediction is slower than training then ;-) BTW I've tested a third-party binning library (the one built into sklearn's HistGradientBoostingClassifier) with IREP; there's a rough sketch of the idea below.

I'll submit a pull request in a couple of days demonstrating it in a unittest file. I already have these tests based on the sklearn datasets I used last time.

Most of them work like a charm with IREP, but not with RIPPER; I guess you're aware of that, though, since recent changes were focused on IREP only (n_discretize_bins not in __init__, for instance). But I also spotted several cases where those tests break with IREP; they're probably related to the datasets I chose, which highlights the need for more warnings, I guess.
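
Here's a hypothetical sketch of the pre-binning idea. I'm using sklearn's public KBinsDiscretizer as a stand-in for the binner built into HistGradientBoostingClassifier, since that one lives in a private module, and the wittgenstein calls assume the 0.2 fit/score signatures:

```python
import pandas as pd
import wittgenstein as lw
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import KBinsDiscretizer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Bin the continuous features up front so IREP sees ordinal bin indices
# instead of raw floats.
binner = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="quantile")
X_binned = pd.DataFrame(binner.fit_transform(X), columns=X.columns)

clf = lw.IREP(random_state=42)
clf.fit(X_binned, y, pos_class=1)
print(clf.score(X_binned, y))
```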

Provide np and iterable support for train/predict/score/recalibrate

  • Done
  • Perhaps more testing

Yes, definitely more testing; my current unittest work is turning up lots of small bugs when you change the default parameters here and there.
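
For instance, this is the kind of call pattern those tests exercise (a minimal sketch assuming the 0.2 fit/predict signatures; the exact keyword names may differ):

```python
import wittgenstein as lw
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)   # plain numpy arrays, no DataFrame

clf = lw.RIPPER(random_state=0)
clf.fit(X, y, pos_class=1)                   # fit directly from arrays

print(clf.predict(X[:10]))                   # predict on an array slice
print(clf.score(X, y))                       # score on the training arrays
```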

Flexible predict dataset features

  • Done: Try infer feature names if they differ from model
  • "Works" even if dataset differs so long as all selected features are present, but gives lots of False predictions -- this is probably a bug

That's interesting. So far I've only observed better accuracy using what we can now indeed call flexible predict dataset features. Maybe that's just your bug ;-)
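
For the record, here's roughly how I've been exercising it (hypothetical column choice; I simply drop a column that I'm assuming the learned ruleset doesn't reference):

```python
import wittgenstein as lw
from sklearn.datasets import load_breast_cancer

df = load_breast_cancer(as_frame=True).frame      # features plus a 'target' column

clf = lw.IREP(random_state=0)
clf.fit(df, class_feat="target", pos_class=1)
print(clf.ruleset_)                               # the rules typically use only a few features

# Predict on a frame whose columns differ from training: drop a feature
# that (I'm assuming) the learned rules don't reference.
X_partial = df.drop(columns=["mean radius", "target"])
print(clf.predict(X_partial)[:10])
```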

Pos class naming

  • Done: If none provided, and classes are 0,1 or False,True, use 1 or True; otherwise throw error

Thanks!
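
My reading of the new behavior, as a quick sketch (the string-label mapping below is just for illustration):

```python
import wittgenstein as lw
from sklearn.datasets import load_breast_cancer

df = load_breast_cancer(as_frame=True).frame

# Classes coded 0/1 (or False/True): pos_class can be omitted and defaults to 1 / True.
lw.IREP(random_state=0).fit(df, class_feat="target")

# Other label encodings: pos_class must be given explicitly, otherwise an error is raised.
df_str = df.assign(target=df["target"].map({0: "malignant", 1: "benign"}))
lw.IREP(random_state=0).fit(df_str, class_feat="target", pos_class="benign")
```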

Predict proba, recalibrate proba #2:

  • Done
  • Fixed NaN issue
  • Reversed order to neg, pos
  • Perhaps more testing

Yep, testing, definitely ;-)
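
Here's the shape of test I'll add for it (assuming sklearn-style output after the reordering: column 0 = negative class, column 1 = positive class):

```python
import numpy as np
import wittgenstein as lw
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)

clf = lw.RIPPER(random_state=0)
clf.fit(X, y, pos_class=1)

probas = clf.predict_proba(X)                      # shape (n_samples, 2)
assert not np.isnan(probas).any()                  # the NaN issue should be gone
neg_proba, pos_proba = probas[:, 0], probas[:, 1]  # column order: (neg, pos)
print(pos_proba[:5])
```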

Metaclassifier compatibility #7:

  • Get/set param methods done
  • GridSearchCV done
  • Moved hyperparameters to init
  • Question: Should pos_class and class_feat go in init too?

I'll give VotingClassifier another try and keep you updated.

  • Question: Perhaps previously allowable params in fit with deprecation warning, or move them completely to init (cleaner break with the past, but could break some people's code)

I would vote for moving them completely. It's not a big deal to follow the new conventions.
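
For reference, this is the kind of usage that moving the hyperparameters into __init__ enables (a sketch; the grid values are arbitrary and the parameter names are as I recall them, so treat them as assumptions):

```python
import wittgenstein as lw
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

param_grid = {
    "k": [1, 2],                    # number of RIPPER optimization iterations
    "prune_size": [0.33, 0.5],      # fraction of the data held out for pruning
    "n_discretize_bins": [5, 10],   # bins used to discretize continuous features
}
search = GridSearchCV(lw.RIPPER(random_state=0), param_grid, cv=3, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```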

Other outstanding strangeness:

  • random_state is reproducible in a single notebook, but changes results when kernel is reset.
  • predict on data with just selected features produces some different predictions

I'm glad you noticed it too ;-) That has puzzled me for a while.

Update readme/description/docstrings

  • Todo

Yes, a wittgenstein.readthedocs.io site would be great, and some refreshed notebooks. I could help on that.

AnEb0711 commented 4 years ago

Hi @imoscovitz, first of all, thanks for your wittgenstein library. Do you know when you will release the new version?