ianozsvald closed this 5 years ago
Hey Ian,
As of right now it seems you have to write:
classifier_overrides = set()
df_results = discover.discover(df, classifier_overrides)
That's at least what I took from playing around with it and looking at the examples. Having classifier_overrides default to set() would clean up the code a bit: it only applies to classifiers, where you'd have to declare it anyway, so making it a default would probably be a good choice.
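To illustrate the suggestion, here is a minimal sketch of how such a default could be handled. `discover_sketch` is a hypothetical stand-in, not the library's actual `discover.discover`, and the returned labels are made up purely for the example. It uses `None` as the sentinel because a literal `set()` default would be one shared mutable object across calls.

```python
from typing import Dict, Iterable, Optional, Set


def discover_sketch(columns: Iterable[str],
                    classifier_overrides: Optional[Set[str]] = None) -> Dict[str, str]:
    """Hypothetical stand-in for discover.discover(); shows only the default handling."""
    # None is the sentinel; build a fresh empty set per call so no
    # mutable default object is shared between calls.
    if classifier_overrides is None:
        classifier_overrides = set()
    # Label each column by whether the caller forced classifier treatment.
    return {col: ("classifier" if col in classifier_overrides else "auto")
            for col in columns}


# Regression-only use no longer needs an explicit empty set.
print(discover_sketch(["age", "fare"]))
# Classification use still passes the overrides explicitly.
print(discover_sketch(["age", "survived"], {"survived"}))
```

With this shape, existing calls that pass a set keep working unchanged.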
Setting the random state improves the reproducibility of findings, and it might even be worth defaulting it to a value. I fell into a bit of a trap there: I wrote about a discover matrix and then got different results on re-running.
Seen here, but it seems I just misinterpreted the example, with None actually being fine.
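A small sketch of why a fixed seed matters, using only the standard library as a stand-in for an estimator's random_state. `shuffled_split` is a hypothetical helper, not part of the library under discussion.

```python
import random
from typing import List, Optional


def shuffled_split(rows: List[int], random_state: Optional[int] = None) -> List[int]:
    """Hypothetical helper: shuffle the rows and return the first half."""
    # A dedicated generator; random_state=None seeds it from system entropy,
    # so results may differ between runs.
    rng = random.Random(random_state)
    shuffled = rows[:]  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[: len(shuffled) // 2]


rows = list(range(10))
# The same seed gives the same split on every call.
print(shuffled_split(rows, random_state=0) == shuffled_split(rows, random_state=0))  # True
```

Leaving the default at None keeps the current behaviour, while passing any fixed integer makes a re-run repeat the same split.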
Many thanks for the contribution, Jesper. I'm speaking at a conference this weekend (PyLondinium), so I'll get this reviewed next week. Thank you!
That's cool. Enjoy the conference!
Cheers for the addition. Later (or you might do it, if you fancy), it would be sensible to pass in a kw_sklearn kwargs dict where random_state is one of the possible parameters, but that also opens up a can of worms around how to specify metrics (accuracy is a bit crap... maybe balanced_accuracy as a reasonable replacement?) and other options, and that might take a bit of thought. Cheers!
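As a rough sketch of that idea, the snippet below merges a caller-supplied kw_sklearn dict over some defaults and compares plain accuracy with balanced accuracy (the mean of per-class recall) on an imbalanced toy example. Everything here is an assumption for illustration: `resolve_options`, the `METRICS` table, and the `"metric"` key are hypothetical and not part of the library, and the metrics are written in plain Python rather than taken from scikit-learn.

```python
from collections import defaultdict
from typing import Callable, Dict, Optional, Sequence


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Mean of per-class recall, so every class counts equally."""
    hits: Dict[object, int] = defaultdict(int)
    totals: Dict[object, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical lookup table of selectable metrics.
METRICS: Dict[str, Callable[[Sequence, Sequence], float]] = {
    "accuracy": accuracy,
    "balanced_accuracy": balanced_accuracy,
}


def resolve_options(kw_sklearn: Optional[dict] = None) -> dict:
    """Hypothetical helper: merge caller options over assumed defaults."""
    options = {"random_state": None, "metric": "balanced_accuracy"}
    options.update(kw_sklearn or {})
    if options["metric"] not in METRICS:
        raise ValueError(f"unknown metric: {options['metric']!r}")
    return options


# Nine negatives, one positive; the predictor always says "negative".
y_true = [0] * 9 + [1]
y_pred = [0] * 10

opts = resolve_options({"random_state": 42})
print(opts["random_state"], opts["metric"])
print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The toy numbers show why accuracy can flatter a predictor on imbalanced classes: always predicting the majority class scores 0.9 on accuracy but only 0.5 on balanced accuracy.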
"Two things that possibly stuck out: How do I fix the random seed? And maybe have the categorical default to an empty set to make regression problems a bit cleaner." https://www.linkedin.com/feed/update/urn%3Ali%3Aactivity%3A6543945632275546112/?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A6543945632275546112%2C6544252539892703232%29