neurodata / SPORF

This is the implementation of Sparse Projection Oblique Randomer Forest
https://neurodata.io/forests/

update max_features to accept a fraction > 1.0 #340

Closed (MrAE closed 4 years ago)

netlify[bot] commented 4 years ago

Deploy preview for rerf ready!

Built with commit ea93757ac578e8d5af20b2c926dc5404894d44f4

https://deploy-preview-340--rerf.netlify.com

falkben commented 4 years ago

Okay, so this is to enable setting mtry to something like p^2, but using fractions of p instead? Seems confusing, but I agree, it should definitely be allowed to go above 1 (timing be damned).

Is this easier to read for you: `self.max_features > 0`? It is for me, but I'm not sure that's universal.

MrAE commented 4 years ago

It was more so we could do p * 3.0 without having to specify p. It's gonna make it a lot easier when setting the configSpace for BOHB.

falkben commented 4 years ago

> It was more so we could do p * 3.0 without having to specify p. It's gonna make it a lot easier when setting the configSpace for BOHB.

Hmm... what would the intended outcome of `p * 3.0` be? Internally, `p * 3.0` would be recognized as a percentage equal to 3 times the number of features. For instance, with 20 features, `p * 3.0` is 60.0, which would give you 6000% mtry (in numeric terms, p * 60, or 1200 mtry). This is why I said it might be confusing. Also, it's unclear you'd ever want > 1 for a basic random forest; perhaps there should be a check for that?

If you want p * 3 mtry (60 in my example above), you can just do that. Just make sure it's not a float, so that internally it is recognized as the number of features (not a percentage).
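
To make the arithmetic above concrete, here is a minimal sketch of the int-versus-float convention being discussed. The helper `resolve_mtry` and its error messages are hypothetical, written only for illustration; this is not the code in `rerfClassifier.py`. It assumes a float is treated as a fraction of the number of features and an int as an absolute count.

```python
def resolve_mtry(max_features, num_features):
    """Hypothetical helper: turn max_features into an integer mtry.

    Assumed convention (taken from the discussion, not from the library):
      - int   -> used directly as the number of features to try
      - float -> treated as a fraction of num_features, then cast to int
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_features, bool):
        raise ValueError("max_features must be an int or a float")
    if isinstance(max_features, int):
        if max_features <= 0:
            raise ValueError("max_features must be positive")
        return max_features
    if isinstance(max_features, float):
        if max_features <= 0:
            raise ValueError("max_features must be positive")
        # Fractions above 1.0 are allowed, so mtry can exceed num_features.
        return int(max_features * num_features)
    raise ValueError("max_features must be an int or a float")


p = 20  # number of features in the example above

print(resolve_mtry(3.0, p))      # float 3.0 -> 3 * p = 60
print(resolve_mtry(p * 3.0, p))  # float 60.0 -> 60 * p = 1200
print(resolve_mtry(p * 3, p))    # int 60 -> 60
```

Under this assumed convention, passing `3.0` gives p * 3 without the caller needing to know p, while passing `p * 3.0` is what produces the 1200 figure.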

falkben commented 4 years ago

Never mind, I take it back. I now see that in Python we cast to an int before sending it into Cpp.

https://github.com/neurodata/SPORF/blob/ea93757ac578e8d5af20b2c926dc5404894d44f4/Python/rerf/rerfClassifier.py#L321