Open ebridge2 opened 6 years ago
@ebridge2 they wrote the code to be like the randomForest package, i'm not sure if that is relevant to your comment.
randomForest (I use that package and based my code for writing/training classifiers around their interface) uses predict.* S3 interface. see randomForest predict code. In randomForest, you can see there's a function called "predict.randomForest()" that takes an object
as its first parameter. this is how you extend stats::predict
, since predict()
will know to use the predict.name
associated with classifier name
so long as it exists. When you predict with randomForest
, you would just call predict(trained_model, X.new)
, NOT predict.randomForest(trained_model, X.new)
because that's how S3 works in R. In R-Rerf, we write our own non-S3 Predict()
function that does not extend the S3 method predict()
in stats
like theirs does.
As for comment about passing a variable for rank transform, that is just stylistic; that parameter is passed when training the forst so there's no reason to have to pass it twice when you can just return it as part of the trained model object (ie, in the trained output, set trained$rank=TRUE
and then do a simple check in the predict()
method on the object
) and then pass the whole model object to predict()
.
the advantage is that let's say I train a bunch of models using a bunch of packages (randomForest, glm, R-RerF) I can use the same predict(trained_model, X.new)
for all of them if you extend the S3 method properly.
This is a good direction to take and can be implemented in a way that does not affect backwards compatibility.
Agreed. I would assume this is a requirement for integration with the caret package, which is what I ultimately would like to see happen at some point.
On Mon, Jun 4, 2018 at 10:02 AM James D. Browne Jr. < notifications@github.com> wrote:
This is a good direction to take and can be implemented in a way that does not affect backwards compatibility.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/neurodata/R-RerF/issues/16#issuecomment-394365692, or mute the thread https://github.com/notifications/unsubscribe-auth/AJVzkekYcrqA5pwP7BODOT4aqx4CpXaDks5t5T4GgaJpZM4TgXAM .
Why?
On Tue, Jun 5, 2018, 4:21 PM Tyler Tomita notifications@github.com wrote:
Agreed. I would assume this is a requirement for integration with the caret package, which is what I ultimately would like to see happen at some point.
On Mon, Jun 4, 2018 at 10:02 AM James D. Browne Jr. < notifications@github.com> wrote:
This is a good direction to take and can be implemented in a way that does not affect backwards compatibility.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/neurodata/R-RerF/issues/16#issuecomment-394365692, or mute the thread < https://github.com/notifications/unsubscribe-auth/AJVzkekYcrqA5pwP7BODOT4aqx4CpXaDks5t5T4GgaJpZM4TgXAM
.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/neurodata/R-RerF/issues/16#issuecomment-394846691, or mute the thread https://github.com/notifications/unsubscribe-auth/AACjcjGn8Hqs6czChU0o5zDsKj_gvclFks5t5ugugaJpZM4TgXAM .
Why integrate with caret? It makes model tuning much easier. It provides a uniform interface for comparing many different algorithms. Most of the well-known algorithms, including XGBoost and Rborist have been integrated. Therefore, it will increase both visibility and accessibility to users.
Looking at caret's repo, it actually doesn't seem like there is a requirement to conform to standard s3 methods.
On Tue, Jun 5, 2018 at 6:54 PM joshua vogelstein notifications@github.com wrote:
Why?
On Tue, Jun 5, 2018, 4:21 PM Tyler Tomita notifications@github.com wrote:
Agreed. I would assume this is a requirement for integration with the caret package, which is what I ultimately would like to see happen at some point.
On Mon, Jun 4, 2018 at 10:02 AM James D. Browne Jr. < notifications@github.com> wrote:
This is a good direction to take and can be implemented in a way that does not affect backwards compatibility.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <https://github.com/neurodata/R-RerF/issues/16#issuecomment-394365692 , or mute the thread <
.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/neurodata/R-RerF/issues/16#issuecomment-394846691, or mute the thread < https://github.com/notifications/unsubscribe-auth/AACjcjGn8Hqs6czChU0o5zDsKj_gvclFks5t5ugugaJpZM4TgXAM
.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/neurodata/R-RerF/issues/16#issuecomment-394885863, or mute the thread https://github.com/notifications/unsubscribe-auth/AJVzkUDgkDi2WXaZHDUcbOzp5I9rw-Hyks5t5wwTgaJpZM4TgXAM .
definitely no requirement, but it will make peoples' lives easier the more
accessible the algorithm is :) S3 methods are super trivial to implement;
it would probably be sub 4 lines to change from your current Predict()
as
far as I can tell. Just add @importFrom stats predict in the roxygen,
rename your Predict()
to predict.RandomerForest(object, X, additional_hyperparams, ...)
(or whatever R-RerF training is called; here
I've assumed RandomerForest), wrap the entire trained classifier (and all
required hyperparameters related to training) in a single list (passed into
predict()
as the object
argument and make it a
structure(list(training_param1=#, training_param2=#, etc),
class="RandomerForest"), and you are done (and prediction can be done as
predict(trained_classifier, new_data, additional_args)
.
On Tue, Jun 5, 2018 at 9:27 PM, Tyler Tomita notifications@github.com wrote:
Why integrate with caret? It makes model tuning much easier. It provides a uniform interface for comparing many different algorithms. Most of the well-known algorithms, including XGBoost and Rborist have been integrated. Therefore, it will increase both visibility and accessibility to users.
Looking at caret's repo, it actually doesn't seem like there is a requirement to conform to standard s3 methods.
On Tue, Jun 5, 2018 at 6:54 PM joshua vogelstein <notifications@github.com
wrote:
Why?
On Tue, Jun 5, 2018, 4:21 PM Tyler Tomita notifications@github.com wrote:
Agreed. I would assume this is a requirement for integration with the caret package, which is what I ultimately would like to see happen at some point.
On Mon, Jun 4, 2018 at 10:02 AM James D. Browne Jr. < notifications@github.com> wrote:
This is a good direction to take and can be implemented in a way that does not affect backwards compatibility.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <https://github.com/neurodata/R-RerF/issues/16#issuecomment- 394365692 , or mute the thread <
https://github.com/notifications/unsubscribe-auth/ AJVzkekYcrqA5pwP7BODOT4aqx4CpXaDks5t5T4GgaJpZM4TgXAM
.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/neurodata/R-RerF/issues/16#issuecomment-394846691 , or mute the thread < https://github.com/notifications/unsubscribe-auth/ AACjcjGn8Hqs6czChU0o5zDsKj_gvclFks5t5ugugaJpZM4TgXAM
.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/neurodata/R-RerF/issues/16#issuecomment-394885863, or mute the thread https://github.com/notifications/unsubscribe-auth/ AJVzkUDgkDi2WXaZHDUcbOzp5I9rw-Hyks5t5wwTgaJpZM4TgXAM
.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/neurodata/R-RerF/issues/16#issuecomment-394910574, or mute the thread https://github.com/notifications/unsubscribe-auth/AIeNW8UGi1qZ9b857VfF4_9TaqU0qDhVks5t5zALgaJpZM4TgXAM .
-- Eric Bridgeford ericwb.me
i see. being caret compliant seems good, as does the S3 stuff. seems like something @ttomita or @ebridge can easily do at some point.
On Tue, Jun 5, 2018 at 9:56 PM, Eric Bridgeford notifications@github.com wrote:
definitely no requirement, but it will make peoples' lives easier the more accessible the algorithm is :) S3 methods are super trivial to implement; it would probably be sub 4 lines to change from your current
Predict()
as far as I can tell. Just add @importFrom stats predict in the roxygen, rename yourPredict()
topredict.RandomerForest(object, X, additional_hyperparams, ...)
(or whatever R-RerF training is called; here I've assumed RandomerForest), wrap the entire trained classifier (and all required hyperparameters related to training) in a single list (passed intopredict()
as theobject
argument and make it a structure(list(training_param1=#, training_param2=#, etc), class="RandomerForest"), and you are done (and prediction can be done aspredict(trained_classifier, new_data, additional_args)
.On Tue, Jun 5, 2018 at 9:27 PM, Tyler Tomita notifications@github.com wrote:
Why integrate with caret? It makes model tuning much easier. It provides a uniform interface for comparing many different algorithms. Most of the well-known algorithms, including XGBoost and Rborist have been integrated. Therefore, it will increase both visibility and accessibility to users.
Looking at caret's repo, it actually doesn't seem like there is a requirement to conform to standard s3 methods.
On Tue, Jun 5, 2018 at 6:54 PM joshua vogelstein < notifications@github.com
wrote:
Why?
On Tue, Jun 5, 2018, 4:21 PM Tyler Tomita notifications@github.com wrote:
Agreed. I would assume this is a requirement for integration with the caret package, which is what I ultimately would like to see happen at some point.
On Mon, Jun 4, 2018 at 10:02 AM James D. Browne Jr. < notifications@github.com> wrote:
This is a good direction to take and can be implemented in a way that does not affect backwards compatibility.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <https://github.com/neurodata/R-RerF/issues/16#issuecomment- 394365692 , or mute the thread <
https://github.com/notifications/unsubscribe-auth/ AJVzkekYcrqA5pwP7BODOT4aqx4CpXaDks5t5T4GgaJpZM4TgXAM
.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/neurodata/R-RerF/issues/16#issuecomment- 394846691 , or mute the thread < https://github.com/notifications/unsubscribe-auth/ AACjcjGn8Hqs6czChU0o5zDsKj_gvclFks5t5ugugaJpZM4TgXAM
.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/neurodata/R-RerF/issues/16#issuecomment-394885863 , or mute the thread https://github.com/notifications/unsubscribe-auth/ AJVzkUDgkDi2WXaZHDUcbOzp5I9rw-Hyks5t5wwTgaJpZM4TgXAM
.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/neurodata/R-RerF/issues/16#issuecomment-394910574, or mute the thread https://github.com/notifications/unsubscribe- auth/AIeNW8UGi1qZ9b857VfF4_9TaqU0qDhVks5t5zALgaJpZM4TgXAM .
-- Eric Bridgeford ericwb.me
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/neurodata/R-RerF/issues/16#issuecomment-394914883, or mute the thread https://github.com/notifications/unsubscribe-auth/AACjcjxjVhABJgNeKpLALADSLyQk9Lyoks5t5zbPgaJpZM4TgXAM .
-- the glass is all full: half water, half air. neurodata.io ps - i am committed to responding to my emails, it often takes about a week. thank you for understanding.
wrap the forest, X when rank transforming, etc. anything that predict would need for determining prediction to simplify your function calls as a single class object. Then, use the S3
predict
function instead ofPredict
that you guys wrote. That is considered "standard practice" for ML techniques in R; big packages likecaret
,lmer
, etc. use it. Ie:https://github.com/neurodata/lol/blob/master/R/random_classifier.R
This classifier an be used as:
For example, if you have
rank.transform = TRUE
as specified in your docs, there's no need to require users to have to know to specify their X when predicting; you can output a "RerF" class object that either contains or doesn't contain X depending on whenrank.transform = TRUE
, and then just wrap this in whatever is output by RerF and pass that whole thing topredict
.I'll be doing some R-RerF stuff in the future, so I've assigned this to myself for now as it seems like a decent initial goal