talegari opened this issue 2 years ago
Unfortunately catboost (the R package) is not on CRAN 😔 which is a blocker for us being able to implement catboost methods in our packages. You can see related discussion in catboost/catboost#439.
hey Julia, step_catboost would not depend on the catboost package. The step involves permutations and target encoding. Here is the Python implementation of the same.
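As a rough illustration of what "permutations and target encoding" means here, the sketch below computes CatBoost-style ordered target statistics in base R. Rows are visited in a random order, and each row's category is encoded using only the outcomes of the rows that came before it in that order, smoothed toward the overall outcome mean. The function name `catboost_ordered_encode()`, the smoothing weight `a`, and the use of the outcome mean as the prior are assumptions for illustration; this is not the {embed} API or a port of the Python implementation.

```r
# Hypothetical helper: CatBoost-style ordered target encoding for one
# categorical predictor `x` and a numeric (or 0/1) outcome `y`.
# Assumptions: prior = mean(y), smoothing weight `a`.
catboost_ordered_encode <- function(x, y, a = 1, perm = sample.int(length(x))) {
  stopifnot(length(x) == length(y), a > 0)
  y <- as.numeric(y)
  prior <- mean(y)
  n <- length(x)

  # Visit the rows in the order given by the permutation
  x_p <- as.character(x[perm])
  y_p <- y[perm]

  # Running sum and count of y within each level, excluding the current row,
  # so a row never "sees" its own outcome
  prev_sum <- ave(y_p, x_p, FUN = cumsum) - y_p
  prev_cnt <- ave(rep(1, n), x_p, FUN = cumsum) - 1

  # Smoothed estimate: shrinks toward the prior when few earlier rows exist
  enc_p <- (prev_sum + a * prior) / (prev_cnt + a)

  # Put the encoded values back in the original row order
  enc <- numeric(n)
  enc[perm] <- enc_p
  enc
}

# Fixed permutation so the result is reproducible
catboost_ordered_encode(c("a", "a", "b"), c(1, 0, 1), perm = 1:3)
# returns c(2/3, 5/6, 2/3)
```

The `perm` argument is exposed only so the example is deterministic; in normal use it defaults to a fresh random permutation, which is what keeps a row's own outcome out of its encoding.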
Hey @talegari 👋
That sounds great! Feel free to open an issue, and ping me if you need any help or assistance!
Hello @talegari 👋 Are you still interested in opening a PR for this step? If not, I will do it.
Hey @EmilHvitfeldt ... it just fell off the radar. I will submit a PR. I am planning along these lines. Let me know if you have a different suggestion.
Amazing! That looks like a great place to start! Do you know when you will have time to work on this? No rush!
by 24th Mar
(Sent by email in reply to Emil Hvitfeldt's message of Thu, Mar 16, 2023, 09:34 PM.)
hey @EmilHvitfeldt, something unforeseen stopped me from working on this. Just letting you know that I am on it and will raise a PR shortly.
no problem! It might not make it into the next {embed} release, but that is fine, we can send it in later
@EmilHvitfeldt, I am one step away from raising a PR. I need your help resolving a small issue. Here is the context:
I have implemented the catboost encoder as an R6 class here:
Issue: ce$transform() and ar %>% bake(new_data = NULL) give different results. How do I resolve this?
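For background on why two "transform" calls can legitimately disagree: CatBoost-style encoders typically use two encodings. Training rows get the ordered, permutation-based statistics, while new data is encoded with a mapping built from all training rows. The sketch below shows only that second half, with the mapping kept in a plain data frame. The names `catboost_fit_mapping()` and `catboost_apply_mapping()`, the prior, and the smoothing weight `a` are illustrative assumptions; whether the R6 class in this thread follows the same convention is not confirmed here.

```r
# Hypothetical helpers: full-data target statistics used to encode *new* data.
catboost_fit_mapping <- function(x, y, a = 1) {
  y <- as.numeric(y)
  x <- as.character(x)
  prior <- mean(y)
  sums <- tapply(y, x, sum)      # total outcome per level
  cnts <- tapply(y, x, length)   # number of rows per level
  list(
    table = data.frame(
      level = names(sums),
      value = as.numeric((sums + a * prior) / (cnts + a)),
      stringsAsFactors = FALSE
    ),
    prior = prior
  )
}

catboost_apply_mapping <- function(mapping, x) {
  idx <- match(as.character(x), mapping$table$level)
  out <- mapping$table$value[idx]
  out[is.na(idx)] <- mapping$prior  # unseen levels fall back to the prior
  out
}

m <- catboost_fit_mapping(c("a", "a", "b"), c(1, 0, 1))
catboost_apply_mapping(m, c("a", "b", "c"))
# returns c(5/9, 5/6, 2/3)
```

Applied to the training rows themselves, these values generally differ from the ordered encoding, because here every training row's own outcome contributes to its level's statistic.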
Hello @talegari Sorry for taking a while to answer.
I'm not terribly familiar with {R6}, so I'm not sure how much I can help you. However, I can point to where the problem might be. In bake.step_catboost() you have:
```r
if (!is.null(new_data)){
  y_name = purrr::map_chr(object$outcome, rlang::as_name) # string
  ce = object$mapping
  if (y_name %in% colnames(new_data)){
    new_data[[y_name]] = NULL
  }
  res = ce$transform(new_data)
} else {
  res = ce$transform()
}
```
I'm assuming you thought this was needed to deal with bake(new_data = NULL). That is actually not the case: the data passed to any bake method will always be a non-NULL tibble. When you call bake(new_data = NULL), it extracts ar$template and does a couple of other things, so it just returns the data we got when running prep()/bake() the first time.
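The explanation above can be made concrete with a toy model. This is not recipes' real code, and `toy_prep()`, `toy_bake()` and `bake_step()` are hypothetical names; it only mimics the two facts stated in the thread: the step is baked once on the (non-NULL) training data, and bake(new_data = NULL) hands back the stored template without re-running the step. Under that model, a step's NULL branch is simply unreachable.

```r
# Toy model (not recipes' implementation): prep() bakes the training data
# once and stores the result; bake(new_data = NULL) returns that result.
toy_prep <- function(training, bake_step) {
  list(template = bake_step(training))  # the step receives a data frame
}

toy_bake <- function(rec, new_data = NULL, bake_step) {
  if (is.null(new_data)) {
    return(rec$template)                # the step is not re-run here
  }
  bake_step(new_data)
}

# A step that records whether its NULL branch was ever taken
saw_null <- FALSE
bake_step <- function(new_data) {
  if (is.null(new_data)) saw_null <<- TRUE
  new_data
}

rec <- toy_prep(data.frame(x = 1:3), bake_step)
out <- toy_bake(rec, new_data = NULL, bake_step = bake_step)
saw_null  # FALSE: the step only ever received a data frame
```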
Secondly, I'm sorry to say this, since you put in a lot of effort, but I don't want to add {R6} and {checkmate} as dependencies just for this step. If you don't want to do the work of translating away from {R6} and {checkmate}, I understand, and if you want, I can take over and do the last parts.
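If the step is translated away from {checkmate}, argument checks can be written in base R alone. A minimal sketch, assuming (hypothetically) that the step takes a smoothing weight that must be a single positive number; the helper name `check_positive_number()` is illustrative and not part of {embed} or {recipes}.

```r
# Hypothetical base-R replacement for a {checkmate}-style assertion
check_positive_number <- function(x, arg = deparse(substitute(x))) {
  if (!is.numeric(x) || length(x) != 1L || is.na(x) || x <= 0) {
    stop("`", arg, "` must be a single positive number.", call. = FALSE)
  }
  invisible(x)
}

check_positive_number(1)           # passes silently
try(check_positive_number(-2))     # error: `-2` must be a single positive number.
```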
Thanks again for all the work!
Hi Emil, I am planning to implement a step_catboost (along these lines). IMHO, it belongs here. Let me know if you are open to a PR.