laresbernardo / lares

Analytics & Machine Learning R Sidekick
https://laresbernardo.github.io/lares/

Keep original row names when running h2o_automl #41

Closed: verganimarco97 closed this issue 2 years ago

verganimarco97 commented 2 years ago

Hi Bernardo, and congrats on lares, it is a very interesting package. When I pass a data frame to h2o_automl, the original row names of the data frame seem to get lost. In my case, the row names are the customer IDs, so it is very important to be able to link each prediction to its ID. Is there any way to keep the original row names so that they are still there when I extract the predictions with $scores_test? Thank you in advance.

laresbernardo commented 2 years ago

Hi @verganimarco97 thanks for your feedback! Your rownames are lost because we use dplyr underneath. We could save those rownames for you in the backend and include them before returning results. For now, you could add rownames as a new column, say customer_id, and add that to the ignore parameter so it's not used as a predictor. Makes sense?

verganimarco97 commented 2 years ago

The second option makes a lot of sense! Which argument should I use to ignore one of the variables?

laresbernardo commented 2 years ago

Ok. Then you can run something like:

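# Copy the row names (here, the customer IDs) into a regular column
df$customer_id <- rownames(df)
# Pass that column to ignore so it is not used as a predictor
m <- h2o_automl(df, ..., ignore = "customer_id")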

Feel free to close this issue if that worked out for you. Cheers!
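
For anyone who wants to try the idea without running a model, here is a minimal base R sketch of the same workaround. The toy data, the `processed` and `scores` objects, and the score values are all made up for illustration; it only simulates a step that discards row names and does not call h2o_automl or reproduce the exact structure of its output.

```r
# Toy data: row names stand in for customer IDs (hypothetical values).
df <- data.frame(
  spend = c(120, 45, 300, 80),
  churn = c(0, 1, 0, 1),
  row.names = c("C001", "C002", "C003", "C004")
)

# Step 1: copy the row names into a regular column so they survive
# any processing step that resets row names.
df$customer_id <- rownames(df)

# Simulate a pipeline that discards row names (assumption: this mimics
# what happens to the data inside the modelling function).
processed <- df
rownames(processed) <- NULL

# Hypothetical predictions, one per row, carried alongside the ID column.
scores <- data.frame(
  customer_id = processed$customer_id,
  score = c(0.1, 0.8, 0.2, 0.7)
)

# Step 2: link each prediction back to its customer through the ID column.
linked <- merge(df, scores, by = "customer_id")
print(linked)
```

Because the ID lives in an ordinary column, the link no longer depends on row names or row order being preserved.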

verganimarco97 commented 2 years ago

It worked, thank you!