juliasilge / juliasilge.com

My blog, built with blogdown and Hugo
https://juliasilge.com/

Create a custom metric with tidymodels and NYC Airbnb prices | Julia Silge #37

Open utterances-bot opened 3 years ago

utterances-bot commented 3 years ago

Create a custom metric with tidymodels and NYC Airbnb prices | Julia Silge

Predict prices for Airbnb listings in NYC with a data set from a recent episode of SLICED, with a focus on two specific aspects of this model analysis: creating a custom metric to evaluate the model and combining both tabular and unstructured text data in one model.

https://juliasilge.com/blog/nyc-airbnb/

mfcava commented 3 years ago

Nice post, Julia. I also followed the SLICED episode, and this is one of the parts I appreciated most. Are you planning to make a video on tune_bayes?

juliasilge commented 3 years ago

@mfcava I'll look into that!

ghost commented 3 years ago

Where can I get the data set for this? Thank you

juliasilge commented 3 years ago

The link is in the post itself: Airbnb prices in New York City

nguyenlovesrpy commented 3 years ago

Hi,

I'm having trouble, and I think it may come from the tidymodels packages. I installed tidymodels from CRAN (install.packages("tidymodels")).

When I try to follow your code, I hit an issue at:

set.seed(123)
bag_fit <- fit(bag_wf, data = nyc_train)
bag_fit

And the error is

Error in UseMethod("filter") : no applicable method for 'filter' applied to an object of class "NULL"

Could you help me to fix this? Many thanks

juliasilge commented 3 years ago

@nguyenlovesrpy Hmmmm, I'm not entirely sure based only on that info; do you have the most recent version of baguette from CRAN?

nealec commented 2 years ago

Hi Julia, thanks for these fantastic screencasts. I have a question regarding custom metrics: is it possible to build a metric using variables other than 'truth' and 'estimate'? I have searched tutorials and blogs but cannot find anything to guide me. Many thanks, Chris.

juliasilge commented 2 years ago

@nealec I'm assuming you checked out this article already. The variables don't have to be named truth and estimate in your data (here in my blog post they are called price and .pred). The yardstick infrastructure for creating a new metric does depend on using the function metric_vec_template() and friends, but you can pass in different names for arguments if you need to. Notice this usage:

metric_vec_template(
  metric_impl = mse_impl,
  truth = truth,
  estimate = estimate,
  na_rm = na_rm,
  cls = "numeric",
  ...
)

nealec commented 2 years ago

Thank you for responding so swiftly, Julia. I have indeed read that link; it was very useful in getting the main part of the code written. What I am looking at is a metric that requires a truth, an estimate, a variable x, and a variable y to calculate.

If I may be so bold, would you be able to take a look at this link, where I have posed the question in more detail: https://community.rstudio.com/t/tidymodels-custom-metric-for-multi-class-classification-yardstick-machine-learning/122648?u=nealec

Thank you in advance, Chris.

jrosell commented 2 years ago

I don't understand why you used: test_rs <- augment(bag_fit, nyc_test)

Is it ok to use this? test_rs <- augment(bag_rs, nyc_test)

juliasilge commented 2 years ago

@jrosell It is not OK, actually. 😬 The bag_rs object is not a fitted model workflow but instead is a whole tibble of resampling results. It has metrics for fitting the workflow to each of the resamples.

jrosell commented 2 years ago

Thanks @juliasilge. To calculate a custom metric manually on resampling results, I've just seen this article https://rsample.tidymodels.org/articles/Applications/Recipes_and_rsample.html, but I wonder whether collect_metrics() should also work on resampling results with this new custom metric.

juliasilge commented 2 years ago

@jrosell Yes, it definitely can! You will need to set a metric_set() for your resampling like in this post, with your custom metric in it.
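For readers following along, here is a minimal sketch of what that can look like. It assumes yardstick 1.2.0 or later and uses `mtcars` with a plain linear model as stand-ins for the post's data and bagged tree workflow. The `rmsle` metric is written here only for illustration and is not copied from the post.

```r
library(tidymodels)

# Illustrative custom metric: root mean squared log error.
rmsle_vec <- function(truth, estimate, na_rm = TRUE, case_weights = NULL, ...) {
  check_numeric_metric(truth, estimate, case_weights)
  if (na_rm) {
    result <- yardstick_remove_missing(truth, estimate, case_weights)
    truth <- result$truth
    estimate <- result$estimate
  } else if (yardstick_any_missing(truth, estimate, case_weights)) {
    return(NA_real_)
  }
  sqrt(mean((log1p(truth) - log1p(estimate))^2))
}

# Register the data frame interface so metric_set() accepts the metric.
rmsle <- function(data, ...) UseMethod("rmsle")
rmsle <- new_numeric_metric(rmsle, direction = "minimize")

rmsle.data.frame <- function(data, truth, estimate, na_rm = TRUE,
                             case_weights = NULL, ...) {
  numeric_metric_summarizer(
    name = "rmsle",
    fn = rmsle_vec,
    data = data,
    truth = !!rlang::enquo(truth),
    estimate = !!rlang::enquo(estimate),
    na_rm = na_rm,
    case_weights = !!rlang::enquo(case_weights)
  )
}

# Stand-in data and model (assumptions, not the post's objects).
set.seed(123)
folds <- vfold_cv(mtcars, v = 5)
wf <- workflow(mpg ~ wt + hp, linear_reg())

# Pass the custom metric through metric_set(), then collect as usual.
res <- fit_resamples(wf, resamples = folds, metrics = metric_set(rmse, rmsle))
collect_metrics(res)
```

With the metric included in `metric_set()`, `collect_metrics()` reports it alongside the built-in metrics, averaged over the resamples.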

conlelevn commented 2 years ago

@juliasilge Hi Julia, this screencast is really interesting and I learned a lot of new things from it. I have a couple of questions:

  1. Could you remind me what the argument times = 25 means in set_engine("rpart", times = 25)?
  2. I have learned about rlang before, but I have rarely seen people use it to write functions. Compared with the classical way of writing a function, what is the advantage of using rlang?

juliasilge commented 2 years ago

@conlelevn You can check out the documentation for baguette to learn about what the arguments mean. As far as rlang, to create a custom metric, you write a function that needs to be able to take different variable names as arguments. I find a couple of resources helpful for this:

gunnergalactico commented 5 months ago

Hello Dr. Silge, I tried to rerun your code but ran into an issue with metric_vec_template. I kept getting this error because of the soft deprecation: "metric_vec_template() has been soft-deprecated as of yardstick 1.2.0. Please switch to use check_metric and yardstick_remove_missing functions."

When I replace metric_vec_template with check_metric, there is no argument named "metric_impl". Is there additional material you can suggest that would help me improve my functional programming?

Thanks

juliasilge commented 5 months ago

@gunnergalactico Take a look at this documentation for some guidance on how to make a custom metric.
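As a rough sketch of the migration, assuming yardstick 1.2.0 or later: the work that metric_vec_template() used to do (validating inputs, handling missing values, then calling the implementation) is now written out in the vector function itself. For a numeric metric the check function is `check_numeric_metric()`, and the old `metric_impl` function is simply called at the end. The `mse_impl` and `mse_vec` names below are illustrative.

```r
library(yardstick)

# The old `metric_impl`: the actual calculation, unchanged.
mse_impl <- function(truth, estimate, case_weights = NULL) {
  mean((truth - estimate)^2)
}

mse_vec <- function(truth, estimate, na_rm = TRUE, case_weights = NULL, ...) {
  # Validate types and lengths (previously done inside metric_vec_template()).
  check_numeric_metric(truth, estimate, case_weights)

  if (na_rm) {
    # Drop incomplete cases across all inputs.
    result <- yardstick_remove_missing(truth, estimate, case_weights)
    truth <- result$truth
    estimate <- result$estimate
    case_weights <- result$case_weights
  } else if (yardstick_any_missing(truth, estimate, case_weights)) {
    return(NA_real_)
  }

  # Call the implementation directly instead of passing it as `metric_impl`.
  mse_impl(truth, estimate, case_weights = case_weights)
}

mse_vec(c(1, 2, 3), c(1, 2, 5))
```

The data frame method is then built with `new_numeric_metric()` and `numeric_metric_summarizer()`, passing `mse_vec` as `fn`.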