juliasilge / juliasilge.com

My blog, built with blogdown and Hugo
https://juliasilge.com/

Predict housing prices in Austin TX with tidymodels and xgboost | Julia Silge #45

Open utterances-bot opened 2 years ago

utterances-bot commented 2 years ago

Predict housing prices in Austin TX with tidymodels and xgboost | Julia Silge

More xgboost with tidymodels! Learn about feature engineering to incorporate text information as indicator variables for boosted trees.

https://juliasilge.com/blog/austin-housing/

daver787 commented 2 years ago

In the GLM inside the mapping function, I'm unclear what you are trying to predict. Could you explain it?

juliasilge commented 2 years ago

@daver787 Check out the help for glm(). It says:

For binomial and quasibinomial families the response can also be specified as ... a two-column matrix with the columns giving the numbers of successes and failures.

This is what we're doing here. We're building a generalized linear model with a binomial family (logit link by default), and we're modeling successes/failures (here, counts of a specific word out of the total of all the words) as a function of price.
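For readers who want to see the two-column response in isolation, here is a minimal sketch in base R. The data frame and its numbers are invented for illustration, and the sketch uses total - n for the failures column, following the help text quoted above; the post's own code may differ.

```r
# Hypothetical counts: how often one word appears (n) out of all words
# (total) in listings at each price level. These numbers are invented.
word_counts <- data.frame(
  price = c(1, 2, 3, 4, 5),
  n     = c(12, 18, 25, 40, 55),
  total = c(1000, 1000, 1000, 1000, 1000)
)

# cbind() builds a two-column matrix, not a data frame:
# column 1 = successes (the word), column 2 = failures (every other word).
response <- cbind(word_counts$n, word_counts$total - word_counts$n)
is.matrix(response)

# Binomial GLM (logit link by default) of the word's share on price.
fit <- glm(cbind(n, total - n) ~ price, data = word_counts, family = binomial)

# A positive slope means the word becomes more common as price rises.
coef(fit)
```

So the response is not a single 0/1 column. Each row carries a count of successes and a count of failures, and glm() treats it as that many binomial trials.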

nguyenlovesrpy commented 2 years ago

Hi, I received this error when trying to follow this tutorial: "There were no valid metrics for the ANOVA model."

Could you help me find a way to solve it?

juliasilge commented 2 years ago

@nguyenlovesrpy I would make sure you have an updated version of finetune from CRAN. Can you run the basic tune_race_anova() example?

nguyenlovesrpy commented 2 years ago

I just ran install.packages("finetune"). Thank you for replying.

SimonMontfort commented 2 years ago

I get the same error as @nguyenlovesrpy on a binary instead of a multinomial classification problem. However, it works fine for me when I use the tune_grid() function instead.

SimonMontfort commented 2 years ago

Where can I download train.csv?

juliasilge commented 2 years ago

@SimonMontfort It is available at the link in the blog post.

FelixZhao123 commented 2 years ago

Hi @juliasilge, I am stuck at the mapped glm() call with cbind() as the y part of the formula. Normally, glm() needs a binary y (1, 0), so how would cbind(n, price_total) work that way? I also thought cbind() would return a data frame instead of a vector. I would appreciate it a lot if you could explain a little more. Thanks!

FelixZhao123 commented 2 years ago

Please ignore my previous question. It has already been asked.

edgar595 commented 2 years ago

Good content, and your work is inspiring. I'm trying to follow the steps and I'd like your help with this problem: Computation failed in stat_summary_hex():

juliasilge commented 2 years ago

@edgar595 Can you create a reprex (a minimal reproducible example) for your problem? The goal of a reprex is to make it easier for people to recreate your problem so that we can understand it and/or fix it. If you've never heard of a reprex before, you may want to start with the tidyverse.org help page. Once you have a reprex, I recommend posting on RStudio Community, which is a great forum for getting help with these kinds of modeling questions. Thanks! 🙌

conlelevn commented 1 year ago

For those who get this error message:

Error in test_parameters_gls(): ! There were no valid metrics for the ANOVA model. Run rlang::last_error() to see where the error occurred.

This happens because one of the features is not numeric, so the algorithm will not accept it; i.e., the hasSpa variable should be numeric rather than boolean. You can fix it by adding one more step to the recipe: step_integer(hasSpa).

I don't know why Julia can run it without transforming the variable to a numeric type, but after I did that, it worked for me.

@juliasilge, if I want to transform a boolean variable to a numeric type, is step_integer() acceptable, or is there another way to do it in a recipe?

juliasilge commented 1 year ago

@conlelevn Hmmmm, perhaps something has changed in recipes since I did this post. I would probably make a transformation like this one before starting a recipe (like before initial_split()) but I think that step_integer() would also work.
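A minimal sketch of that kind of up-front transformation, assuming dplyr is available. The small data frame below is an invented stand-in for the housing data; only the hasSpa column name comes from this thread.

```r
library(dplyr)

# Hypothetical stand-in for the housing data; the values are invented
# and only the hasSpa column name comes from the thread.
homes <- tibble(
  price  = c(250000, 480000, 615000),
  hasSpa = c(FALSE, TRUE, FALSE)
)

# Convert the logical column to 0/1 integers before any splitting
# (for example, before initial_split()) or recipe steps.
homes_numeric <- homes %>%
  mutate(hasSpa = as.integer(hasSpa))

homes_numeric$hasSpa
```

Doing the conversion once on the full data set means the training and testing sets both inherit the numeric column, so the recipe does not need an extra step for it.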