
Lab 4 Ex 4 Lasso #56

Closed: skywang0407 closed this issue 4 years ago

skywang0407 commented 4 years ago

Hi Professor McGowan, when I am finding the lambda for the lasso model, all penalties show the same mean value. Can you please help me check what is wrong?

[Screenshots attached: Q1, Q1-1]

Thanks!

skywang0407 commented 4 years ago

Do we need to change the grid of penalty?

LucyMcGowan commented 4 years ago

What is the RMSE when the penalty is 0?

LutionHan commented 4 years ago

Hey Dr. McGowan (@LucyMcGowan), I might have the same issue. I get this yellow warning in my console:

! Fold08: internal: A correlation computation is required, but estimate is constant and...

And for all penalties from 0 through 100, the RMSE remains the same. The following is my code. This issue only happens with my lasso; my ridge works fine with almost the same code.

```r
lasso_spec <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

music_train_cv <- vfold_cv(music_train, v = 10)

grid <- expand_grid(penalty = seq(0, 100, by = 1))

results_lasso <- tune_grid(lasso_spec,
                           preprocessor = rec,
                           grid = grid,
                           resamples = music_train_cv)

results_lasso %>%
  collect_metrics() %>%
  filter(.metric == "rmse") %>%
  arrange(mean)
```
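
To check Dr. McGowan's penalty = 0 question above, one option (a sketch using the results_lasso object just defined) is:

```r
# Inspect the cross-validated RMSE at penalty = 0 only
results_lasso %>%
  collect_metrics() %>%
  filter(.metric == "rmse", penalty == 0)
```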

Any suggestion is appreciated. Thanks!

jdtrat commented 4 years ago

@LutionHan, I have the same errors with the lasso and elastic net regressions but not the ridge.

I think it may have something to do with the preprocessing, since it says that "the estimate is constant and has a 0 standard deviation, resulting in a divide by 0 error."

When defining a recipe for preprocessing, we use step_scale(all_predictors()), which divides each predictor by its standard deviation, so I'm assuming it's an issue with that, though I'm not sure why the errors do not appear for the ridge regression.
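
For reference, a minimal sketch of that recipe (the music_train data and lat outcome are taken from the code later in this thread):

```r
library(recipes)

# Sketch of the preprocessing recipe under discussion:
# step_scale() divides each predictor by its standard deviation
rec <- recipe(lat ~ ., data = music_train) %>%
  step_scale(all_predictors())

# Inspect the prepared training data; each scaled predictor should have sd = 1
rec %>% prep() %>% juice()
```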

I have noticed that changing the mixture value when defining the regression model (using linear_reg()) elicits the error when running tune_grid(), but I cannot figure out exactly how. A mixture of 0.01 does not yield any errors; a mixture of 0.05 elicits errors in folds 3, 5, and 7 but not the others; a mixture >= 0.06, however, elicits the error in all folds.
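
For concreteness, here is a sketch of tuning both parameters at once to map out where the warning appears (elastic_spec, grid_en, and results_en are illustrative names; rec and music_train_cv are from the code above):

```r
# Sketch: tune penalty and mixture jointly (elastic net) to see
# which combinations trigger the constant-estimate warning
elastic_spec <- linear_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")

grid_en <- expand_grid(
  penalty = seq(0, 10, by = 1),
  mixture = c(0.01, 0.05, 0.06, 0.5, 1)  # values discussed above
)

results_en <- tune_grid(elastic_spec,
                        preprocessor = rec,
                        grid = grid_en,
                        resamples = music_train_cv)
```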

As for the original question @skywang0407, I also get the same RMSE values if the penalty is between 10 and 100. However, if the penalty is between 0 and 10 (@LucyMcGowan), the mean RMSE changes to some degree (see attached).

[Screenshot: mean RMSE values varying for penalties between 0 and 10]

If I had to venture a guess, this means that any lasso regression with a penalty of at least 7 performs equally well. If you want to validate this, simply check whether the mean RMSE values for different penalties are equal (see below):

[Screenshot: identical mean RMSE values across penalties]
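
In code, that check might look like this (a sketch; distinct() collapses duplicate mean RMSE values):

```r
# If this returns a single row, every penalty gave an identical mean RMSE
results_lasso %>%
  collect_metrics() %>%
  filter(.metric == "rmse") %>%
  distinct(mean)
```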

@skywang0407, I hope that answers your question and, from another student's perspective, I think your code is good. @LutionHan, I get the same errors as you but am not sure exactly how to deal with them. I think the code works properly either way, but I would appreciate any thoughts you or @LucyMcGowan have.

tkinsella333 commented 4 years ago

@LucyMcGowan I am also running into the following error message:

! Fold08: internal: A correlation computation is required, but estimate is constant and..

Similarly, this is not a problem when mixture = 0 for the ridge regression, but it pops up for the lasso regression. Is this error/warning something to worry about?

LucyMcGowan commented 4 years ago

I am still trying to get to the bottom of this, but for the time being, ignore this error. I think it is because all of the estimates are the same for the penalty values provided (so there is no standard deviation).

ConnorReardon commented 4 years ago

@LucyMcGowan My means vary when I use

```r
grid <- expand_grid(penalty = seq(0, 5, by = .5))
```

Are these appropriate numbers for tuning?

jdtrat commented 4 years ago

@ConnorReardon, we used seq(0, 100, by = 10) in class, but I'm not sure what the correct protocol is for choosing those values.

LutionHan commented 4 years ago

@LucyMcGowan However, if I ignore these warnings, then I am not able to select a best penalty/mixture, since all of the estimates are the same. Are we allowed to just conclude that all choices of penalty/mixture make no difference at all?

jdtrat commented 4 years ago

@LutionHan Try something like this:

```r
grid <- expand_grid(penalty = seq(0, 10, by = 1))
```

The estimates change for different penalties between 0 and 10.
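
Once the estimates do vary, something like tune's select_best() can pull out the winning penalty (a sketch):

```r
# Pick the penalty with the lowest cross-validated RMSE,
# then plug it back into the model specification
best_penalty <- select_best(results_lasso, metric = "rmse")
final_lasso <- finalize_model(lasso_spec, best_penalty)
```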

LutionHan commented 4 years ago

@jdtrat I just tried and got the same issue. All of the estimates are the same: [screenshot attached]

jdtrat commented 4 years ago

@LutionHan what's your recipe? This is for lasso with the same code from earlier in this thread?

LutionHan commented 4 years ago

@jdtrat Here is my code:

```r
lasso_spec <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

music_train_cv <- vfold_cv(music_train, v = 10)

rec_lasso <- recipe(lat ~ ., data = music_train) %>%
  step_scale(lat)

grid_lasso <- expand_grid(penalty = seq(0, 10, by = 1))

results_lasso <- tune_grid(lasso_spec,
                           preprocessor = rec_lasso,
                           grid = grid_lasso,
                           resamples = music_train_cv)

results_lasso %>%
  collect_metrics() %>%
  filter(.metric == "rmse") %>%
  arrange(mean)
```

jdtrat commented 4 years ago

@LutionHan I think the issue is you're not scaling all the predictors. Try:

```r
rec_lasso <- recipe(lat ~ ., data = music_train) %>%
  step_scale(all_predictors())
```

LutionHan commented 4 years ago

@jdtrat That works! Thank you so much!

PS: I was just wondering, in the lecture slides we use %>% step_scale() on the response variable and it works well. Is there any difference between these two cases?

jdtrat commented 4 years ago

@LutionHan Glad it works!

I am not sure about that slide. My understanding is that step_scale applies to all the predictors in the model so they can be equally weighted. Maybe Dr. McGowan could elucidate.

LutionHan commented 4 years ago

@jdtrat Thanks anyway!

LucyMcGowan commented 4 years ago

Thank you everyone for weighing in and helping! There is a bit of an art to picking the penalty. If the penalties are all giving the same result (which I think is what that error is referring to), then you can try varying the penalties chosen to see if you can get some variability. If that doesn't work, you can note the error and choose one at random, since they all give the same result (and note that you have done this). @LutionHan, can you link to the slide where I say that you should scale the response variable? That is probably a typo; you want to scale all of the predictors, not the outcome.
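
For example, a log-spaced grid concentrates penalty values near zero, where the fits are most likely to differ (a sketch reusing the objects above):

```r
# Sketch: log-spaced penalties often reveal variability that an
# evenly spaced grid misses, since coefficients shrink to zero quickly
grid_log <- expand_grid(penalty = 10^seq(-4, 2, length.out = 50))

results_lasso <- tune_grid(lasso_spec,
                           preprocessor = rec_lasso,
                           grid = grid_log,
                           resamples = music_train_cv)
```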

LutionHan commented 4 years ago

@LucyMcGowan I am sorry that I misunderstood the slides. I checked the slides again and found that in the lecture example we only have one predictor (horsepower), so we only have to scale one variable; I misread it as the response. Thanks for your and @jdtrat's help!

tkinsella333 commented 4 years ago

@LucyMcGowan So when increasing the penalty yields the same RMSE, do we think this is an internal R error (which seems unlikely), or that the optimized coefficient values hit a local minimum at those penalty values?