Questions for Midterm 2

skywang0407 commented 4 years ago

Hi Lucy, I want to ask you some questions:

1) Do I understand it right: when λ = 0, it is linear regression; when λ goes to infinity, it goes down?
2) Do we need to standardize variables ourselves on the exam?
3) Is it estimating training error?
4) Can you explain "scale the variables" again? Is it because we want all the betas on the same scale so they are comparable? And what will happen if we scale before doing CV?
5) For the step functions, does it mean that for X < 35, 35 < X < 65, and X > 65, the related Y values within each interval are all the same, lying on a horizontal line Y = some constant?

Thanks!

skywang0407 commented 4 years ago

Q3 refers to page 17 of the tidymodels slides.

mohammad-r-k commented 4 years ago

1) When lambda goes to infinity, every term that contains lambda becomes very large, but since these terms are inverted in the equation, they become very small (near zero). Multiplying them by the other terms drives the whole expression to zero, so the variance goes to zero.
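A quick sketch of that argument, assuming the method being discussed is ridge regression with one predictor and no intercept (the thread does not name the method, so that setup is an assumption). In that case the estimate is sum(x*y) / (sum(x^2) + lambda), so lambda sits inside the inverted term. The function name `ridge_slope` and the toy data are made up for illustration.

```python
def ridge_slope(x, y, lam):
    """Ridge estimate for one predictor with no intercept (assumed setup)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    # lambda is added to the term that gets inverted (divided by)
    return sxy / (sxx + lam)

# Toy data, invented for this example
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

# lambda = 0 is the ordinary least squares slope for this setup
ols = ridge_slope(x, y, 0.0)

# As lambda grows, the denominator grows and the estimate shrinks toward zero
slopes = [ridge_slope(x, y, lam) for lam in (0.0, 1.0, 100.0, 1e6)]
for lam, b in zip((0.0, 1.0, 100.0, 1e6), slopes):
    print(f"lambda={lam:>9}: slope={b:.6f}")
```

With lambda = 0 the penalty disappears and the slope is the plain least squares one. Each larger lambda gives a smaller slope, and a very large lambda pushes it essentially to zero, so the estimate stops varying with the data.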

2) I don't know whether we have to compute it on the exam, but all you have to do is divide each variable by its standard deviation.

3) I think it is estimating the training root mean squared error.

4) a) Since we apply the same lambda to each beta coefficient (we use the same penalty term for all betas), we want all the beta values to be on the same scale. So we scale each X by its standard deviation, which makes the betas comparable.

b) No, they are not the same. We should scale inside cross validation, after the data are split into folds, not before. For scaling, we should divide by the SD computed within each fold.
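Here is a minimal sketch of one common reading of 4b, assuming that "the SD of each fold" means the SD computed from the training part of each split and then applied to the held-out part. That reading is an assumption, not something confirmed in the thread. The function name `scale_within_folds` and the data are hypothetical.

```python
import statistics

def scale_within_folds(x, k):
    """Split x into k folds; scale each held-out fold by the SD of the other folds."""
    folds = [x[i::k] for i in range(k)]
    scaled = []
    for i, held_out in enumerate(folds):
        # Training portion: every fold except the held-out one
        train = [v for j, fold in enumerate(folds) if j != i for v in fold]
        sd = statistics.stdev(train)  # SD comes from the training portion only
        scaled.append([v / sd for v in held_out])
    return scaled

# Toy data, invented for this example
x = [10.0, 12.0, 15.0, 20.0, 22.0, 30.0]
scaled_folds = scale_within_folds(x, k=3)

# Scaling before CV would instead use one SD from all the data,
# including the observations that are later held out
sd_all = statistics.stdev(x)
sd_first_split = statistics.stdev([12.0, 22.0, 15.0, 30.0])
print(sd_all, sd_first_split)
```

The two SDs differ, which is the point: scaling before splitting lets the held-out observations influence the preprocessing, while scaling inside each split does not.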

5) Yes, for each specified interval of X, the fitted Y values are the same!
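To illustrate the step function answer, here is a small sketch that fits a piecewise-constant function with cut points at 35 and 65, taken from the question. The fitted value in each interval is the mean of Y there. The function name `fit_step_function`, the data, and the choice to put a value exactly equal to a cut point in the upper interval are all assumptions made for the example.

```python
def fit_step_function(x, y, cuts):
    """Piecewise-constant fit: predict the mean of y within each interval of x."""
    def interval(v):
        # Index of the interval v falls into; a value equal to a cut goes to the upper side
        return sum(v >= c for c in cuts)

    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(interval(xi), []).append(yi)
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    return lambda v: means[interval(v)]

# Toy data, invented for this example
x = [20, 30, 40, 60, 70, 80]
y = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]

predict = fit_step_function(x, y, cuts=(35, 65))

# Every X in the same interval gets the same fitted Y
print(predict(25), predict(34))  # both in X < 35
print(predict(40), predict(64))  # both in 35 <= X < 65
print(predict(70), predict(90))  # both in X >= 65
```

Within an interval the prediction does not change with X, so the fitted curve is a flat horizontal segment in each interval with a jump at each cut point.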

LucyMcGowan commented 4 years ago

Thank you @mohammad-r-k! I agree with all of the above. If you had to scale a variable on the exam, I would provide the standard deviation.

tkinsella333 commented 4 years ago

@LucyMcGowan you would need to subtract the mean and then divide by the SD right?

tkinsella333 commented 4 years ago

Never mind, just dividing by the SD is sufficient. (And I believe subtracting the mean and then dividing is at least OK in cases where we assume the variable is Gaussian? Or possibly this works in all cases as well?)

LucyMcGowan commented 4 years ago

@tkinsella333 yep! You just need to scale. Centering is fine: it doesn't hurt, but it's not necessary.
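A small check of the centering point, assuming a one-predictor ridge fit where the intercept is estimated but not penalized (that setup is an assumption for the sketch, and `ridge_slope_with_intercept` is a made-up helper). Under that assumption the slope depends on X only through its deviations from the mean, so centering changes nothing while scaling does.

```python
import statistics

def ridge_slope_with_intercept(x, y, lam):
    """One-predictor ridge slope when the intercept is fit but not penalized (assumption)."""
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / (sxx + lam)

# Toy data, invented for this example
x = [150.0, 160.0, 170.0, 180.0, 190.0]
y = [50.0, 58.0, 63.0, 72.0, 80.0]
sd, mean = statistics.stdev(x), statistics.mean(x)

scaled_only = [xi / sd for xi in x]
centered_and_scaled = [(xi - mean) / sd for xi in x]

slope_raw = ridge_slope_with_intercept(x, y, lam=2.0)
slope_scaled = ridge_slope_with_intercept(scaled_only, y, lam=2.0)
slope_both = ridge_slope_with_intercept(centered_and_scaled, y, lam=2.0)

print(slope_raw, slope_scaled, slope_both)
```

The scaled-only and centered-and-scaled slopes agree up to floating point error, while the unscaled slope is different, which matches "you just need to scale".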

skywang0407 commented 4 years ago

Got it, thank you all!
