Some slide questions before midterm

Cindyyyhey commented 4 years ago

Dear Professor @LucyMcGowan ,

I have a few questions from slides about the regression part.

What does this equation mean? Do we derive the bias-variance trade-off from this equation?
I know hat means predicted values, but what does the hat matrix mean here?
I'm so confused about this. What is the sigma here? Is the Var(beta) for calculating the variance of the whole regression model, or just one predictor? Also, I don't understand the formula for SE(beta) here.
For assessing a model, why do we use RSE instead of RSS (since we always calculate RSS on a test dataset to test the model)? What is the difference between RSE and RSS?
What does the summarise(reduce variables to values) mean here?
Do we need to memorize or understand these two complicated formulas?
Why does the false positive rate keep decreasing while the false negative rate keeps increasing? Is this the tendency for all LDA models? What does the x-axis threshold mean here?
The last one!! What does the area under the ROC curve mean? Which one is better, higher or lower AUC? (I think low AUC is better because both false rates are low.)

Thanks a lot!!!

LucyMcGowan commented 4 years ago

What does this equation mean? Do we derive the bias-variance trade-off from this equation?

Yes

I know hat means predicted values, but what does the hat matrix mean here?

It’s the matrix that you multiply y by to turn y into y hat
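For reference, a sketch of what that looks like written out, assuming the slide uses ordinary least squares with a design matrix X (the symbols X and H are the usual textbook notation, not copied from the slide):

```latex
\hat{y} = H y, \qquad H = X (X^\top X)^{-1} X^\top
```

Since the least squares coefficients are $\hat{\beta} = (X^\top X)^{-1} X^\top y$ and $\hat{y} = X\hat{\beta}$, the matrix H is just everything that sits in front of y. It "puts the hat on y".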

I'm so confused about this. What is the sigma here? Is the Var(beta) for calculating the variance of the whole regression model, or just one predictor? Also, I don't understand the formula for SE(beta) here.

sigma^2 is the variance of the error term (sigma itself is the standard deviation of the errors). Var(beta) is for just the beta coefficient(s). SE(beta) is the square root of that variance - the standard error.
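Written out, and assuming the slide shows the usual matrix form for least squares (the index j and the design matrix X are standard notation, not taken from the slide), the relationships are roughly:

```latex
\operatorname{Var}(\hat{\beta}) = \sigma^2 (X^\top X)^{-1},
\qquad
\operatorname{SE}(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 \left[ (X^\top X)^{-1} \right]_{jj}}
```

Here $\sigma^2$ is the error variance, and $\hat{\sigma}$ is its estimate from the data (the residual standard error). Each coefficient gets its own standard error from the matching diagonal entry, so this describes the uncertainty in individual coefficients rather than in the model as a whole.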

For assessing a model, why do we use RSE instead of RSS (since we always calculate RSS on a test dataset to test the model)? What is the difference between RSE and RSS?

RSE is a function of RSS - it accounts for the sample size and the number of predictors
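A small sketch of that relationship, written in plain Python rather than the course's R and using made-up numbers. It assumes the common definition RSE = sqrt(RSS / (n - p - 1)), where n is the number of observations and p the number of predictors (not counting the intercept); the function names `rss` and `rse` are just for this illustration.

```python
import math

def rss(y, y_hat):
    """Residual sum of squares: total squared prediction error."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

def rse(y, y_hat, p):
    """Residual standard error for a model with p predictors
    (intercept not counted), assuming RSE = sqrt(RSS / (n - p - 1))."""
    n = len(y)
    return math.sqrt(rss(y, y_hat) / (n - p - 1))

# Made-up observed and fitted values, purely for illustration
y = [3.0, 5.0, 7.0, 9.0, 12.0]
y_hat = [3.5, 4.5, 7.0, 9.5, 11.5]

print(rss(y, y_hat))       # grows as more observations are added
print(rse(y, y_hat, p=1))  # on the scale of y, adjusted for n and p
```

RSS keeps growing as you add observations, so it is hard to compare across datasets of different sizes. RSE divides by the degrees of freedom and takes a square root, which puts it back in the units of the response.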

What does the summarise(reduce variables to values) mean here?

the summarize function is used to summarize across multiple observations (for example to take the mean with groups specified by a group_by function)
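To make "reduce variables to values" concrete, here is a rough Python analogue of grouping and then summarizing. This is not dplyr code, only a sketch of the same idea using the standard library; the data and the helper name `group_mean` are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def group_mean(rows, key, value):
    """Collapse many rows into one mean per group, which is the idea
    behind grouping and then summarizing."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {g: mean(vals) for g, vals in groups.items()}

# Made-up data: several observations per group
rows = [
    {"group": "a", "value": 1.0},
    {"group": "a", "value": 3.0},
    {"group": "b", "value": 10.0},
]

print(group_mean(rows, "group", "value"))
```

Three observations go in and two numbers come out, one per group: a whole column has been reduced to a single value within each group.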

Do we need to memorize or understand these two complicated formulas?

no

Why does the false positive rate keep decreasing while the false negative rate keeps increasing? Is this the tendency for all LDA models? What does the x-axis threshold mean here?

this is specific to the Credit Default model - the x-axis is the threshold chosen to determine which group to classify a new observation into (it’s specified on the previous slide)
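The exact curves depend on the model and data, but the direction can be seen in a toy example. The sketch below is plain Python with made-up probabilities and labels (not the Credit Default data) and assumes an observation is classified as positive when its predicted probability is at least the threshold; `error_rates` is a name invented for this illustration.

```python
def error_rates(probs, labels, threshold):
    """Classify as positive when prob >= threshold.
    Returns (false positive rate, false negative rate)."""
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    negatives = labels.count(0)
    positives = labels.count(1)
    return fp / negatives, fn / positives

# Made-up predicted probabilities and true classes (1 = positive)
probs = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 1, 0, 1]

for t in (0.1, 0.3, 0.5, 0.8):
    fpr, fnr = error_rates(probs, labels, t)
    print(f"threshold={t}: FPR={fpr:.2f}, FNR={fnr:.2f}")
```

Raising the threshold means fewer observations get labeled positive, so false positives can only go down and false negatives can only go up. How fast each one moves is what depends on the particular model.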

The last one!! What does the area under the ROC curve mean? Which one is better, higher or lower AUC? (I think low AUC is better because both false rates are low.)

AUC is the area under the ROC curve. Larger is better - here is a blog post with more details
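One way to see why larger is better: the ROC curve plots the true positive rate against the false positive rate, so more area means more true positives for the same false positive rate. Equivalently, AUC is the chance that a randomly chosen positive gets a higher score than a randomly chosen negative. The sketch below computes it that way in plain Python on made-up scores; the function name `auc` and the data are assumptions for illustration only.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up scores and true classes (1 = positive)
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 1, 0, 1]

print(auc(scores, labels))                # fairly good ranking
print(auc([1 - s for s in scores], labels))  # same scores flipped: worse
```

An AUC of 1 means every positive is ranked above every negative, 0.5 is what random guessing gives, and values below 0.5 mean the ranking is backwards.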

Cindyyyhey commented 4 years ago

Thanks a lot!!!!!!

sta-363-s20 / community

Some slide questions before midterm #44