thekingofkings closed this issue 8 years ago
The t-test is used in linear regression (LR) to obtain the p-values of the coefficients.
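According to the Wikipedia page on the t-test, $t = \sqrt{p}\, Z / s$, where $Z$ and $s$ are functions of the data and $p$ is the number of degrees of freedom of $s^2$. For the one-sample test, $Z = (\bar{X} - \mu) / (\sigma / \sqrt{n})$.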
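The t-test makes the following assumptions under the null hypothesis: $Z$ follows a standard normal distribution, $s^2$ follows a $\chi^2$ distribution with $p$ degrees of freedom, and $Z$ and $s$ are independent.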
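For a regression coefficient, the test statistic is $t_{score} = (\hat{\beta} - \beta_0) / SE_{\beta}$, where $\hat{\beta}$ is the learned coefficient, $\beta_0$ is its value under the null hypothesis ($\beta_0 = 0$), and $SE_{\beta}$ is the standard error of the least-squares estimate.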
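The statistic above can be computed directly from a least-squares fit. The following is a minimal sketch, assuming an ordinary least-squares model with an intercept and the usual variance estimate (residual sum of squares divided by the residual degrees of freedom). The function name `ols_t_scores` and the synthetic data are illustrative and not part of the original notes.

```python
import numpy as np

def ols_t_scores(X, y, beta_0=0.0):
    """Hypothetical helper: OLS fit with intercept, returning
    (beta_hat, se_beta, t_score, dof)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])       # design matrix with intercept column
    k = A.shape[1]
    beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta_hat
    dof = n - k                                # residual degrees of freedom
    sigma2_hat = residuals @ residuals / dof   # estimate of the error variance
    cov_beta = sigma2_hat * np.linalg.inv(A.T @ A)
    se_beta = np.sqrt(np.diag(cov_beta))       # standard error of each coefficient
    t_score = (beta_hat - beta_0) / se_beta
    return beta_hat, se_beta, t_score, dof

# Synthetic example (assumed data): true intercept 1, true slope 2, Gaussian noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)
beta_hat, se_beta, t_score, dof = ols_t_scores(x, y)
print(beta_hat, se_beta, t_score, dof)
```

The first entry of each returned array refers to the intercept, the remaining entries to the predictors in column order.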
To apply the t-test, $SE_{\beta}^2$ should follow a (scaled) $\chi^2$ distribution. $SE_{\beta}^2$ is proportional to the sum of squared residuals, and a sum of squares of independent standard Gaussian variables is $\chi^2$ distributed.
Therefore, the errors (estimated by the residuals) should be independent Gaussian variables.
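The link between the standard error and the residuals can be written out explicitly. This is a sketch under the classical assumptions, not a full proof: the errors are independent $N(0, \sigma^2)$, there are $n$ observations, and the design matrix $X$ has $k$ columns. The symbols $X$, $n$, $k$, $\sigma$, and $RSS$ are introduced here and do not appear in the original notes.

```latex
\begin{aligned}
SE_{\beta_j}^2 &= \hat{\sigma}^2 \left[ (X^\top X)^{-1} \right]_{jj},
\qquad \hat{\sigma}^2 = \frac{RSS}{n - k},
\qquad RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \\
\frac{RSS}{\sigma^2} &\sim \chi^2_{n-k}
\quad \Longrightarrow \quad
t_{score} = \frac{\hat{\beta}_j - \beta_0}{SE_{\beta_j}} \sim t_{n-k}
\ \text{under the null hypothesis.}
\end{aligned}
```

The last step also uses the fact that, with Gaussian errors, $\hat{\beta}$ is Gaussian and independent of $RSS$.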
An explicit list of requirements for the t-test on linear regression is available here.
- The dependent variable Y has a linear relationship to the independent variable X.
- For each value of X, the probability distribution of Y has the same standard deviation σ.
- For any given value of X,
  - The Y values are independent.
  - The Y values are roughly normally distributed (i.e., symmetric and unimodal). A little skewness is OK if the sample size is large.
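The conditions in this list can be checked roughly from the residuals of a fitted line. The sketch below is one informal way to do that, assuming a single predictor. The function name `residual_diagnostics` and the three summary numbers are illustrative choices, not a formal test.

```python
import numpy as np

def residual_diagnostics(x, y):
    """Hypothetical helper: rough checks of equal spread, symmetry, and
    independence for a simple regression of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)     # highest power first
    residuals = y - (intercept + slope * x)

    # Equal spread: compare residual standard deviation for low-x and high-x halves.
    order = np.argsort(x)
    half = len(x) // 2
    std_low = residuals[order[:half]].std(ddof=1)
    std_high = residuals[order[half:]].std(ddof=1)

    # Symmetry: sample skewness of the residuals (near 0 for Gaussian data).
    centered = residuals - residuals.mean()
    skewness = np.mean(centered**3) / np.mean(centered**2) ** 1.5

    # Independence: lag-1 correlation of residuals in data order
    # (only meaningful when the observations have a natural order).
    lag1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]

    return {"std_ratio": std_high / std_low,
            "skewness": skewness,
            "lag1_autocorr": lag1}

# Synthetic example (assumed data) that satisfies the conditions.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 3.0 - 0.5 * x + rng.normal(size=500)
diag = residual_diagnostics(x, y)
print(diag)
```

A `std_ratio` far from 1, a large absolute skewness, or a lag-1 correlation far from 0 would each suggest that one of the listed conditions is doubtful.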
Use a Monte Carlo test instead: a permutation test on the cross-validation score.
Refer to the comments on #2.
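One possible reading of "permutation + cross validation score" is sketched below: compute a cross-validated score on the real data, then recompute it many times with the response shuffled, and report how often the shuffled score is at least as good. This is an assumption about the intended procedure, not the method from the comments on #2. The helper names `cv_r2` and `permutation_p_value` are hypothetical, and the test is for the model as a whole rather than for a single coefficient. scikit-learn offers `permutation_test_score` for the same idea.

```python
import numpy as np

def cv_r2(X, y, n_folds=5):
    """Mean out-of-fold R^2 of an OLS fit with intercept.
    Folds are contiguous blocks in data order (assumes rows are not sorted)."""
    n = len(y)
    folds = np.array_split(np.arange(n), n_folds)
    scores = []
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        A_train = np.column_stack([np.ones(train_mask.sum()), X[train_mask]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        beta, *_ = np.linalg.lstsq(A_train, y[train_mask], rcond=None)
        resid = y[test_idx] - A_test @ beta
        # Baseline: predict the training mean for every held-out point.
        total = y[test_idx] - y[train_mask].mean()
        scores.append(1.0 - (resid @ resid) / (total @ total))
    return float(np.mean(scores))

def permutation_p_value(X, y, n_permutations=200, seed=0):
    """Monte Carlo p-value: share of shuffled-response scores >= observed score."""
    rng = np.random.default_rng(seed)
    observed = cv_r2(X, y)
    null_scores = np.array(
        [cv_r2(X, rng.permutation(y)) for _ in range(n_permutations)]
    )
    # The +1 terms keep the estimated p-value strictly positive.
    p_value = (1 + np.sum(null_scores >= observed)) / (1 + n_permutations)
    return observed, p_value

# Synthetic example (assumed data): only the first predictor matters.
rng = np.random.default_rng(2)
X = rng.normal(size=(120, 2))
y = 1.5 * X[:, 0] + rng.normal(size=120)
observed, p_value = permutation_p_value(X, y)
print(observed, p_value)
```

Because shuffling the response breaks any relationship with the predictors, this test does not rely on the Gaussian assumption, at the cost of refitting the model many times.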
In standard regression
How do we derive the confidence intervals and p-values of the coefficients?
A Gaussian distribution assumption is made. But, in more detail, what exactly follows the Gaussian distribution, and why?
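For the first question, one standard route under the classical model is as follows. If the error terms are assumed to be independent Gaussian, the t-score of a coefficient follows a t distribution with the residual degrees of freedom, and both quantities come from that distribution. The sketch below assumes the coefficient estimates and their standard errors are already available. The function name `coef_inference` and the example numbers are illustrative only.

```python
import numpy as np
from scipy import stats

def coef_inference(beta_hat, se_beta, dof, alpha=0.05, beta_0=0.0):
    """Hypothetical helper: two-sided p-values and (1 - alpha) confidence
    intervals for regression coefficients."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    se_beta = np.asarray(se_beta, dtype=float)
    t_score = (beta_hat - beta_0) / se_beta
    # Two-sided p-value: probability of a t-score at least this extreme.
    p_value = 2.0 * stats.t.sf(np.abs(t_score), dof)
    # Critical value of the t distribution for the requested coverage.
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, dof)
    ci_low = beta_hat - t_crit * se_beta
    ci_high = beta_hat + t_crit * se_beta
    return p_value, ci_low, ci_high

# Made-up numbers: one coefficient of 2.0 with standard error 1.0 and 10 degrees of freedom.
p_value, ci_low, ci_high = coef_inference([2.0], [1.0], dof=10)
print(p_value, ci_low, ci_high)
```

In this example the interval contains 0 and the p-value is above 0.05, so the two views agree that the coefficient is not significant at the 5% level.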
In spatial auto-correlation
Spatial autocorrelation introduces a collinearity issue when constructing the predictor variables. This violates some of the assumptions above, so it is harder to calculate p-values and confidence intervals.