thekingofkings / dissertation

My PhD dissertation on spatial-temporal data mining.

How to derive confidence interval and p-value of coefficients from spatial regression? #1

Closed thekingofkings closed 8 years ago

thekingofkings commented 8 years ago

In standard regression

How to derive the confidence interval and p-value of coefficients?

A Gaussian distribution assumption is made. But in more detail, what exactly follows the Gaussian distribution, and why?

In spatial auto-correlation

Spatial auto-correlation causes a collinearity issue when constructing the predictor variables. This violates some of the assumptions of standard regression, so it is harder to calculate the p-value and confidence interval.

thekingofkings commented 8 years ago

How to derive the p-value of coefficients in standard LR?

The T-test is used in LR to get the p-value of each coefficient.

According to the Wikipedia page on the T-test, t = sqrt(p) Z / s, where Z and s are functions of the data. For the one-sample case, Z = (Xbar - mu) / (sigma / sqrt(n)).
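A minimal sketch of how the general form relates to the familiar one-sample statistic, assuming s^2 = p * (sample variance) / sigma^2 with p = n - 1 degrees of freedom (this choice of s is my assumption, picked so that s^2 is chi^2 distributed under Gaussian data). The function name `one_sample_t` and the default `sigma` are hypothetical. The point is that the unknown sigma cancels, so t can be computed from the data alone.

```python
import math
import statistics


def one_sample_t(data, mu, sigma=1.0):
    """Return (t via sqrt(p) * Z / s, t via the usual textbook formula)."""
    n = len(data)
    p = n - 1  # degrees of freedom
    x_bar = statistics.mean(data)
    sample_var = statistics.variance(data)  # unbiased, divides by n - 1

    # Z uses the (normally unknown) population sigma.
    z = (x_bar - mu) / (sigma / math.sqrt(n))
    # Assumed scaling term: s^2 = p * sample_var / sigma^2.
    s = math.sqrt(p * sample_var / sigma ** 2)
    t_general = math.sqrt(p) * z / s

    # Usual form: sigma replaced by the sample standard deviation.
    t_usual = (x_bar - mu) / (math.sqrt(sample_var) / math.sqrt(n))
    return t_general, t_usual


if __name__ == "__main__":
    print(one_sample_t([5.1, 4.9, 5.3, 5.0, 5.2], mu=5.0, sigma=3.0))
```

Whatever value is passed as `sigma`, the two returned numbers agree up to floating-point error, which is why the T-test does not need the true sigma.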

Assumptions:

The following assumptions are made in T-test:

Test slope of a regression line

t_score = ( beta^hat - beta_0 ) / SE_beta, where beta^hat is the learned coefficient, beta_0 is the value of beta under the null hypothesis (here beta_0 = 0), and SE_beta is the standard error of the least-squares estimate.

To apply the T-test, SE_beta^2 should follow a (scaled) chi^2 distribution. Since SE_beta^2 is proportional to the sum of squared residuals, that sum must behave like a sum of squared Gaussian variables.

Therefore, the residuals should be independent Gaussian variables.
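The slope test above can be sketched in plain Python for a single predictor. This is an illustration under the Gaussian-residual assumption, not a reference implementation: the names `slope_t_test`, `t_pdf` and `two_sided_p` are hypothetical, and the p-value is obtained by numerically integrating the Student t density with Simpson's rule to stay within the standard library (in practice a statistics library would normally be used for this step).

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t_score, df, steps=2000):
    """Two-sided p-value, P(|T| >= |t_score|), via Simpson's rule on [0, |t|]."""
    a = abs(t_score)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # P(|T| <= a), using symmetry of the density
    return max(0.0, 1.0 - central)


def slope_t_test(x, y, beta_0=0.0):
    """Return (beta_hat, se_beta, t_score, p_value) for y = alpha + beta * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    # Sum of squared residuals; SE_beta^2 is proportional to this quantity.
    ssr = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2  # two parameters estimated: intercept and slope
    se_beta = math.sqrt(ssr / df / sxx)
    t_score = (beta_hat - beta_0) / se_beta
    return beta_hat, se_beta, t_score, two_sided_p(t_score, df)


if __name__ == "__main__":
    print(slope_t_test([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))
```

The degrees of freedom are n - 2 because both the intercept and the slope are estimated from the data. `math.gamma` overflows for very large df, so this sketch is only meant for small samples.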

More reference

An explicit list of requirements for the T-test on linear regression is available here.

  • The dependent variable Y has a linear relationship to the independent variable X.
  • For each value of X, the probability distribution of Y has the same standard deviation σ.
  • For any given value of X,
    • The Y values are independent.
    • The Y values are roughly normally distributed (i.e., symmetric and unimodal). A little skewness is ok if the sample size is large.

thekingofkings commented 8 years ago

How to test significance if the T-test cannot be applied?

Use a Monte Carlo permutation test, with the cross-validation score as the test statistic.
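One way to read this idea, sketched with the standard library only: score the model by leave-one-out cross-validated mean squared error, then shuffle y many times and count how often a shuffled dataset scores at least as well. The helper names (`fit_line`, `loo_cv_mse`, `permutation_p_value`), the single-predictor model and the choice of leave-one-out are all assumptions made for illustration. Note that plain shuffling treats the observations as exchangeable, which may not hold for spatially auto-correlated data.

```python
import random


def fit_line(x, y):
    """Least-squares fit of y = alpha + beta * x; returns (alpha, beta)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - beta * x_bar, beta


def loo_cv_mse(x, y):
    """Leave-one-out cross-validated mean squared error (lower is better)."""
    errors = []
    for i in range(len(x)):
        alpha, beta = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        errors.append((y[i] - alpha - beta * x[i]) ** 2)
    return sum(errors) / len(errors)


def permutation_p_value(x, y, n_permutations=500, seed=0):
    """Fraction of shuffled datasets whose CV error is <= the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    x = list(x)
    observed = loo_cv_mse(x, list(y))
    y_perm = list(y)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(y_perm)  # break any relationship between x and y
        if loo_cv_mse(x, y_perm) <= observed:
            at_least_as_good += 1
    # The +1 terms count the observed data as one of the permutations.
    return (at_least_as_good + 1) / (n_permutations + 1)


if __name__ == "__main__":
    xs = list(range(1, 11))
    ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8, 18.1, 19.9]
    print(permutation_p_value(xs, ys, n_permutations=200, seed=1))
```

The smallest p-value this procedure can report is 1 / (n_permutations + 1), so the number of permutations limits the resolution of the test.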

Refer to the comments on #2.