pysal / spreg

Spatial econometric regression in Python
https://pysal.org/spreg/

Can I use Spatial lag or Spatial Error for prediction? #73

Closed Cheetos closed 1 year ago

Cheetos commented 3 years ago

Hello,

I am an SDE currently working on spatial analysis, and I have found pysal very useful in my daily work, so thank you for all your contributions. I've found examples of how to use spatial lag and spatial error regression models with the spreg library (GM_Error and GM_Lag), but I haven't found one that uses those models to predict the target variable for new data points that were not in the training set. Is it possible to do this? If so, an example of how to do it would be very useful.

Thank you, David

willgeary commented 2 years ago

Very curious about this as well!

ljwolf commented 2 years ago

Hi, sorry, just getting to this. You can do out-of-sample prediction just fine with the spatial error models by computing X_new @ regression.betas. This works because the spatial structure in an error model lives in the error term only, so its predictions take the same form as an OLS prediction.
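As a rough sketch of the X_new @ regression.betas idea, the snippet below uses plain NumPy with made-up coefficients rather than a fitted spreg model. It assumes the coefficient vector is ordered constant first, then one entry per column of X, and that an error model may carry one extra trailing spatial parameter (lambda) that has to be dropped before multiplying. That layout is an assumption to check against your own fitted model's betas shape; predict_linear is a hypothetical helper, not part of spreg.

```python
import numpy as np


def predict_linear(X_new, betas):
    """Trend-only prediction for new rows.

    Assumes `betas` is ordered [constant, b_1, ..., b_k], optionally
    followed by one extra spatial parameter (e.g. lambda) that is not
    part of the linear predictor.
    """
    X_new = np.asarray(X_new, dtype=float)
    betas = np.asarray(betas, dtype=float).reshape(-1, 1)
    n, k = X_new.shape

    # Prepend the column of ones that matches the constant term.
    X_const = np.hstack((np.ones((n, 1)), X_new))

    if betas.shape[0] == k + 2:
        # One trailing spatial parameter: drop it.
        betas = betas[:-1]
    elif betas.shape[0] != k + 1:
        raise ValueError("betas does not match the number of columns in X_new")

    return X_const @ betas


# Made-up coefficients for illustration: constant, b_1, b_2, lambda.
betas = np.array([[1.0], [2.0], [-0.5], [0.3]])
X_new = np.array([[1.0, 2.0],
                  [0.0, 4.0]])

y_hat = predict_linear(X_new, betas)
print(y_hat.ravel())
```

With a fitted model, you would pass its betas array in place of the made-up vector, after confirming how many entries it has relative to the columns of X.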

For the spatial lag of Y (SLY) model, this is much more challenging, and (to my knowledge) is not possible with the methods we have implemented. I'm sorry!

It is possible (in theory) by treating the out of sample data as "missing," building a full weights matrix describing all the data, and then estimating the lag model using the EM algorithm. See, for example, this recent paper for a discussion.

If prediction is chiefly of interest, you could try fitting the equivalent SLX model using spreg.OLS. Then, to predict out of sample, you would:

  1. Build a weights matrix for all of the data (both in and out of sample) and stack the new data onto the bottom of the old data.
  2. Synthesize the new "spatial lag of X" matrix by computing the spatial lag of the full-data X matrix according to the full-data W matrix, then slicing off the rows corresponding to the "new" data to make "WX_new".
  3. Predict out of sample using numpy.column_stack((X_new, WX_new)) @ regression.betas (with a leading column of ones if the model was fit with a constant).
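The three steps above can be sketched end to end with NumPy alone. Everything here is illustrative: the data are simulated and noise-free, knn_weights is a hypothetical stand-in for a real weights builder (in practice you would likely construct W with libpysal), and the SLX fit uses numpy.linalg.lstsq instead of spreg.OLS so the example has no extra dependencies.

```python
import numpy as np


def knn_weights(coords, k=3):
    """Dense, row-standardized k-nearest-neighbor weights (hypothetical helper)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbor
    nearest = np.argsort(d, axis=1)[:, :k]
    W = np.zeros_like(d)
    W[np.arange(len(coords))[:, None], nearest] = 1.0 / k
    return W


rng = np.random.default_rng(0)
n_old, n_new = 40, 5
coords_old = rng.uniform(size=(n_old, 2))
coords_new = rng.uniform(size=(n_new, 2))
X_old = rng.normal(size=(n_old, 2))
X_new = rng.normal(size=(n_new, 2))

# Fit an SLX model on the training data: y = [1, X, WX] @ betas.
W_old = knn_weights(coords_old)
Z_old = np.column_stack((np.ones(n_old), X_old, W_old @ X_old))
true_betas = np.array([[1.0], [2.0], [-1.0], [0.5], [0.25]])  # made up
y_old = Z_old @ true_betas                 # noise-free for clarity
betas, *_ = np.linalg.lstsq(Z_old, y_old, rcond=None)

# Step 1: stack new data under old data and build W for all points.
coords_all = np.vstack((coords_old, coords_new))
X_all = np.vstack((X_old, X_new))
W_all = knn_weights(coords_all)

# Step 2: lag the full X with the full W, then keep only the new rows.
WX_new = (W_all @ X_all)[n_old:]

# Step 3: predict, including the column of ones for the constant.
Z_new = np.column_stack((np.ones(n_new), X_new, WX_new))
y_new_hat = Z_new @ betas
print(y_new_hat.ravel())
```

Note that the new points' lags can draw on both old and new neighbors, while the training lags were computed from old points only. That mismatch is the dependence on out-of-sample data mentioned in this comment.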

This still suffers from the fact that the model depends on the out-of-sample data, but this problem is likely less serious for the SLX model than the SLY model.

Cheetos commented 2 years ago

Thanks, @ljwolf. This is very helpful.