pysal / spreg

Spatial econometric regression in Python
https://pysal.org/spreg/

Possibilities of adding dummy variables to spatial panel model? #79

Closed shuai-zhou closed 3 years ago

shuai-zhou commented 3 years ago

If I understand correctly, the spatial panel models (for example, the fixed effects spatial lag model) need the dataset to be in "wide" format. In that case, how can I add a dummy variable, for instance a year dummy, to the model? Thanks.

pedrovma commented 3 years ago

Hi @shuai-zhou , I've just updated the FE_Panel notebook to show how you can use data in both formats, 'wide' or 'long'. The function accepts both types of data. I hope this helps.
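As a rough illustration of the two layouts (a toy sketch with made-up numbers, not taken from the notebook): the wide layout holds one row per unit and one column per period, while the long layout, as far as I understand spreg's convention, stacks the periods on top of each other with the units in the same order inside each period. The array names below are hypothetical.

```python
import numpy as np

# Hypothetical toy panel: N = 3 units observed over T = 2 periods.
# Wide layout: one row per unit, one column per period.
y_wide = np.array([[1.0, 4.0],
                   [2.0, 5.0],
                   [3.0, 6.0]])

# Long layout: periods stacked on top of each other, so rows 0..N-1 hold
# period 0 and rows N..2N-1 hold period 1 (column-major flattening).
y_long = y_wide.reshape(-1, 1, order="F")

print(y_long.ravel())
```

The same stacking would apply to each explanatory variable, so that row i of y and row i of x always describe the same unit in the same period.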

shuai-zhou commented 3 years ago

Hi, @pedrovma,

Thank you so much for your prompt response. The "baltimore" data is itself a cross-sectional dataset, so I think the "NAT" data would be a better example for implementing the panel model with year dummy variables. The "long" data format looks like the following table. My question is: how can I fit a fixed effects spatial lag model with a year dummy variable, such as hr ~ rd + ps + year_dum? I compiled the data in "long" format for you HERE; feel free to compile the data in whatever way you think will work. Thanks.

| name | fips | fipsno | hr | rd | ps | geometry | year | year_dum |
|---|---|---|---|---|---|---|---|---|
| Lake of the Woods | 27077 | 27077 | 0.000000 | -0.196536 | -1.462559 | POLYGON ((...)) | 1970 | 1 |
| Ferry | 53019 | 53019 | 0.000000 | -0.847856 | -1.697720 | POLYGON ((...)) | 1970 | 1 |
| Stevens | 53065 | 53065 | 1.915158 | -0.225283 | -0.591883 | POLYGON ((...)) | 1970 | 1 |
| Okanogan | 53047 | 53047 | 1.288643 | -0.391126 | -0.552016 | POLYGON ((...)) | 1970 | 1 |
| Pend Oreille | 53051 | 53051 | 0.000000 | -0.451457 | -1.181754 | POLYGON ((...)) | 1970 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Lake of the Woods | ... | ... | ... | ... | ... | ... | 1980 | 2 |
| Ferry | ... | ... | ... | ... | ... | ... | 1980 | 2 |
| Stevens | ... | ... | ... | ... | ... | ... | 1980 | 2 |
| Okanogan | ... | ... | ... | ... | ... | ... | 1980 | 2 |
| Pend Oreille | ... | ... | ... | ... | ... | ... | 1980 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Lake of the Woods | ... | ... | ... | ... | ... | ... | 1990 | 3 |
| Ferry | ... | ... | ... | ... | ... | ... | 1990 | 3 |
| Stevens | ... | ... | ... | ... | ... | ... | 1990 | 3 |
| Okanogan | ... | ... | ... | ... | ... | ... | 1990 | 3 |
| Pend Oreille | ... | ... | ... | ... | ... | ... | 1990 | 3 |

pedrovma commented 3 years ago

Hi @shuai-zhou ,

You can just add the dummies as individual X variables for each of the years in your data (minus 1, the reference category).

Example:

import libpysal
import spreg
import geopandas as gpd
import pandas as pd

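data = gpd.read_file('nat_long.shp')
# Rows must be stacked by year, with the counties in the same order within each year.
# dtype=float keeps the dummies numeric (recent pandas versions default to bool).
data = pd.get_dummies(data, columns=['year'], dtype=float)  # creates year_1970, year_1980, year_1990

y = data[['hr']]
x = data[['rd', 'ps', 'year_1980', 'year_1990']]  # year_1970 is the reference category

# W must still be an N x N matrix, so build it from the first N = 3085 rows (one year of data).
w = libpysal.weights.KNN.from_dataframe(data.iloc[0:3085, :], k=10)
w.transform = 'r'

# y is already an (N*T, 1) array, so no reshape is needed.
fe_lag = spreg.Panel_FE_Lag(y.to_numpy(), x.to_numpy(), w,
                            name_y=list(y.columns), name_x=list(x.columns), name_ds="nat_long.shp")
print(fe_lag.summary)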
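
For anyone who wants to try the dummy-variable step without the NAT shapefile, here is a minimal pandas-only sketch on a made-up long-format panel. The frame, its values, and the `same_order` check are hypothetical and only meant to illustrate the idea: sort so that the years are stacked with the units in the same order, then create one dummy per year while dropping the first year as the reference category.

```python
import pandas as pd

# Hypothetical toy long-format panel: 2 counties x 3 years (values are made up).
df = pd.DataFrame({
    "fips": ["A", "B", "A", "B", "A", "B"],
    "year": [1970, 1970, 1980, 1980, 1990, 1990],
    "hr":   [0.0, 1.9, 0.5, 2.1, 0.7, 1.5],
})

# Stack by year, keeping the same unit order inside every year.
df = df.sort_values(["year", "fips"]).reset_index(drop=True)

# drop_first=True omits the first category (1970) as the reference;
# dtype=float keeps the resulting columns numeric.
dummies = pd.get_dummies(df["year"], prefix="year", drop_first=True, dtype=float)
df = pd.concat([df, dummies], axis=1)

# Check that every year block lists the units in the same order.
n = df["fips"].nunique()
blocks = [df["fips"].iloc[i:i + n].tolist() for i in range(0, len(df), n)]
same_order = all(block == blocks[0] for block in blocks)

print(list(dummies.columns), same_order)
```

Using `drop_first=True` is an alternative to selecting the dummy columns by hand; either way, one year has to be left out to avoid perfect collinearity among the dummies.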

shuai-zhou commented 3 years ago

Hi, @pedrovma:

Awesome! This is a great example of implementing spatial panel models with dummy variables. Thanks.