jeremysze / LPIS

Repository of code used to analyze LPIs in NYC

spxtregress #6

Open jeremysze opened 5 years ago

jeremysze commented 5 years ago

I am struggling to find examples of spxtregress. My Google Scholar search with the term yielded only one result, which was not helpful.

I am reading the paper that Stata follows to create their SAR models: Lee and Yu (2010a).

https://github.com/jeremysze/LPIS/blob/master/setting_up_spatialweights_1b.ipynb

@mbaker21231 had helped me interpret the first model in the above notebook: flag_LPIS decreased collision counts at the treated intersection but increased collision counts at nearby intersections.

jeremysze commented 5 years ago

Looked through the Lee and Yu (2010a) paper, but it does not help with learning how to interpret the regression outputs.

Looking at this now: Anselin, L. and A.K. Bera, 1998, "Spatial dependence in linear regression models with an introduction to spatial econometrics."

jeremysze commented 5 years ago

The interpretation of a significant spatial autoregressive coefficient ρ is not always straightforward. Two situations can be distinguished. In one, the significant spatial lag term indicates true contagion or substantive spatial dependence, i.e., it measures the extent of spatial spillovers, copy-catting or diffusion. This interpretation is valid when the actors under consideration match the spatial unit of observation and the spillover is the result of a theoretical model.

My first model in https://github.com/jeremysze/LPIS/blob/master/setting_up_spatialweights_1b.ipynb is specified with a spatial lag of the dependent variable. From what I've read in Anselin and Bera, this says there might be a spatial spillover effect. Exactly what Prof. Baker said.

jeremysze commented 5 years ago

Spatial error dependence may be interpreted as a nuisance (and the parameter λ as a nuisance parameter) in the sense that it reflects spatial autocorrelation in measurement errors or in variables that are otherwise not crucial to the model (i.e., the "ignored" variables spillover across the spatial units of observation). It primarily causes a problem of inefficiency in the regression estimates, which may be remedied by increasing the sample size or by exploiting consistent estimates of the nuisance parameter. For example, this is the interpretation offered in the model of agricultural land values in Benirschka and Binkley (1994).

My second model includes a spatial lag of the error term, and it is significant. I am not sure what to do with this information. It suggests there might be measurement error or, more probably, some omitted variables.

jeremysze commented 5 years ago

It seems that the "3900 unable to allocate real" error from the estat impact command does not have a solution that I can find by googling. I have 64GB of RAM on my work computer, so I don't think memory is the issue here. I am going to try setting the matrix size to the max to see if the command works.

jhconning commented 5 years ago

Interesting explorations... I'm trying to follow along, reading this simple guide to the ideas as well as skimming Anselin and Rey's (2014) Modern Spatial Econometrics in Practice.

I think the interpretation of the spatial lag model is different from what you and Matt are saying -- and goes in the direction you want.

Your regression found two coefficients b1 = -.1158 and rho = 0.438

If I'm understanding correctly, your result in effect says that we can decompose the effect of the LPIS at a given intersection into two effects (Anselin and Rey, p. 165). The total effect of the LPIS would have this exponential multiplier decay form:

dCollision/dLPIS = b1*I + rho*b1*W + rho^2*b1*W^2 + rho^3*b1*W^3 + ... = b1*(I - rho*W)^(-1)

The first term is the direct effect of adding LPIS to an intersection or b1 = -.1158

but we need to add in the cumulative indirect effects (from the multiplicative effect of other LPIS nearby). The infinite sum of terms above can be reduced to:

b1 * rho/(1-rho) = -0.0902

The latter effect is therefore in fact also negative, and the sum of the two effects (-0.206) is also negative.
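The arithmetic above can be sketched in a few lines (a minimal check plugging in the two estimated coefficients; the scalar form of the geometric series assumes a row-standardized W):

```python
# Back-of-the-envelope spillover calculation from the SAR estimates above.
# Assumes a row-standardized W, so the infinite sum of indirect terms
# b1*(rho + rho^2 + ...) collapses to the scalar b1*rho/(1-rho).
b1 = -0.1158   # coefficient on flag_LPIS (direct effect)
rho = 0.438    # spatial autoregressive coefficient

direct = b1
indirect = b1 * rho / (1 - rho)   # cumulative spillover through neighbors
total = direct + indirect

print(f"direct = {direct:.4f}, indirect = {indirect:.4f}, total = {total:.4f}")
```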

One question though... I've never used spxtregress. Are you getting the DiD correct? I see intersection and month FE, but shouldn't there be both an intersection and an intersection*LPIS term?

jeremysze commented 5 years ago

The guide is awesome! Thanks!

I was trying to calculate the direct and indirect effects with estat impact. But they did not work. I've emailed Stata, hopefully they have an answer. Otherwise, I might have to try R.

One question though... I've never used spxtregress. Are you getting the DiD correct? I see intersection and month FE, but shouldn't there be both an intersection and an intersection*LPIS term?

I remember talking to Prof. Deb about this issue with LPIS intersection##post. The main issue is with the post variable: I don't have a post-intervention period for control intersections. He said that an indicator for when an LPIS intersection becomes treated, in a panel regression, is equivalent to the DiD.

I am searching for more evaluations with phased rollouts that utilize DiD. Will get back with more on this.

jeremysze commented 5 years ago

I tried to see if setting memory in Stata helps, but it does not. They have opened a ticket.

jeremysze commented 5 years ago

Found a paper that had a phased roll out intervention and used triple DiD by exploiting spatial and temporal variation. Can at Scale Drug Provision Improve the Health of the Targeted in Sub-Saharan Africa?

jeremysze commented 5 years ago

Dear Jeremy,

I am sorry for the delay in my reply.

The error message indicates that the command is trying to allocate a 12983 by 12983 matrix, that matrix requires 59574595748 bytes, which is about 1.35GB memory. In general, the explicit direct solution would be that you increase the memory of your computer; however, you are referring that your computer has 64GB (which seems to be more than fine), so I checked with the developer and he stated that:

" For spatial panel impacts computation, it needs the impacts
  matrix for each time period. So for NxT panel, the memory 
  needed would be NxNxT. In addition, there should be spatial
  weighting matrix. I guess this user's panel data has multiple
  time periods. "

Sincerely,

Gustavo


Gustavo Sanchez, Ph.D. Director - Tech Services tech-support@stata.com
StataCorp LLC


What they are saying is that I need at least 1.35GB * 75, or about 101GB, of memory. I've emailed to see if there are other possible ways to get around this, but it does not seem likely. Do we have any computers with big memory at Hunter that we could use for this calculation?
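As a sanity check on that figure, the requirement the developer describes (one N x N impacts matrix per time period) can be computed directly (a sketch; the 8-byte double element size is an assumption about Stata's internal storage):

```python
# Rough memory needed by estat impact for an N-unit, T-period panel:
# one N x N matrix of 8-byte doubles per time period.
N, T = 12983, 75

per_period = N * N * 8   # bytes for one impacts matrix
total = per_period * T   # bytes across all 75 periods

print(f"{per_period / 1e9:.2f} GB per period, {total / 1e9:.1f} GB total")
```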

jhconning commented 5 years ago

That is a big demand on memory.

estat impact seems to calculate the recursive impact, similar to the back-of-the-envelope calculation I described above. I can get close to the values reported in examples from the Stata manual for spregress, but I'm not sure how to get the confidence intervals, or why the numbers differ somewhat.

For example, on page 49 they report [spregress output image], and then estat gives [estat impact output image].

With b =-0.0939834 and rho= 0.2007728 the simple recursive formulas give

direct = -0.0940
indirect = b*rho/(1-rho) = -0.0236
Total = direct + indirect = -0.1176

Which differs slightly from their -0.11407

Maybe having the extra control variables, as they do, somehow complicates the formula. Not sure this is useful except as a way of knowing approximately where things may end up.
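One other possible reason for the small gap: the scalar formula b/(1-rho) for the average total impact is exact only when W is row-standardized, and I believe Stata's spmatrix normalizes by the spectral radius by default (an assumption worth checking). A toy numpy sketch with a row-standardized ring-lattice W (hypothetical, not the NYC data) shows the exact matrix calculation agreeing with the scalar formula in that case:

```python
import numpy as np

# Toy row-standardized W: a ring lattice where each unit has two
# equal-weight neighbors. (Purely illustrative, not the NYC data.)
n = 200
A = np.zeros((n, n))
for i in range(n):
    A[i, (i - 1) % n] = A[i, (i + 1) % n] = 1.0
W = A / 2.0  # rows sum to one

b, rho = -0.0939834, 0.2007728

# Exact impacts matrix for a SAR model: dy/dx = b * (I - rho*W)^{-1}.
M = b * np.linalg.inv(np.eye(n) - rho * W)

avg_direct = np.mean(np.diag(M))  # includes feedback loops, so |.| > |b|
avg_total = M.sum() / n           # average total impact

print(avg_direct, avg_total, b / (1 - rho))  # avg_total equals b/(1-rho)
```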

Can you get the Moran statistic (to check if the spatial lag is worthwhile) or does it choke for that too?

jeremysze commented 5 years ago

I remember that Moran's I did not work. I will check again to see if it runs. I think I will try the back-of-the-envelope calculations to get a sense of the data.

jeremysze commented 5 years ago

Dear Jeremy,

I am afraid that there is not currently a different alternative. I actually checked with the developer and he confirmed that the only way you can get that output is by working with the amount of ram memory that is needed for the matrices used for the corresponding underlying computations.

Sincerely,

Gustavo


Gustavo Sanchez, Ph.D. Director - Tech Services tech-support@stata.com
StataCorp LLC


You wrote:

-----Begin Original Message-----

Hi Gustavo,

Thank you for your reply. I have a panel data of 12983 units across 75 time periods. So I would need about 101.25 gb of ram to do this calculation. Apart from increasing the memory of the computer, is there anything I could do to get around this?

Jeremy

No workaround is available for the issue. If I have free slots on my motherboard, I could come in one weekend, borrow the RAM from my colleague's computer, and have 64*2 = 128GB available for the calculation. Probably wishful thinking, because most likely the RAM chips are different.

jhconning commented 5 years ago

It's nice that Stata got back to you at least.

Since the problem is that N*N*T is just too large, I suppose one remaining way forward is to work on a sample. A random sample (or better, repeated samples, sort of like bootstrapping) of size say N/100 ought to give you similar estimates and would presumably not run out of memory.

Alternatively, rather than monthly, maybe look at quarterly collision figures. Then you'd scale your memory requirement down to N*N*T/3.
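The monthly-to-quarterly aggregation could be done before exporting to Stata, e.g. with pandas (a sketch; the column names are hypothetical, not the repo's actual variable names):

```python
import pandas as pd

# Hypothetical monthly panel: one row per intersection-month.
df = pd.DataFrame({
    "intersection_id": [1, 1, 1, 2, 2, 2],
    "month": pd.to_datetime(["2018-01-01", "2018-02-01", "2018-03-01"] * 2),
    "collision_count": [2, 0, 1, 3, 1, 0],
})

# Sum collision counts within each intersection-quarter.
quarterly = (
    df.set_index("month")
      .groupby("intersection_id")["collision_count"]
      .resample("QS")   # quarter-start frequency
      .sum()
      .reset_index()
)
print(quarterly)
```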

jeremysze commented 5 years ago

I think that picking random samples might be complicated, because I need to start with a shapefile each time to create the contiguity matrices.

I think it might be better to cut down the time to quarterly as you suggested or I can break the data into boroughs that are divided by water.

jeremysze commented 5 years ago

Yes it ran!

jeremysze commented 5 years ago

https://github.com/jeremysze/LPIS/blob/master/setting_up_spatialweights_1b.ipynb https://github.com/jeremysze/LPIS/blob/master/analysis_qt_panel_3b.ipynb

I ran the spatial regression with collision_counts as the outcome. But I think I need to drop intersections where there were no collisions. (Will have to do that on the work computer, because when it runs it consumes all 64GB of memory.)

I am going to run Moran's I as well. I should have run that first.

jeremysze commented 5 years ago

re-open

jeremysze commented 5 years ago

After reading more about spatial regressions, I think the contiguity matrix is the wrong matrix to use here. I should use an inverse-distance (idistance) matrix.

There are many criteria on which the construction of the spatial weights can be based. A comprehensive discussion is beyond the current scope. We focus on the two most common operational approaches and distinguish between a neighborhood relation based on the notion of contiguity and one derived from distance measures. Intrinsically, contiguity is most appropriate for geographic data expressed as polygons (so-called areal units), whereas distance is suited for point data, although in practice the distinction is not that absolute. In fact, polygon data can be represented by their centroid or central point, which then lends itself to the computation of distance. Similarly, a tessellation can be constructed for point data (e.g., Thiessen polygons), which allows for the determination of contiguity relationships between the polygons in the tessellation. Modern Spatial Econometrics in Practice
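For point data like intersection centroids, an inverse-distance matrix can be built directly from coordinates. A minimal numpy sketch with toy coordinates (the real ones would come from the intersection shapefile):

```python
import numpy as np

# Toy (x, y) centroids; in practice these would be read from the shapefile.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])

# Pairwise Euclidean distances between all points.
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)

# w_ij = 1/d_ij off the diagonal; the diagonal (d == 0) stays zero.
W = np.zeros_like(d)
np.divide(1.0, d, out=W, where=d > 0)

W = W / W.sum(axis=1, keepdims=True)  # row-standardize
```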

jeremysze commented 5 years ago

It seems that my model does not converge when I use the inverse distance matrix for the spatial lag of the dependent variable: spxtregress collision_ flag_LPIS $time_var, fe dvarlag(M)

 (324575 observations)
  (324575 observations used)
  (data contain 12983 panels (places) )
  (weighting matrix defines 12983 places)

Performing grid search ... finished 

Optimizing concentrated log likelihood:

Iteration 0:   log likelihood = -572857.86  
Iteration 1:   log likelihood = -572857.54  
Iteration 2:   log likelihood = -572857.54  

Optimizing unconcentrated log likelihood:

Iteration 0:   log likelihood = -572857.54  
Iteration 1:   log likelihood = -572857.54  (backed up)

jeremysze commented 5 years ago

I also tried to run Moran's I, but I think it can only be run with cross-sectional data. I am not able to get past the errors.

Remarks and examples If you have not read [SP] intro 1–[SP] intro 8, you should do so before using estat moran. To use estat moran, your data must be cross-sectional Sp data. See [SP] intro 3 for instructions on how to prepare your data. Stata SAR Model Reference