jeremysze / LPIS

Repository of code used to analyze LPIs in NYC
0 stars 0 forks source link

Updates on thesis #3

Open jeremysze opened 5 years ago

jeremysze commented 5 years ago

Replicate QGIS steps into python commands for easy "replicate-ability"

Within Stata:

jeremysze commented 5 years ago

Received traffic signal data from DOT Roseann Caruana Customer Service

jhconning commented 5 years ago

This is looking really great.

P.S. -- Just glanced at your efforts to pull in data directly via API... looks awesome.

jeremysze commented 5 years ago

The first two on the list are now ~scratched out~ because we will now connect the collisions to the traffic signal intersections. I will begin from scratch to rebuild the datasets, so that they can now "flow" through the notebooks until the panel analysis stage.

I will incorporate the cKDTree spatial example into the distance and ID capture to replace the less efficient method.

jeremysze commented 5 years ago

Utilized cKDTree! Its super fast! Next to create the "control" intersections by looking within the buffer. Outcomes are also set up.

jhconning commented 5 years ago

Utilized cKDTree! Its super fast!

Great. I'm curious, how does it compare in speed to doing the same thing via QGIS?

Next to create the "control" intersections by looking within the buffer. Outcomes are also set up.

Neat.. So how are you doing that? Are you looking for all non-LPIS intersections within a buffer region around each LPIS?

jeremysze commented 5 years ago

I get the results almost instantaneously. I usually leave QGIS to work on the distance and come back to it later. At the moment it is still looking at those within the buffer. With that I can designate a control group to each LPIS and utilize Synthetic control groups or entropy balance.

jeremysze commented 5 years ago

Don't think i can take timings in QGIS like you did in your examples.

jeremysze commented 5 years ago

Do you have any conferences to recommend for abstract submission?

jhconning commented 5 years ago

See this list for conferences that welcome student submissions. Several say for undergraduate but MA and PhD students too

jeremysze commented 5 years ago

I think I need some help thinking about how to utilize the "control" signal intersections that fall within 2000ft of a LPIS intersection.

At this moment, my data is organized as a panel with intersections, months and years. When we were exploring the data earlier, we used relative months to compare the LPIS intersections and the control intersections. At that point, we were not able to "set" a switch on date because LPIS intersections were implemented at different dates. Hence, we spoke about potentially identifying nearby intersections as control intersection.

But LPIS intersections that are close to each other have been implemented on different dates. Hence, when we identify control intersections that are within 2000ft of a LPIS intersection, we will have the same control intersections identify for LPIS 1 and LPIS 2.

Right now I have reshaped the data into a wide format unique by intersection_id, and it has variables of LPIS_IDS.

On another note, I was thinking that since I know have an official list of signal intersections and I can obtain a list of non-signal intersections. With this information, I could some sort of analysis on those intersections to find out if they should get a traffic signal controller. This idea is probably out of scope for my thesis. But it sounds really interesting to me now that I've thought about it.

jhconning commented 5 years ago

Interesting questions. Very brief stream of consciousness responses/ideas now but let me think and read further:

jeremysze commented 5 years ago

-I ran the straight diff on diff with control group being all non-LPIS signaled intersections. The results are similar to what we have seen before.

Outcome: Collision Count significant downward direction Outcome: Number of persons injured significant downward direction Outcome: Number of pedestrians injured significant downward direction Outcome: Number of cyclist injured not significant Outcome: Number of motorist injured significant downward direction

I can try to find a list of major intersections and find the distance to each signalized intersection. I think this makes a lot of sense to balance on.

jhconning commented 5 years ago

-I ran the straight diff on diff with control group being all non-LPIS signaled intersections. The results are similar to what we have seen before.

Very nice.. One question: is this a straight diff-in-diff with treatment effect estimated as a fixed additive shift? In other words are you getting an estimate that says, after LPIS there are (say) 0.19 fewer incidents in a year. Depending on how you specify the outcome variable I'm just concerned about the 'parallel' trend assumption. Standard stuff.

Suppose that there are two types of intersection in the city (H)igh accident intersections that have on average 10 accidents a year and (L) accident intersections that have 1 accident a year. Suppose they were both on a downward trend such that after 1 year we'd expect 10% fewer accidents. Then the H locations would see average accidents fall from 10 to 9 or by 1 accident per year. The L locations would be expected to see the rate fall by 0.1 from 1 to 0.9. Suppose the city now installs LPIS mostly at H sites and that the LPIS in fact has zero effect. A simple difference in difference would estimate a positive impact of the LPIS (as much as 0.9 = 1.0 - 0.1 if we compared LPIS locations to only L locations, somewhat less if the control group is a mix of H and L locations).

Taking logs of the HS variable gets around the problem as you'd be then in effect comparing percent changes (although logs presents a problem here where there are 0 accident intersections so we'd ). I think you'd thought through some of these issues in choosing other specifications.

The other way to get around this possible bias of course is to work hard to have a well matched control group... which you are also working on.

jeremysze commented 5 years ago

Right now I will have to check on the parallel trend assumption. My current diff-in-diff is estimated as a fixed additive shift.

Taking the logs of the outcome variable will be an issue because collisions do not occur every month for all intersections.

I hope the DOT replies with what variables they've considered in deciding if an signalized intersection should be a LPIS intersection.

I've also run the spatial regression. Right now its just with spregress. I need to look more into the interpretation of the spatial weights.

jeremysze commented 5 years ago

DOT Commissioner Request. Contact Robert Viola at 212-839-7752 or RViola@dot.nyc.gov. Will contact him to learn more about what they use to decide on LPIS intersections.

jeremysze commented 5 years ago

Might have figured out the issue with Street improvement shapefile. It is a multipoint shape, but geopandas don't seem to have the multipart to single part function yet.

Did the function in QGIS > Vector > Geometry Tools > Multipart to Singleparts. Will bring this into the model when I get additional information from Robert Viola about how DOT decides on LPIS intersections. Hopefully, he replies.

Working on understanding/interpretation of spatial regressions.

jhconning commented 5 years ago

So is the issue to convert a 'mutipoint' to a many single points? Is it your understanding that the points grouped into a multipoint when they were thought of as a joint intervention? Are the points tagged to indicate the type of intervention within a joint intervention? If they are clearly identified, you could potentially expand/later extend your focus to study these joint interventions (e.g. does LPIS plus speed reduction do more than just LPIS...?)..

As usual, I'm several steps behind you trying to learn some python geospatial methods. It seems one can 'explode' or otherwise iterate through the singleparts in a multipart file with fiona and shapely:

https://gis.stackexchange.com/questions/138163/exploding-multipart-features-in-qgis-using-python

https://gis.stackexchange.com/questions/112095/converting-huge-multipolygon-to-polygons

jeremysze commented 5 years ago

Yes. It seems that for the Street improvements, they have grouped several points into one point. I think that they have specific coordinates of corner renovation, light replacements, etc, which are grouped into 1 point.

I will try one of these methods so that I can have it down in code.

jeremysze commented 5 years ago

Robert Viola responded. I will dig into the document to identify the selection criteria.

Hi Jeremy- The vast majority of the LPIs have be installed following the actions of our Borough Pedestrian Safety Action Plans. They all have actions for LPIs, targeted to our Priority Geographies. http://www.nyc.gov/html/dot/html/pedestrians/ped-safety-action-plan.shtml

For example:

Significantly expand exclusive pedestrian crossing time on all Brooklyn Priority Corridors by the end of 2017 High-crash corridors for pedestrians tend to be on wide arterial streets with higher speeds and aggressively turning vehicles. DOT will address these issues by installing Leading Pedestrian Intervals (LPIs) at every feasible school crosswalk on all Brooklyn Priority Corridors. The LPI is a proven method of reducing pedestrian-vehicle conflicts at high pedestrian crash locations; it is a signal timing treatment that provides pedestrian-only walk time before vehicles, including turning vehicles, receive the green light.

Add exclusive pedestrian crossing time to all feasible Brooklyn Priority Intersections by the end of 2017 DOT will install LPIs at every feasible Brooklyn Priority Intersection by the end of 2017. As noted previously, the LPI is a signal timing treatment that provides pedestrian-only walk time before vehicles receive the green light.

jeremysze commented 5 years ago

Several new variables to control for

Using pedestrian KSI data from the last five available years (2009-2013), DOT developed a process for selecting Priority Corridors, Priority Intersections, and Priority Areas.

Pedestrian KSI data was employed in this analysis for two reasons. First, a pedestrian who has been severely injured typically departs the crash scene in an ambulance and often experiences life-changing injuries (e.g., loss of mobility, brain function, limbs). A comprehensive street safety program must address these types of pedestrian injuries as well, not just fatalities. Second, severe injuries are more numerous and less randomly dispersed than traffic fatalities. Thus, severe injuries are more useful and reliable in terms of ranking one corridor, intersection, or area.

Priority Corridors

To determine the Priority Corridors, all corridors in the Bronx were ranked on a pedestrian KSI per-mile basis. Corridors were selected from the top of this list until the cumulative number of pedestrian KSI reached half of the borough’s total.

Priority Intersections

In order to identify which of the Bronx’s 6,438 intersections have the highest need and greatest potential safety gains, DOT used an approach similar to the Priority Corridor process. DOT selected the intersections with the highest number of pedestrian KSI that cumulatively account for 15% of the borough’s total pedestrian KSI.

Priority Areas

Some of the safety issues throughout the Bronx occur systematically at an area-wide level and are not confined to a single intersection or street. To account for these areas, the pedestrian KSI crash dataset was transformed into a kernel density map—or heat map—which indicates where the density of these crashes is highest.

source

jhconning commented 5 years ago

Excellent, very useful information for understanding the program and its placement

On Mon, Nov 26, 2018 at 4:16 PM Jeremy Sze notifications@github.com wrote:

Several new variables to control for

  • population density
  • wide arterial streets
  • Midblock crossing
  • Borough Priority corridors
  • Borough Priorty intersections
  • Borough Priorty area
  • Lagged pedestrian KSI (Most ped. and cyclist fatalities and severe injuries occur in the same areas of NYC)
  • complex intersection (more than 2 roads crossing)

Using pedestrian KSI data from the last five available years (2009-2013), DOT developed a process for selecting Priority Corridors, Priority Intersections, and Priority Areas.

Pedestrian KSI data was employed in this analysis for two reasons. First, a pedestrian who has been severely injured typically departs the crash scene in an ambulance and often experiences life-changing injuries (e.g., loss of mobility, brain function, limbs). A comprehensive street safety program must address these types of pedestrian injuries as well, not just fatalities. Second, severe injuries are more numerous and less randomly dispersed than traffic fatalities. Thus, severe injuries are more useful and reliable in terms of ranking one corridor, intersection, or area.

Priority Corridors

To determine the Priority Corridors, all corridors in the Bronx were ranked on a pedestrian KSI per-mile basis. Corridors were selected from the top of this list until the cumulative number of pedestrian KSI reached half of the borough’s total.

Priority Intersections

In order to identify which of the Bronx’s 6,438 intersections have the highest need and greatest potential safety gains, DOT used an approach similar to the Priority Corridor process. DOT selected the intersections with the highest number of pedestrian KSI that cumulatively account for 15% of the borough’s total pedestrian KSI.

Priority Areas

Some of the safety issues throughout the Bronx occur systematically at an area-wide level and are not confined to a single intersection or street. To account for these areas, the pedestrian KSI crash dataset was transformed into a kernel density map—or heat map—which indicates where the density of these crashes is highest.

source http://www.nyc.gov/html/dot/downloads/pdf/ped-safety-action-plan-bronx.pdf

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jeremysze/LPIS/issues/3#issuecomment-441800870, or mute the thread https://github.com/notifications/unsubscribe-auth/AE0JH_r0GTjvXPhGbBgcFAFLsvAuEU-yks5uzFougaJpZM4X_z10 .

jeremysze commented 5 years ago

https://popfactfinder.planning.nyc.gov/#12.25/40.724/-73.9868

It seems we have population data by city blocks here. But no means to obtain it. Other data I have found are PUMAs, which have a min of 100,000 people.

Tried downloading Population & Housing Unit Counts — Blocks to see if they have population data by blocks. https://www.census.gov/geo/maps-data/data/tiger-data.html

jhconning commented 5 years ago

Census tract level data does seem to be available if you follow the info here https://gis.stackexchange.com/questions/9380/where-to-get-2010-census-block-data

I just tried getting data for Queens and was able to download (I clicked on employed civilian population b/c there was such a long list of available variables that I didn't go searching for total pop but it's presumably there). I'm blown away by how much data is available for download these days!

On Mon, Nov 26, 2018 at 6:00 PM Jeremy Sze notifications@github.com wrote:

https://popfactfinder.planning.nyc.gov/#12.25/40.724/-73.9868

It seems we have population data by city blocks here. But no means to obtain it. Other data I have found are PUMAs, which have a min of 100,000 people.

Tried downloading Population & Housing Unit Counts — Blocks to see if they have population data by blocks. https://www.census.gov/geo/maps-data/data/tiger-data.html

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jeremysze/LPIS/issues/3#issuecomment-441833271, or mute the thread https://github.com/notifications/unsubscribe-auth/AE0JHzWrorl7uGrwN4XPpFaC_KCW-YeCks5uzHJ0gaJpZM4X_z10 .

jeremysze commented 5 years ago

I've found one that goes down to block level! TIGER data ("tabblock2010_36_pophu" > "Population & Housing Unit Counts — Blocks")

Now I need to think about how to bring the information into the dataset of intersections. I wonder if it matters since I will be running a fixed effects panel.

Also I cant seem to find the shapefiles for Priority corridors, intersections and areas for each of the boroughs. Might have to do another request with DOT, or try to use raster layers (never tried before).

jeremysze commented 5 years ago

Opened another ticket to request for information from DOT. DOT-397860-W3Z1 Will probably try to email Robert as well.

jhconning commented 5 years ago

@jeremysze can add @partha-deb to have access to this github repo?

jeremysze commented 5 years ago

Added! I am running the code to calculate inverse distance matrix. Takes a really long time.

jeremysze commented 5 years ago

Hi Jeremy, Regarding your request about LPI studying. Would you please give me a call at 212-839-3370?


Danny Nguyen Director, Signals Timing Engineering NYCDOT – Traffic Operations 34-02 Queens Boulevard Long Island City, NY 11101 Tel: 212-839-3370

Heard back from the DOT. Going to call tomorrow.

jeremysze commented 5 years ago

Also I cant seem to find the shapefiles for Priority corridors, intersections and areas for each of the boroughs. Might have to do another request with DOT, or try to use raster layers (never tried before).

I actually found them after some digging! http://www.nyc.gov/html/dot/html/about/vz_datafeeds.shtml

mbaker21231 commented 5 years ago

Jeremy --

I'm not sure how you are going about calculating the inverse, but sometimes that can take way too long. Often, it is easier to compute an approximation - for which there are several methods available that only rely on matrix multiplication, which is much faster. Let me know if you need help with that.

jeremysze commented 5 years ago

@mbaker21231 I am using spmatrix create idistance I would like to try the approximation methods.

jeremysze commented 5 years ago

@mbaker21231 I found this potential solution on statalist. https://www.statalist.org/forums/forum/general-stata-discussion/general/1398532-inverse-distance-spatial-matrix-using-stata

jeremysze commented 5 years ago

@mbaker21231 got the inverse distance matrix! But I would still like to learn your approximation methods.

jeremysze commented 5 years ago

Short 10 minute call with Danny. Could not hear him too well but I got the main points.

There are 4 main areas of how they select intersections to be LPIS

He made the point that they can't implement LPIS on major intersections because they have too many turns, or there is not enough time.

DOT website's Vision Zero Report has a section that says LPIS reduce pedestrian collisions by 40%.

jeremysze commented 5 years ago

Summary of today's discussion Attendence: Prof Conning, Prof Deb, Prof Baker, Jeremy, Christian, Anjelica

Next steps

  1. Finish up the thesis portion (basic fixed effects?)
  2. Bring in school crossing data, senior area, priority intersection data
  3. Include them in the model as dummies
jeremysze commented 5 years ago

https://geodacenter.github.io/workbook/4b_dist_weights/lab4b.html Thiessen Polygons! Found a solution which we briefly discussed at Wednesday's meeting. To turn the point into a polygon in order to calculate a rook contiguity matrix. Got to find the python version of this solution.

jhconning commented 5 years ago

Cool. What's the broad strategy? The python pysal library is the main one for spatial weighting and regression. There may be an option in there. Also/alternatively from my very brief reading, if you have the road network as a network (as you could from osmnx) there seems to be an adjacency matrix option that may be relevant.

On Fri, Dec 7, 2018, 12:57 AM Jeremy Sze <notifications@github.com wrote:

https://geodacenter.github.io/workbook/4b_dist_weights/lab4b.html Thiessen Polygons! Found a solution which we briefly discussed at Wednesday's meeting. To turn the point into a polygon in order to calculate a rook contiguity matrix. Got to find the python version of this solution.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jeremysze/LPIS/issues/3#issuecomment-445131384, or mute the thread https://github.com/notifications/unsubscribe-auth/AE0JH9RSxOX2uipZYY-dSOYX_98mcts2ks5u2gM5gaJpZM4X_z10 .

jeremysze commented 5 years ago

I was looking at pysal. But it does not seem like they have the Thiessen Polygons function. I was just planning to test out GeoDa software first.

If we can do nearest line to point, I will be able to "merge" the signal intersection into the lion road shapefile. I could also convert the line into points, and merge it. Then using some lion road ID, to merge the data into the line geopandas. This way we can use osmnx to do the adjacency matrix.

jhconning commented 5 years ago

I may be misunderstanding, but aren't things simpler? In the example from this notebook describing spatial weights using pysal it looks like they start from point data and then create a spatial weights matrix

They start with point data that looks like this

image

They produce a 'rook' adjacency network

rW = ps.rook_from_shapefile(shp_path, idVariable='FIPS')

To get the weights matrix it seems you'd do:

Wmatrix, ids = rW.full()

Instead of rook it could be queen, bishop, K nearest neighbors or any other 'adjacency' criterion. Presumably a '1' would be entered in the matrix for each that is adjacent and 0 otherwise.

This plot shows the 'rook' relationship extracted:

image

The Thiessen polygons approach seems to be first tesselating the intersections and then measuring adjacency distance from the centroids of those new polygons (which will be generally different from intersections). But why do all that when you can measure from the points to begin with?

jeremysze commented 5 years ago

I think I probably was not searching for the right term in the pysal document or not understanding. But this solution look right. I am going to try it.

I've build the panel dataset with the new variables.

jhconning commented 5 years ago

Actually...the simple spatial contiguity methods may not be able to deal with grids that are not on a N-S orientation (e.g. if they are tilted say 45 degrees then the 'rook' method might find not the closest intersection but the one diagonally away on the same block.

What we need is a nearest 'connected' neighbor. I started to play with neighbors and adjacency matrix methods to the networkx representations of the street network one can get with osmnx in this notebook.
The methods would seem to work but the osmnx street network has 'too many' minor intersections (e.g one for each lane on divided large roads), so it doesn't really map well back onto the data you've worked to create.

jeremysze commented 5 years ago

Maybe if we use lion street shapefile, overlay the signalized intersection, remove the other minor (no traffic signal intersections(nodes). We can narrow it down to just the network of signalized intersections.

I was trying to run the code with a cross section of the panel, but I received an error AttributeError: 'Point' object has no attribute 'bbox'. https://github.com/jeremysze/LPIS/blob/master/analytical%20panel%20shapefile%20-%20quarter%20-%20spatial%20weights.ipynb

jhconning commented 5 years ago

I'm guessing the issue is that you might be giving it non-connected intersection points whereas, from the pysal docs) "Rook contiguity defines as neighbors any pair of polygons that share a common edge in their polygon definitions."

The original 'lion street shapefile' you mention might allow you to build the adjacency matrix:

wr=rook_from_shapefile("xxx.shp"), "POLYID")

but there would be the problem you've mentioned of many false/small/repeat intersections. As I understand things you extracted the intersections as points (which would lose you the 'edges') and then merged or otherwise removed those that were not important intersections..

So (conceptually at least) the way forward might involve somehow removing false/small/repeat intersections without losing the other points and their edges.

Related question: how is the 'Lion street shapefile' you are using different from the TIGER/Line Shapefile that's also available for NYC and that osmnx uses and loads?

The osmnx library works as a front-end for pysal to find neighbors and adjacency/contiguity relationships and it has a few methods for simplifying road networks.

To load the road network near my neighborhood:

G = ox.graph_from_point(( 40.75,-73.88), distance=750, network_type='drive')

The 'drive' network type makes sure that it's not pulling in secondary roads, and every point is an intersection etc. These could be either kept or perhaps merged while

This still leaves some intersections with two nodes very near each other (e.g. on 34th Ave close to my home... but that's because there is a garden divider between the two sides of the road. But if I'm understanding you could merge very close intersections using the clean_intersections command

osmnx.simplify.clean_intersections(G, tolerance=15, dead_ends=False)

where nodes "within this tolerance distance (in graph’s geometry’s units) will be dissolved into a single intersection." It returns the centroid of the cluster.

So if it is not too much work I wonder whether you could start from the TIGER/Line data that osmnx pulls in, simplify as above and then merge in your data on signaled intersections (since we've seen already that the OSM data seems incomplete). You'd then have some additional info on edges (e.g. which way traffic flows) but also presumably ought to be able to calculate those adjacency/contiguity matrices..

This may of course be more work than it is worth or there could be some downsides I'm not seeing, so don't feel the need to go down this path just because I ended up speculating about it all!.

jeremysze commented 5 years ago

Lion comes from NYC Dept of City Planning. Tiger comes from the US Census. According to this blog, it is not exactly clear if LION is a subset of TIGER.

I think I should be able to use TIGER from osmnx. I am going to try this out. But it would probably be after the finals.

jhconning commented 5 years ago

Yes.. after the finals please and only if you think it's worthwhile.

The osmnx library clean_intersections is illustrated in 14-clean-intersection-node-clusters.ipynb

What's returned is a geoseries but, according to the notebook, "these cleaned up intersections give us more accurate intersection counts and densities, but do not alter or integrate with the network's topology." In other words you get points without edges (but we need edges for the adjacency matrix!) I've posted a brief stack overflow comment/question to Geoff Boeing the creator of osmnx to see if he has any suggestion to a also simplify the road network.

jeremysze commented 5 years ago

I think it is definitely worthwhile since this is going to be a problem with any other similar structured analysis. Its also quite fun to solve these problems.

jeremysze commented 5 years ago

thiessen polygons

Decided to try thiessen polygons. Seems like it might be doing what we want?

jhconning commented 5 years ago

Awesome picture..I want a poster of that on my wall. .Thiessen polygons do seem to do the trick since at the end of the day they are labeling contiguity and map back onto your original intersections. And seems more straightforward than the paths I was suggesting.

On Tue, Dec 11, 2018 at 4:17 PM Jeremy Sze notifications@github.com wrote:

[image: thiessen polygons] https://user-images.githubusercontent.com/6887005/49830750-1113d400-fd60-11e8-97cd-13326a3ffdf4.PNG

Decided to try thiessen polygons. Seems like it might be doing what we want?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jeremysze/LPIS/issues/3#issuecomment-446365114, or mute the thread https://github.com/notifications/unsubscribe-auth/AE0JH6TZYj267pN5PSSqSddSIkabi2eUks5u4CDUgaJpZM4X_z10 .

jeremysze commented 5 years ago

https://github.com/jeremysze/LPIS/blob/master/Spatial_regression_qt.ipynb

I brought it into stata. Stata's graphs looks pretty decent. I will have to "reconnect" that cross section data into the panel. Or just calculate the spatial matrix, and load that matrix when I ran the spxtregress. This should technically work.

jeremysze commented 5 years ago

https://github.com/jeremysze/LPIS/blob/master/Spatial_regression_qt.ipynb

Ran it with the panel data. It seems that most people with spatial analysis calculates moran's I to determine if spatial auto-correlation exists. But most moran's I calculation is done on cross sectional data.

Since my data is a panel data, it will be more challenging.

Found a paper describes how to do bootstrap calculations to get moran's i for spatial panel data. https://www.researchgate.net/publication/262384241_Moran's_I_test_of_spatial_panel_data_model_-_Based_on_bootstrap_method https://www.sciencedirect.com/science/article/pii/S026499931400162X Apparently there is a FDB Moran's I that I should be using.