hackforla / data-science

The Hack For LA Data Science team is a Community of Practice within the LA brigade seeking to make analytical and machine learning services available to local communities and organizations.

CoP: Data Science: Analyze correlations between metro locations and 311-data requests #107

Open ryanmswan opened 4 years ago

ryanmswan commented 4 years ago

Overview

Investigate whether there are meaningful trends relating metro stops and metro lines to the requests tracked by 311-data in LA County.

Action Items

Resources

Information about 311 Data here
Access 311 data here
http://geohub.lacity.org/datasets/metro-rail-lines-stops
https://developer.metro.net/docs/gis-data/overview/
District types issue: https://github.com/hackforla/data-science/issues/118

Use 2019 data for 311, streetlights, crime, and metro stops.

Tools: Google Colab, sklearn, pandas.

Work in progress

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in X days.

akhaleghi commented 2 years ago

@priyakalyan please document the following update to this issue in the comments here

Progress: "What is the current status of your project? What have you completed and what is left to do?"
Blockers: "Difficulties or errors encountered."
Availability: "How much time will you have this week to work on this issue?"
ETA: "When do you expect this issue to be completed?"
Pictures (if necessary): "Add any pictures that will help illustrate what you are working on."

priyakalyan commented 2 years ago

Progress: On 3-31-2022 I added the file Progress summary 311 Data Project, plus data dictionaries for the 311 data (Value Column 311 data) and for the Metro rail and bus lines (Value column Metro- Bus and Rail line). So far I have downloaded the 311 data (not cleaned yet) and looked at the request type count/relative frequency over the years 2015-2022 (through March 27th). I also looked at the request type count for different APCs.

Since then I have been out of town until today, so I have no further update for the last week.

Plan for the upcoming week:

Availability: 6 hours this week.

ETA- I am totally new to geospatial data analysis, so maybe 1 to 2 weeks.

priyakalyan commented 2 years ago

Progress: Was successful in installing Docker but could not set up a local 311 data server. I tried many times, but the last step in Step 3 (Build and seed your local database) failed. Any suggestions/pointers? For now, I have stopped working on it. Downloaded data from this website.

Loaded the metro rail line shapefile, the metro bus line shapefile and the neighborhood council shapefile.

Currently working on spatially joining the 311 data and the NC data (looking at one region at a time, 12 in all). The next step is to overlay the metro rail and bus lines, plot the different request types, and do a qualitative study exploring the request type counts geographically.

Availability: 6 hours this week.

ETA- 1 to 2 weeks.

priyakalyan commented 2 years ago

Progress: Finally figured out how to use paginated APIs with Python to fetch all rows of data from the 311 server for the year 2021. I have saved it as a CSV file, clean_311_data_2021. I will fetch the clean data for the rest of the years (2015-2020, 2022).
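For the record, a minimal sketch of what such a pagination loop can look like. The thread does not show the actual 311 server API, so `fetch_page`, its `offset`/`limit` parameters, and the `requestId` field are all hypothetical stand-ins; a real version would wrap an HTTP call in `fetch_page`.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


def fetch_all_rows(
    fetch_page: Callable[[int, int], List[Row]],
    page_size: int = 1000,
) -> Iterator[Row]:
    """Yield every row by requesting pages until the source runs dry.

    fetch_page(offset, limit) is a hypothetical callable that returns at
    most `limit` rows starting at `offset`.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            break  # an empty page means there is nothing left
        yield from page
        if len(page) < page_size:
            break  # a short page is the last page
        offset += len(page)


def make_fake_api(total_rows: int) -> Callable[[int, int], List[Row]]:
    """Stand-in for the real server: serves slices of an in-memory list."""
    data = [{"requestId": i} for i in range(total_rows)]

    def fetch_page(offset: int, limit: int) -> List[Row]:
        return data[offset:offset + limit]

    return fetch_page


rows = list(fetch_all_rows(make_fake_api(2500), page_size=1000))
print(len(rows))
```

Stopping on a short or empty page avoids needing a total row count up front, which not every API reports.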

Have spatially joined the 311 data + NC data + metro bus + metro rail lines, displaying the specific request types over the 12 regions of NCs.

Adding sample pics here for region 4, South East Valley. NCs: 'SHERMAN OAKS NC', 'NORTH HOLLYWOOD NORTH EAST NC', 'VAN NUYS NC', 'GREATER VALLEY GLEN', 'NOHO NC', 'NOHO WEST NC', 'STUDIO CITY NC', 'NC VALLEY VILLAGE', 'GREATER TOLUCA LAKE NC'.

Images: Part1, Part2, Part3, Part4, Reg4

Availability: 6 hours this week.

priyakalyan commented 2 years ago

Progress:

Plan for the upcoming week:

Availability: 6 hours this week.

ETA- 1 week

nichhk commented 2 years ago

The team discussed this last Thursday, so I'll leave some notes for the record:

I think it would be useful to have a histogram where the x axis is "distance from nearest bus stop/metro rail marker/etc." and the y axis is "number of requests". This will allow us to very clearly see whether there is some correlation between nearness to bus stops and 311 requests.
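A rough sketch of the binning behind such a histogram, assuming the distance from each request to its nearest stop has already been computed in meters. The function name, the 250 m bin width, and the sample distances are illustrative assumptions, not values from the project.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def bin_distances(
    distances_m: Iterable[float],
    bin_width_m: float = 250.0,
) -> List[Tuple[float, int]]:
    """Count requests per distance bin.

    Returns (bin_start_in_meters, request_count) pairs, including empty
    bins, so the result can be plotted directly as a histogram.
    """
    counts = Counter(int(d // bin_width_m) for d in distances_m)
    n_bins = max(counts) + 1 if counts else 0
    return [(i * bin_width_m, counts.get(i, 0)) for i in range(n_bins)]


# Made-up distances (meters) from six requests to their nearest stop.
sample = [40.0, 120.0, 260.0, 300.0, 490.0, 760.0]
hist = bin_distances(sample, bin_width_m=250.0)
for start, count in hist:
    print(f"{start:6.0f} m: {count}")
```

Keeping the empty bins matters here: a gap in the middle of the distance range is itself part of the shape we want to see.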

priyakalyan commented 2 years ago

Used the haversine formula (great-circle distance) to calculate the distance between each request's latitude/longitude and each metro rail stop. For each request, found the distance to the nearest metro rail marker. All of this was done for reg6, year 2021, and request type Single Streetlight Issue.

As discussed in the last 311 team meeting, here is the histogram plot:

Histogram_reg6_ssi_2021_1
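For reference, a minimal sketch of the nearest-stop calculation described above. It is not the notebook's actual code: the function names and the coordinates are made up for illustration, and the Earth radius is the usual mean value of 6,371 km.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, in meters

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_stop_distance_m(request: Point, stops: Iterable[Point]) -> float:
    """Distance in meters from one request to its closest stop (brute force)."""
    return min(haversine_m(request[0], request[1], s[0], s[1]) for s in stops)


# Hypothetical coordinates, for illustration only.
stops = [(34.0, -118.0), (34.1, -118.0)]
request = (34.01, -118.0)
print(round(nearest_stop_distance_m(request, stops)))
```

The brute-force `min` is fine for one region and one request type; for all years and regions a spatial index would likely be worth the extra setup.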

nichhk commented 2 years ago

Thanks Anupriya! Sorry for the delay. What do you make of this graph? To me, it seems to suggest that there is not a strong association between distance to nearest metro stop and request frequency--I'd expect to see a (basically) monotonically decreasing histogram, implying that there are a lot of requests close to metro stops but just a few far from metro stops. But maybe a request type like graffiti would be more illuminating.

Another bit that might help us understand this better: what is the density of metro stops? If the density of metro stops is very low, e.g., they are 10km apart from each other, then the median distance from the nearest metro stop of ~500m would be quite close. But if metro stops are 1km apart from each other, then ~500m is pretty far.

With this foundation, I think we can start controlling for factors like population density, bus ridership density, and metro stop density. Does that sound feasible?

priyakalyan commented 2 years ago

Have been trying to figure out how to get the population of each neighborhood council so that we can figure out the population density and so on. As @piotrsan mentioned in another issue

I also found this: Demographics of Neighborhood Councils. In both these files there are only 97 records- 97 NCs.

The NC boundaries were updated in 2018 with 2 new NCs added; here is the link. The missing council names are NORTH WESTWOOD NC and ARTS DISTRICT LITTLE TOKYO NC.

Next step is to figure out how to go from census block/tract data to NC-level figures. This link gives the mapping process, starting from block data and reconciling at the NC boundary level.

After today's meeting, it looks like starting at the census tract level will be the easiest way to go: take the NC shapefile, merge it with the census tracts, get the geocodes, and move on to demographics from there.

priyakalyan commented 1 year ago

Have calculated the population of each neighborhood council using the 2020 census tracts (TIGER/Line shapefile 2020), the updated NC shapefile (99 councils), and the ACS 2020 demographics data at the tract level. No approximation was made in the geometry this time. For tracts intersecting multiple NCs, found the percentage of area/population falling in each NC and then calculated the actual population.
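A small sketch of the allocation step, assuming the geometric overlay has already produced, for each tract, the fraction of its area lying inside each NC, and assuming population is spread evenly within a tract. The tract IDs, NC names, and numbers are invented for illustration; the notebook's actual procedure may differ.

```python
from collections import defaultdict
from typing import Dict, Tuple


def allocate_population(
    tract_population: Dict[str, float],
    overlap_fraction: Dict[Tuple[str, str], float],
) -> Dict[str, float]:
    """Split each tract's population across NCs in proportion to area.

    overlap_fraction maps (tract_id, nc_name) to the share of the tract's
    area inside that NC (values between 0 and 1).
    """
    nc_population: Dict[str, float] = defaultdict(float)
    for (tract, nc), fraction in overlap_fraction.items():
        nc_population[nc] += tract_population[tract] * fraction
    return dict(nc_population)


# Invented example: tract T2 straddles two councils.
tract_population = {"T1": 4000, "T2": 3000}
overlap_fraction = {
    ("T1", "NC A"): 1.0,
    ("T2", "NC A"): 0.25,
    ("T2", "NC B"): 0.75,
}
result = allocate_population(tract_population, overlap_fraction)
print(result)
```

When the fractions for a tract sum to 1, the NC totals add back up to the tract totals, which gives a quick sanity check against a citywide population figure.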

priyakalyan commented 1 year ago

Worked on this notebook to find the updated population of the LA city neighborhood councils using geospatial analysis. Next: add a notebook comparing the updated NC populations obtained by geospatial analysis and by ArcGIS analysis.

priyakalyan commented 1 year ago

Have updated the notebook. The total population of LA city NCs is very close to the 2021 Census Bureau value. Have also been working on this PR (API pagination using Python) to fetch all rows of data from the 311 data pipeline for a given year.

akhaleghi commented 1 year ago

Hi @priyakalyan, are there any recent updates to this issue?

priyakalyan commented 1 year ago

ExperimentsInHonesty commented 2 weeks ago

A summary of this should be added to the wiki.