Closed nichhk closed 1 year ago
At least at the NC level, we have a visualization of the total number of requests over the years. See the bottom of the dashboard here. I can take a stab at using a clustering algorithm to further identify smaller regions.
Thanks Josh! Yes, ideally, I think we'd want to get as granular as address-level, and then one notch above that, block-level. I think an individual NC would like to see if, e.g., 50% of their NC's 311 requests are coming from a single address.
Power BI Demo:
Next Steps:
Apparently we have an API endpoint that can produce "hotspots", see #1034. I'm not sure if this is helpful, or changes how we do things, but it's worth looking into.
The API uses a clustering algorithm to identify hotspots. It would definitely be useful if we want to implement this as a future feature, but it's not that useful for analysis purposes.
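For reference, the core idea behind that kind of hotspot clustering can be sketched in a few lines. This is an illustrative density-based grouping written from scratch, not the actual code behind the endpoint; the function name, `eps`, and `min_pts` are all assumptions for the sketch:

```python
from math import hypot

def cluster_requests(points, eps=0.005, min_pts=3):
    """Greedy single-linkage clustering of (lat, lon) pairs.

    Points within `eps` degrees of a cluster member are merged into that
    cluster; clusters with fewer than `min_pts` requests are dropped (None).
    """
    labels = [-1] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] != -1:
            continue
        # Start a new cluster at point i and grow it transitively.
        labels[i] = cluster_id
        stack = [i]
        while stack:
            j = stack.pop()
            for k, q in enumerate(points):
                if labels[k] == -1 and hypot(points[j][0] - q[0],
                                             points[j][1] - q[1]) <= eps:
                    labels[k] = cluster_id
                    stack.append(k)
        cluster_id += 1
    # Keep only clusters that meet the minimum size.
    sizes = {}
    for lbl in labels:
        sizes[lbl] = sizes.get(lbl, 0) + 1
    return [lbl if sizes[lbl] >= min_pts else None for lbl in labels]
```

A real implementation would use something like scikit-learn's DBSCAN instead of this quadratic scan, but the output shape is the same: a cluster label per request, with sparse points filtered out.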
Wrote a quick function that basically rounds each longitude/latitude pair to 2 decimal places and counts the number of requests in a neighborhood council. We can use this function on the 311 requests available for every year since 2016. I can compute some basic metrics like year-over-year / quarter-over-quarter comparisons of the number of requests, but I'll focus on bulky items, homeless encampments, and graffiti.
See function below:
```python
def generate_hotspot_dataframe(df):
    """Generates the hotspots of each NC by the number of 311 requests.

    This function takes in a raw LA 311 requests dataframe and aggregates by
    the longitude and latitude of 311 requests, rounded to 2 decimal places,
    for each neighborhood council.

    Args:
        df: raw LA 311 requests for any year.

    Returns:
        An aggregated 311 request dataframe that contains the count of 311
        requests per long/lat pair in each neighborhood council.
    """
    print("* Rounding requests Long/Lat to 2 Decimal Places")
    df['lat_2dp'] = df['Latitude'].round(decimals=2)
    df['long_2dp'] = df['Longitude'].round(decimals=2)
    print("* Aggregating dataframes")
    final_df = (
        df.groupby(['NCName', 'lat_2dp', 'long_2dp'], as_index=False)['SRNumber']
          .count()
          .sort_values(['NCName', 'SRNumber'])
          .reset_index(drop=True)  # drop=True avoids a stale 'index' column
    )
    return final_df
```
I'm not sure if two decimal places is fine-grained enough: 1 degree of latitude is about 69 miles, so two decimal places gives cells roughly 0.69 miles across, which is quite coarse. We can fine-tune the number of decimal places as necessary.
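The trade-off above is easy to tabulate: each extra decimal place shrinks the cell size by a factor of 10. A tiny helper (the name and the 69-miles-per-degree rule of thumb are from the comment above, not from the project code) makes the tuning concrete:

```python
def cell_size_miles(dp, miles_per_degree=69.0):
    """Approximate width in miles of a lat/long cell rounded to `dp` places.

    Uses the rule of thumb that 1 degree of latitude is about 69 miles;
    longitude cells are narrower at LA's latitude, so this is an upper bound.
    """
    return miles_per_degree / 10 ** dp
```

So `dp=2` gives ~0.69-mile cells, `dp=3` gives ~0.069 miles (about 360 feet), which is closer to block level.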
Those target request types look good to me! I would also add illegal dumping and animal remains. Both are issues that might be concentrated in certain areas, and could be addressed with additional signage.
Thanks for the review!
```python
def generate_hotspot_dataframe(df, dp, req_type):
    """Generates the hotspots of each NC by the number of 311 requests.

    This function takes in a raw LA 311 requests dataframe, filters it to the
    "req_type" request type, and aggregates by the longitude and latitude of
    311 requests rounded to `dp` decimal places for each neighborhood council.

    Args:
        df: a pandas dataframe with raw LA 311 requests for any year.
        dp: an integer for the number of decimal places to round the lat/long to.
        req_type: a string value of the 'RequestType' column to filter by.

    Returns:
        An aggregated 311 request dataframe that contains the count of 311
        requests per long/lat pair in each neighborhood council.
    """
    print(f"* Filtering dataframe by {req_type}")
    df = df[df['RequestType'] == req_type].copy()  # copy to avoid SettingWithCopyWarning
    print(f"* Rounding requests Long/Lat to {dp} Decimal Places")
    df['lat_2dp'] = df['Latitude'].round(decimals=dp)
    df['long_2dp'] = df['Longitude'].round(decimals=dp)
    print("* Aggregating dataframes")
    final_df = (
        df.groupby(['NCName', 'lat_2dp', 'long_2dp'], as_index=False)['SRNumber']
          .count()
          .sort_values(['NCName', 'SRNumber'])
          .reset_index(drop=True)
    )
    return final_df


req_type_lst = ['Graffiti Removal', 'Bulky Items', 'Homeless Encampment',
                'Dead Animal Removal', 'Illegal Dumping Pickup']
for r in req_type_lst:
    final_df = generate_hotspot_dataframe(df, 2, r)
    final_df.to_csv(f"311_2020_Hotspot_{r}.csv")
```
Really rough function that generates a corresponding dataframe for each request type. Still using 2 decimal places right now, but that can be fine-tuned. Next step is to figure out a way to present this, or just send the list as is.
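One lightweight way to present the output, rather than sending the full list, is to keep only the top few busiest cells per neighborhood council. This is just a suggestion sketched against the column names (`NCName`, `SRNumber`) used above; the function name and `n` are hypothetical:

```python
import pandas as pd

def top_hotspots(final_df, n=3):
    """Keep the n highest-count lat/long cells for each neighborhood council."""
    return (final_df.sort_values('SRNumber', ascending=False)
                    .groupby('NCName', as_index=False)
                    .head(n))
```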
Hey Josh and Nich. I started digging in a little to familiarize myself with the 311 data around locations and request types. I'll bring the questions I have from this initial exploration to the project meeting. I think you could do some clustering on past data to predict the types of requests in the different granular areas and help allocate resources, but I need to figure out how to do API calls to collect enough historical data and also create new features for granular location. The API call I used only returns up to 1000 records, which was a question I was going to bring to the project call.
Here's where I'm storing all my code. https://github.com/ajmachado42/Hack-for-LA-311-Data
Hey Dri, thanks for taking a look at this! To get all the requests for a certain date range, you can use this tool. Feel free to reach out to @priyakalyan if you have any questions about using it.
Re: the clustering: not sure if you saw this already, but we already have one implementation that does this. Please take a look and see if it looks useful to you.
Btw, if you're blocked on anything, feel free to reach out to us on Slack or write out your questions here on GitHub. It can be a pain to write them out, but we want to help our teammates to be productive throughout the week!
Thanks Nich! I'll definitely use this API code and take a look at the clustering!
I made some pretty decent headway on the EDA and identifying hot spots by neighborhood council and address in this notebook.
I'm still figuring out breaking LA into small hot spot chunks and then mapping out the data points there but I started going down a rabbit hole about geopandas so the research is taking a little longer than I thought it would.
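The geopandas rabbit hole mentioned above mostly boils down to one operation: a spatial join that assigns each request's (longitude, latitude) point to the polygon (block, NC boundary, etc.) containing it. Real work should use `geopandas.sjoin`, but the core point-in-polygon test behind it can be sketched in pure Python (this is an illustrative ray-casting implementation, not project code):

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a rightward ray from (x, y) crosses.

    `polygon` is a list of (x, y) vertices. An odd number of crossings means
    the point is inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

Geopandas does this with spatial indexing so the per-point cost doesn't scale with the number of polygons, which matters when joining a year of requests against every census block in LA.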
Some points for tomorrow's meeting (09/28/22):
@ajmachado42 Thanks so much for the comprehensive update! The notebook is very clear and well documented.
Once again, thanks so much for your hard work - Let me know what you think!
@joshuayhwu Thank you, Josh! I'll work on this this week.
Anupriya shared some Census resources for mapping files that break LA into the official city blocks, and I think she and Nich fixed the API bug after the meeting. I'm going to be visiting family in Florida this week but will have time to update my notebook with the full-year dataset and start doing some geospatial analysis as well.
Geospatial Analysis
Clustering
https://github.com/ajmachado42/Hack-for-LA-311-Data/tree/master/I-1279
@ajmachado42 Thanks so much for the comprehensive updates - really appreciate the documentation on the notebooks!
Geospatial Analysis:
Clustering:
@joshuayhwu I updated the visualization notebook so it's broken up more. GitHub still won't render the folium maps though.
This is my Drive link for it which has all the datasets, etc. Let me know if that works! (I was able to create a layered map by type in the nc_only notebook.) https://drive.google.com/drive/folders/1njMKXLcs6CSgcZ_Gs9Fwxr6Iq2Wro45m?usp=sharing
Noted about clustering. Once I finish getting the maps and block data set to a good spot then I'll shift to focusing on the cluster analysis more.
@ajmachado42 thanks for breaking it up! The notebook looks good and I really appreciate the comments!
I can take a look at the app and see how to render it if that's your only blocker. Otherwise, happy to check in on other blockers. Let me know which area you want most help with. Thanks for your hard work this week!
Hey @ajmachado42 and @joshuayhwu, Do you have an update for us on this issue?
Please update:
Thanks!
Hey @mc759
Progress:
Blockers:
Availability:
ETA:
-Adriana (sent from mobile)
Moving this one to closed after discussing with Josh. There are lots of templates for analyses (statistical and geospatial) and a mini program that generates a report adding census block IDs to each request based on the request's address. Feel free to reach out to me if you need anything!
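For anyone picking this up later: the "add census block IDs" step is, in practice, a geopandas spatial join against the Census TIGER block polygons, but the shape of the final report is just a left merge of requests onto a block lookup. A minimal pandas sketch, where the column names and the GEOID values are made up for illustration:

```python
import pandas as pd

# Hypothetical rounded-coordinate lookup; a real pipeline would derive
# block_id from a polygon join, not from rounded lat/long keys.
requests = pd.DataFrame({
    'SRNumber': ['1-1', '1-2'],
    'lat_2dp': [34.05, 34.10],
    'long_2dp': [-118.25, -118.30],
})
blocks = pd.DataFrame({
    'lat_2dp': [34.05, 34.10],
    'long_2dp': [-118.25, -118.30],
    'block_id': ['060371234001000', '060375678002000'],  # made-up GEOIDs
})
# Left merge keeps every request even if no block matched.
report = requests.merge(blocks, on=['lat_2dp', 'long_2dp'], how='left')
```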
Overview
This can be very useful information for NCs and city agencies. Basically, we can identify addresses or small areas that could benefit from more signage, increased community assistance, or other actions.
This was actually one of the original goals of 311 Data (see Use Case Feasibility Report).
[Update 12/05/22] In progress HERE:
Action Items