hackforla / 311-data

Empowering Neighborhood Associations to improve the analysis of their initiatives using 311 data
https://hackforla.github.io/311-data/
GNU General Public License v3.0
60 stars 61 forks

Identify addresses with significantly more 311 requests #1279

Closed nichhk closed 1 year ago

nichhk commented 2 years ago

Overview

This can be very useful information for NCs and city agencies. Basically, we can identify addresses or small areas that could benefit from more signage, increased community assistance, or other actions.

This was actually one of the original goals of 311 Data (see Use Case Feasibility Report).

[Update 12/05/22] In progress HERE:

Action Items

joshuayhwu commented 2 years ago

At least at the NC level, we have a visualization of the total number of requests over the years; see the bottom of the dashboard here. I can take a stab at using a clustering algorithm to further identify smaller regions.

nichhk commented 2 years ago

Thanks Josh! Yes, ideally, I think we'd want to get as granular as address-level, and then one notch above that, block-level. I think an individual NC would like to see if, e.g., 50% of their NC's 311 requests are coming from a single address.

joshuayhwu commented 2 years ago

Power BI Demo: [image]

Next Steps:

nichhk commented 1 year ago

Apparently we have an API endpoint that can produce "hotspots", see #1034. I'm not sure if this is helpful, or changes how we do things, but it's worth looking into.

joshuayhwu commented 1 year ago

The API uses a clustering algorithm to identify hotspots. That would definitely be useful if we want to implement this as a future feature, but it's not that useful for analysis purposes.

I wrote a quick function that rounds each longitude/latitude pair to 2 decimal places and counts the number of requests per rounded pair in a neighborhood council. We can run this function on the 311 requests available for every year since 2016. I can compute some basic metrics like year-over-year and quarter-over-quarter comparisons of request counts, but I'll focus on bulky items, homeless encampments, and graffiti.

See function below:

def generate_hotspot_dataframe(df):
    """Generates the hotspots of each NC by the number of 311 requests.

    This function takes in a raw LA 311 requests dataframe and aggregates
    requests by their longitude and latitude, rounded to 2 decimal places,
    within each neighborhood council.

    Args:
        df: raw LA 311 requests for any year.

    Returns:
        An aggregated 311 request dataframe that contains the count of 311
        requests per long/lat pair in each neighborhood council.
    """
    print("* Rounding requests Long/Lat to 2 Decimal Places")
    df = df.copy()  # avoid mutating the caller's dataframe
    df['lat_2dp'] = df['Latitude'].round(decimals=2)
    df['long_2dp'] = df['Longitude'].round(decimals=2)

    print("* Aggregating dataframes")
    # Count requests per rounded lat/long pair within each NC.
    final_df = (df.groupby(['NCName', 'lat_2dp', 'long_2dp'], as_index=False)['SRNumber']
                  .count()
                  .sort_values(['NCName', 'SRNumber'])
                  .reset_index(drop=True))
    return final_df
nichhk commented 1 year ago

I'm not sure if two decimal places is small enough--1 degree of latitude/longitude is 69 miles, so two decimal places would be 0.69 miles, which is quite considerable. We can fine tune the number of decimal places as necessary.
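The back-of-envelope math here can be written out directly. A rough sketch (assumes ~69 miles per degree of latitude; longitude degrees are shorter away from the equator, so this is an upper bound):

```python
# Approximate cell size produced by rounding lat/long to `dp` decimal places.
# Assumes ~69 miles per degree of latitude (longitude degrees shrink with
# latitude, ~57 miles at LA, so treat this as an upper bound).
MILES_PER_DEGREE = 69.0

def cell_size_miles(dp):
    """Side length in miles of a grid cell at `dp` decimal places."""
    return MILES_PER_DEGREE * 10 ** -dp

for dp in range(1, 5):
    print(f"{dp} decimal place(s): ~{cell_size_miles(dp):.4f} miles")
```

So 2 decimal places gives ~0.69-mile cells, and each additional decimal place shrinks the cell side by a factor of 10.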

Those target request types look good to me! I would also add illegal dumping and animal remains. Both are issues that might be concentrated in certain areas, and could be addressed with additional signage.

joshuayhwu commented 1 year ago

Thanks for the review!

def generate_hotspot_dataframe(df, dp, req_type):
    """Generates the hotspots of each NC by the number of 311 requests.

    This function takes in a raw LA 311 requests dataframe, filters it to the
    "req_type" request type, and aggregates requests by their longitude and
    latitude, rounded to 'dp' decimal places, within each neighborhood council.

    Args:
        df: a pandas dataframe with raw LA 311 requests for any year.
        dp: an integer for the number of decimal places to round the lat/long to.
        req_type: a string for the request type to filter the dataframe by.

    Returns:
        An aggregated 311 request dataframe that contains the count of 311
        requests per long/lat pair in each neighborhood council.
    """
    print("* Filtering dataframe by " + req_type)
    # .copy() avoids pandas' SettingWithCopyWarning on the assignments below.
    df = df[df['RequestType'] == req_type].copy()

    print("* Rounding requests Long/Lat to " + str(dp) + " Decimal Places")
    # Column names are kept as *_2dp for compatibility, even though dp may vary.
    df['lat_2dp'] = df['Latitude'].round(decimals=dp)
    df['long_2dp'] = df['Longitude'].round(decimals=dp)

    print("* Aggregating dataframes")
    final_df = (df.groupby(['NCName', 'lat_2dp', 'long_2dp'], as_index=False)['SRNumber']
                  .count()
                  .sort_values(['NCName', 'SRNumber'])
                  .reset_index(drop=True))
    return final_df

req_type_lst = ['Graffiti Removal', 'Bulky Items', 'Homeless Encampment', 'Dead Animal Removal', 'Illegal Dumping Pickup']
for r in req_type_lst:
    final_df = generate_hotspot_dataframe(df, 2, r)
    final_df.to_csv("311_2020_Hotspot_" + r + ".csv")

Really rough function that generates a corresponding dataframe for each request type. Still using 2 decimal places right now, but that can be fine-tuned. Next step is to figure out a way to present this, or just send the list as-is.
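One presentation option: rank the aggregated cells and keep only the top few per council. A minimal sketch, assuming the column layout produced by generate_hotspot_dataframe (the `demo` dataframe below is hypothetical):

```python
import pandas as pd

def top_hotspots(agg_df, n=5):
    """Return the n highest-count lat/long cells per neighborhood council.

    Assumes `agg_df` is the output of generate_hotspot_dataframe: one row per
    (NCName, lat_2dp, long_2dp) with 'SRNumber' holding the request count.
    """
    return (agg_df.sort_values('SRNumber', ascending=False)
                  .groupby('NCName', as_index=False)
                  .head(n))

# Hypothetical mini example:
demo = pd.DataFrame({
    'NCName': ['A', 'A', 'A', 'B'],
    'lat_2dp': [34.05, 34.06, 34.07, 33.99],
    'long_2dp': [-118.24, -118.25, -118.26, -118.30],
    'SRNumber': [50, 3, 7, 12],
})
print(top_hotspots(demo, n=2))
```

Sorting before `groupby(...).head(n)` keeps the rows in descending count order within each council, which makes the output readable as-is.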

ajmachado42 commented 1 year ago

Hey Josh and Nich. I started digging in a little to familiarize myself with the 311 data around locations and request types; I'll bring questions from this initial exploration to the project meeting. I think we could do some clustering on past data to predict the types of requests in the different granular areas and help allocate resources, but I need to figure out how to make API calls to collect enough historical data, and also how to create new features for granular location. The API call I used only returns up to 1,000 records, which is another question I was going to bring to the project call.

Here's where I'm storing all my code. https://github.com/ajmachado42/Hack-for-LA-311-Data

nichhk commented 1 year ago

Hey Dri, thanks for taking a look at this! To get all the requests for a certain date range, you can use this tool. Feel free to reach out to @priyakalyan if you have any questions about using it.

Re: the clustering: not sure if you saw this already, but we already have one implementation that does this. Please take a look and see if it looks useful to you.

Btw, if you're blocked on anything, feel free to reach out to us on Slack or write out your questions here on GitHub. It can be a pain to write them out, but we want to help our teammates to be productive throughout the week!

ajmachado42 commented 1 year ago

Thanks Nich! I'll definitely use this API code and take a look at the clustering!

ajmachado42 commented 1 year ago

I made some pretty decent headway on the EDA and identifying hot spots by neighborhood council and address in this notebook.

I'm still figuring out how to break LA into small hot-spot chunks and then map the data points there, but I started going down a rabbit hole about geopandas, so the research is taking a little longer than I thought it would.

Some points for tomorrow's meeting (09/28/22):

  1. Size of each area to look at (each lat/lon degree is about 69 miles; rounding to hundredths would give ~0.69-mile cells)
  2. Should "hot spots" only include addresses that have multiple requests? A lot of requests are one-offs for bulky item pickups. When you break it down, graffiti becomes the number one offender for repeat requests.
  3. The API maxes out at 20,000 requests -- only the date range 09/17/22-09/23/22 could be pulled
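Point 2 above can be prototyped with a one-line filter. A sketch, assuming an aggregated dataframe with one row per location cell and the request count in an 'SRNumber' column (the `demo` dataframe is hypothetical):

```python
import pandas as pd

def repeat_request_hotspots(agg_df, min_requests=2):
    """Keep only location cells with at least `min_requests` 311 requests.

    Drops one-off requests (e.g. single bulky item pickups) so that "hot
    spots" only include locations with repeat activity. Assumes one row per
    location cell, with the count in an 'SRNumber' column.
    """
    return agg_df[agg_df['SRNumber'] >= min_requests].reset_index(drop=True)

# Hypothetical mini example: the single-request cell is dropped.
demo = pd.DataFrame({
    'NCName': ['A', 'A', 'B'],
    'SRNumber': [1, 5, 2],
})
print(repeat_request_hotspots(demo))
```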
joshuayhwu commented 1 year ago

@ajmachado42 Thanks so much for the comprehensive update! The notebook is very clear and comprehensive.

  1. I like the idea of breaking them into hundredths. I initially used 2 decimal places for each lat/lon but figured it would not be granular enough. It would be great to see the distribution of counts after you break them down into 0.69-mile cells. If there are too many "hot spot blocks", we can take a larger block size.
  2. As per our discussion during our meeting, I think the >=2 requests cutoff makes sense. At the same time, I'd suggest checking LA's weekly/monthly/yearly NC request count average and treating that as the decision rule. Ultimately, we want something actionable that makes an impact. If there are not that many requests, we can't really do much about them, as they're likely due to random chance / one-offs.
  3. Hmm, is that the case with the get_request_tool? I'd just use the 2021 LA 311 dataset and download it as a CSV instead. I can take a look at the API.
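One way to operationalize the decision rule in point 2: use the average cell count across the aggregated data as the cutoff rather than a fixed >=2. This is only a sketch of that idea, assuming the aggregated layout from generate_hotspot_dataframe (the `demo` dataframe is hypothetical):

```python
import pandas as pd

def hotspots_above_average(agg_df):
    """Flag cells whose request count exceeds the overall per-cell average.

    Instead of a fixed >=2 cutoff, treat the mean cell count across all NCs
    as the bar a location must clear to count as actionable. Assumes one row
    per location cell with the count in an 'SRNumber' column.
    """
    threshold = agg_df['SRNumber'].mean()
    return agg_df[agg_df['SRNumber'] > threshold], threshold

# Hypothetical mini example: mean is 4.0, so only the 9-request cell remains.
demo = pd.DataFrame({'NCName': ['A', 'A', 'B', 'B'],
                     'SRNumber': [1, 9, 2, 4]})
hot, thr = hotspots_above_average(demo)
print(thr, hot['SRNumber'].tolist())
```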

Once again, thanks so much for your hard work - Let me know what you think!

ajmachado42 commented 1 year ago

@joshuayhwu Thank you, Josh! I'll work on this this week.

Anupriya shared some Census resources for mapping files that break LA into the official city blocks, and I think she and Nich fixed the API bug after the meeting. I'm going to be visiting family in Florida this week but will have time to update my notebook with the full-year dataset and start doing some geospatial analysis as well.

ajmachado42 commented 1 year ago

Geospatial Analysis

Clustering

https://github.com/ajmachado42/Hack-for-LA-311-Data/tree/master/I-1279

joshuayhwu commented 1 year ago

@ajmachado42 Thanks so much for the comprehensive updates - really appreciate the documentation on the notebooks!

Geospatial Analysis:

Clustering:

ajmachado42 commented 1 year ago

@joshuayhwu I updated the visualization notebook so it's broken up more. Github still won't render the folium maps though.

This is my Drive link for it which has all the datasets, etc. Let me know if that works! (I was able to create a layered map by type in the nc_only notebook.) https://drive.google.com/drive/folders/1njMKXLcs6CSgcZ_Gs9Fwxr6Iq2Wro45m?usp=sharing

Noted about clustering. Once I finish getting the maps and block data set to a good spot then I'll shift to focusing on the cluster analysis more.

joshuayhwu commented 1 year ago

@ajmachado42 thanks for breaking it up! Notebook looks good and I really appreciate the comments!

I can take a look at the app and see how to render it if that's your only blocker. Otherwise, happy to check in on other blockers. Let me know which area you want most help with. Thanks for your hard work this week!

ajmachado42 commented 1 year ago
mc759 commented 1 year ago

Hey @ajmachado42 and @joshuayhwu, Do you have an update for us on this issue?

Please update:

Thanks!

ajmachado42 commented 1 year ago

Hey @mc759

Progress:

Blockers:

Availability:

ETA:


ajmachado42 commented 1 year ago

Moving this one to closed after discussing with Josh. The repo has lots of templates for analyses (statistical and geospatial) and a mini program that generates a report adding census block IDs to each request based on the request's address. Feel free to reach out to me if you need anything!