hackforla / access-the-data

The Access the Data project was created to address the growing gaps between new technology development and decisions impacting our communities with the underlying systems and data that drive those initiatives
GNU General Public License v2.0

Exploratory Analysis on MyLA311 Service Request Data 2020 #83

Open snooravi opened 2 years ago

snooravi commented 2 years ago

Overview

Analyze MyLA311 Service Request Data 2020, a public dataset available for preview and download (1,491,773 records, 34 feature columns), to understand what kinds of stories (insights) we could tell based on it.

Action Items

Possible next step: focus the analysis on graffiti (NEW)

Put the data into context and dig in to understand graffiti through in-depth (data literacy) analysis; this requires finding other datasets.

Analysis & Findings

  1. The data contains 104 distinct NC (Neighborhood Council) values and 203 distinct NC names, more than the actual 99 NCs
  2. ServiceDate and ClosedDate contain NA values
  3. One request has a typo in ServiceDate and ClosedDate (year=3020)
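
The three checks above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the notebook's actual code (which is not shown in this issue). The column names `SRNumber`, `NC` and `NCName`, the timestamp format, the `max_year` cutoff and the sample rows are all assumptions made for the example.

```python
from datetime import datetime

# Assumed timestamp layout of the CSV export; adjust if the file differs.
TS_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_ts(value):
    """Return a datetime, or None when the cell is empty or malformed."""
    if not value:
        return None
    try:
        return datetime.strptime(value.strip(), TS_FORMAT)
    except ValueError:
        return None


def quality_report(rows, max_year=2021):
    """Count distinct NC values/names, missing dates, and implausible years."""
    missing = {"ServiceDate": 0, "ClosedDate": 0}
    implausible_year = []
    for row in rows:
        for col in missing:
            ts = parse_ts(row.get(col))
            if ts is None:
                missing[col] += 1  # empty or unparseable counts as missing
            elif ts.year > max_year:
                implausible_year.append((row.get("SRNumber"), col, ts.year))
    return {
        "distinct_nc": len({r["NC"] for r in rows if r.get("NC")}),
        "distinct_nc_names": len({r["NCName"] for r in rows if r.get("NCName")}),
        "missing": missing,
        "implausible_year": implausible_year,
    }


# Made-up sample rows. The spelling variants only illustrate one possible way
# names can outnumber NC values; the issue does not confirm the actual cause.
rows = [
    {"SRNumber": "A-1", "NC": "7", "NCName": "SYLMAR NC",
     "ServiceDate": "01/03/2020 12:00:00 AM", "ClosedDate": "01/04/2020 09:15:00 AM"},
    {"SRNumber": "A-2", "NC": "7", "NCName": "Sylmar NC",
     "ServiceDate": "", "ClosedDate": ""},
    {"SRNumber": "A-3", "NC": "6", "NCName": "ARLETA NC",
     "ServiceDate": "01/05/3020 12:00:00 AM", "ClosedDate": "01/06/3020 10:00:00 AM"},
]
report = quality_report(rows)
print(report)
```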

------- (Analysis below is based on requests (94.96%) with valid CreatedDate, UpdatedDate, ServiceDate and ClosedDate )-------

  1. Compare distribution of CreatedDate, UpdatedDate, ServiceDate and ClosedDate by hours
    • 1,278,193 (90.23%) requests have a ServiceDate between 00:00 and 00:59
  2. Compute new columns: duration between CreatedDate and each of UpdatedDate, ServiceDate and ClosedDate
  3. Some requests have negative processing hours (< 0)
  4. Compare, for each request type:
    • the ratio of requests with ServiceDate on the same day as CreatedDate to total requests of that type
    • the ratio of requests of that type to total requests
  5. Analysis on Graffiti Removal, the request type with the most requests whose ServiceDate is later than CreatedDate
    • by processing hours
    • by processing days
  6. Count of RequestType, RequestSource and Status
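
As a rough sketch of steps 1 to 4 above (not the notebook's actual code), the following computes the ServiceDate hour distribution, flags negative processing hours, and derives the two per-type ratios. It uses only the standard library; the function names and the sample requests are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def hours_between(start, end):
    """Signed duration in hours; negative when `end` precedes `start`."""
    return (end - start).total_seconds() / 3600


def summarize(requests):
    """requests: dicts with RequestType plus datetime CreatedDate/ServiceDate."""
    total_by_type = Counter()
    same_day_by_type = Counter()
    service_hour = Counter()
    negative = []
    for r in requests:
        created, service = r["CreatedDate"], r["ServiceDate"]
        total_by_type[r["RequestType"]] += 1
        service_hour[service.hour] += 1
        if service.date() == created.date():
            same_day_by_type[r["RequestType"]] += 1
        if hours_between(created, service) < 0:
            negative.append(r)
    n = len(requests)
    ratios = {
        t: {"same_day_ratio": same_day_by_type[t] / c, "share_of_all": c / n}
        for t, c in total_by_type.items()
    }
    return ratios, negative, service_hour


# Made-up sample requests, not taken from the dataset.
requests = [
    {"RequestType": "Graffiti Removal",
     "CreatedDate": datetime(2020, 3, 1, 9, 0), "ServiceDate": datetime(2020, 3, 3, 0, 0)},
    {"RequestType": "Graffiti Removal",
     "CreatedDate": datetime(2020, 3, 2, 10, 0), "ServiceDate": datetime(2020, 3, 2, 0, 0)},
    {"RequestType": "Bulky Items",
     "CreatedDate": datetime(2020, 3, 2, 8, 0), "ServiceDate": datetime(2020, 3, 2, 15, 0)},
    {"RequestType": "Bulky Items",
     "CreatedDate": datetime(2020, 3, 5, 8, 0), "ServiceDate": datetime(2020, 3, 6, 0, 0)},
]
ratios, negative, service_hour = summarize(requests)
print(ratios)
print(len(negative), "request(s) with negative processing hours")
print(service_hour[0], "request(s) with ServiceDate between 00:00 and 00:59")
```

In the second sample request, a ServiceDate stored at midnight on the same day as a mid-morning CreatedDate yields a negative duration. Whether that explains the negative values in the real data is not established in this issue.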

Stories

------------------------------------UPDATE------------------------------------

Data cleaning

each record has 4 timestamp columns: CreatedDate, UpdatedDate, ServiceDate, ClosedDate
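
One way to keep only the records with four usable timestamps is sketched below. The issue does not spell out what "valid" means, so this example assumes a value is valid when it parses with the assumed format and its year is not beyond `max_year`; both the format string and the cutoff are illustrative assumptions.

```python
from datetime import datetime

TS_COLUMNS = ("CreatedDate", "UpdatedDate", "ServiceDate", "ClosedDate")
TS_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # assumed export format


def keep_valid(rows, max_year=2021):
    """Return (rows whose four timestamps all parse, share of rows kept)."""
    kept = []
    for row in rows:
        parsed = {}
        for col in TS_COLUMNS:
            try:
                ts = datetime.strptime((row.get(col) or "").strip(), TS_FORMAT)
            except ValueError:
                break  # empty or malformed cell: drop the row
            if ts.year > max_year:
                break  # implausible year such as 3020: drop the row
            parsed[col] = ts
        else:
            # Reached only when no column triggered a break.
            kept.append({**row, **parsed})
    share = len(kept) / len(rows) if rows else 0.0
    return kept, share


# Made-up sample rows.
rows = [
    {"CreatedDate": "01/02/2020 08:30:00 AM", "UpdatedDate": "01/03/2020 09:00:00 AM",
     "ServiceDate": "01/03/2020 12:00:00 AM", "ClosedDate": "01/03/2020 09:00:00 AM"},
    {"CreatedDate": "01/02/2020 08:30:00 AM", "UpdatedDate": "01/03/2020 09:00:00 AM",
     "ServiceDate": "", "ClosedDate": ""},
    {"CreatedDate": "01/02/2020 08:30:00 AM", "UpdatedDate": "01/03/2020 09:00:00 AM",
     "ServiceDate": "01/05/3020 12:00:00 AM", "ClosedDate": "01/06/3020 10:00:00 AM"},
]
kept, share = keep_valid(rows)
print(f"kept {len(kept)} of {len(rows)} rows ({share:.2%})")
```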

Analytical concepts:

Google Colab Notebook

data cleaning EDA

snooravi commented 2 years ago

Things we need:

snooravi commented 2 years ago

https://github.com/hackforla/access-the-data/issues/89

ShikaZzz commented 2 years ago

Updated content

snooravi commented 2 years ago

@ShikaZzz can you link any material that you have for this issue. We will review today

ShikaZzz commented 2 years ago

@snooravi The updated content is from my analysis of the dataset in my local Jupyter notebook. I can share my screen and organize the results into a Google Colab notebook later if necessary

snooravi commented 2 years ago

That would be great, thanks!

ExperimentsInHonesty commented 2 years ago

@ShikaZzz please always add your work to a Google Colab or commit your code in a draft PR, in case you get hit by a bus. It's the data science equivalent of only working on documents that are stored in our shared drive instead of a person's My Drive.

Thanks!!!

ShikaZzz commented 2 years ago

May pause this for now and switch to exploring how to use the 311-data.org tools and the Alpha Report Tool to derive insights

Alpha Report Tool:

311-data.org:

stories: detect data anomalies

  1. compare by request type
  2. compare by NC:

    • e.g., last month, Graffiti, North East Valley: 2 outstanding peaks in the frequency plot (1 on Oct 29-30, 1 on Nov 20-21); after removing Sylmar, the data trend changes dramatically
    • other metrics: way of contact
    • con:
      1. need to manually select NC to filter out the one that affects the data trend
      2. cannot compare different request types between different NCs except through the manual filtering in con 1
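
The manual step in con 1 could in principle be automated offline with a leave-one-out comparison: recompute the peak days with each NC removed and see which removal makes the peaks disappear. The sketch below is not part of 311-data.org or the Alpha Report Tool. It assumes request records exported as dicts with `NCName` and `day` fields, defines a peak (arbitrarily) as a day with at least `factor` times the median daily count, and uses made-up sample records.

```python
import statistics
from collections import Counter
from datetime import date


def daily_counts(records, exclude_nc=None):
    """Requests per day, optionally leaving out one NC."""
    return Counter(r["day"] for r in records if r["NCName"] != exclude_nc)


def peak_days(counts, factor=2.0):
    """Days with at least `factor` times the median daily count.

    Days with zero requests are absent from `counts`, so the median is taken
    over days that had at least one request.
    """
    if not counts:
        return []
    median = statistics.median(counts.values())
    return sorted(d for d, c in counts.items() if c >= factor * median)


def peaks_without_each_nc(records, factor=2.0):
    """Map each NC to the peak days that remain after removing that NC."""
    ncs = sorted({r["NCName"] for r in records})
    return {nc: peak_days(daily_counts(records, exclude_nc=nc), factor) for nc in ncs}


# Made-up sample: a steady background from two NCs plus a one-day burst in a third.
days = [date(2021, 10, 28), date(2021, 10, 29), date(2021, 10, 30)]
records = (
    [{"NCName": "Pacoima", "day": d} for d in days for _ in range(2)]
    + [{"NCName": "Arleta", "day": d} for d in days]
    + [{"NCName": "Sylmar", "day": date(2021, 10, 29)} for _ in range(8)]
)

baseline = peak_days(daily_counts(records))
leave_one_out = peaks_without_each_nc(records)
print("peaks with all NCs:", baseline)
for nc, peaks in leave_one_out.items():
    print(f"peaks without {nc}:", peaks)
```

In this sample, only removing Sylmar clears the peak, which is the kind of signal the manual filtering looks for.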

Availability: 15 hours

ETA: by next week

ShikaZzz commented 2 years ago

analytics concepts:

challenge:

lrchang2 commented 2 years ago

@ShikaZzz Please add update using this template

Progress: "What is the current status of your project? What have you completed and what is left to do?"
Blockers: "Difficulties or errors encountered."
Availability: "How much time will you have this week to work on this issue?"
ETA: "When do you expect this issue to be completed?"
Pictures: "Add any pictures of the visual changes made to the site so far."

If you need help, be sure to either:

  1. ask for help at your next meeting
  2. put a "Status: Help Wanted" label on your issue and pull request
  3. put up a request for assistance on the #access-the-data channel.

ShikaZzz commented 2 years ago

Progress:

Blockers:

Availability: 20 hours

ETA: ~7-8 hours for "doing" item

ShikaZzz commented 2 years ago

Progress:

Blockers: how to define outliers: different features have different distributions of processing days
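
One common option for this blocker, offered here only as a possibility and not as the approach the project settled on, is to compute outlier fences separately per group, so that each request type is judged against its own distribution. The sketch below applies Tukey's 1.5 × IQR rule per `RequestType`; the field name `processing_days`, the minimum group size and the sample values are made up for illustration.

```python
import statistics
from collections import defaultdict


def iqr_fences(values, k=1.5):
    """Lower and upper Tukey fences for one group of values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def outliers_by_group(records, key="RequestType", value="processing_days",
                      k=1.5, min_size=4):
    """Flag records outside the IQR fences of their own group."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r[value])
    # Groups that are too small to estimate quartiles are skipped.
    fences = {g: iqr_fences(v, k) for g, v in groups.items() if len(v) >= min_size}
    flagged = []
    for r in records:
        if r[key] in fences:
            low, high = fences[r[key]]
            if r[value] < low or r[value] > high:
                flagged.append(r)
    return flagged


# Made-up processing days: 30 days is extreme for one type but typical for the other.
sample = {
    "Graffiti Removal": [1, 1, 2, 2, 3, 30],
    "Bulky Items": [20, 25, 28, 30, 32, 35],
}
records = [
    {"RequestType": t, "processing_days": d}
    for t, days in sample.items() for d in days
]
flagged = outliers_by_group(records)
print(flagged)
```

A single global threshold would treat the 30-day value the same way in both groups, which is the difficulty described in the blocker.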

Availability: 10 hours

ETA: 1-2 weeks

lrchang2 commented 2 years ago

@ShikaZzz Please add update using this template (even if you have a pull request)

Progress: "What is the current status of your project? What have you completed and what is left to do?"
Blockers: "Difficulties or errors encountered."
Availability: "How much time will you have this week to work on this issue?"
ETA: "When do you expect this issue to be completed?"
Pictures: "Add any pictures of the visual changes made to the site so far."

If you need help, be sure to either:

  1. ask for help at your next meeting
  2. put a "Status: Help Wanted" label on your issue and pull request
  3. put up a request for assistance on the #access-the-data channel.

snooravi commented 2 years ago

Updated the links for EDA and data cleaning to point to the Colab notebooks in the Access the Data drive