Open snooravi opened 2 years ago
Things we need:
Updated content
@ShikaZzz can you link any material that you have for this issue? We will review it today.
@snooravi The updated content comes from my analysis of the dataset in my local Jupyter notebook. I can share my screen and organize the results into a Google Colab notebook later if necessary.
That would be great, thanks!
@ShikaZzz please always add your work to a Google Colab notebook or commit your code in a draft PR, in case you get hit by a bus. It's the data science equivalent of only working on documents that are stored in our shared drive instead of a person's My Drive.
Thanks!!!
May pause this for now and switch to how to use the 311-data.org tools and the Alpha Report Tool to derive insights
Alpha Report Tool:
311-data.org:
stories: detect data anomalies
compare by NC:
Availability: 15 hours
ETA: by next week
analytics concepts:
challenge:
@ShikaZzz Please add update using this template
Progress: "What is the current status of your project? What have you completed and what is left to do?"
Blockers: "Difficulties or errors encountered."
Availability: "How much time will you have this week to work on this issue?"
ETA: "When do you expect this issue to be completed?"
Pictures: "Add any pictures of the visual changes made to the site so far."
If you need help, be sure to either:
Progress:
Blockers:
Availability: 20 hours
ETA: ~7-8 hours for "doing" item
Progress:
did:
To-do: creating stories (?), working on feedback from meetings; possibly creating context by adding new datasets, which are not easy to find
Blockers: how to define outliers: different features have different distributions of processing days
Availability: 10 hours
ETA: 1-2 weeks
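One possible way around the outlier blocker, as a rough sketch only: compute the outlier fences separately for each group (for example each RequestType), so that a processing time that is normal for one group can still be flagged in another. The IQR rule, the k=1.5 multiplier, the function name and the sample numbers are all assumptions for illustration and are not taken from the notebook.

```python
from collections import defaultdict
from statistics import quantiles


def flag_outliers_by_group(records, k=1.5):
    """Return the (group, days) pairs that fall outside
    [Q1 - k*IQR, Q3 + k*IQR], with quartiles computed per group."""
    groups = defaultdict(list)
    for group, days in records:
        groups[group].append(days)

    fences = {}
    for group, values in groups.items():
        if len(values) < 4:
            continue  # too few points to estimate quartiles sensibly
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        fences[group] = (q1 - k * iqr, q3 + k * iqr)

    return [
        (group, days)
        for group, days in records
        if group in fences
        and not (fences[group][0] <= days <= fences[group][1])
    ]


# Made-up sample: (RequestType, processing days)
sample = [
    ("Graffiti Removal", 1), ("Graffiti Removal", 2),
    ("Graffiti Removal", 2), ("Graffiti Removal", 3),
    ("Graffiti Removal", 30),
    ("Single Streetlight Issue", 20), ("Single Streetlight Issue", 25),
    ("Single Streetlight Issue", 30), ("Single Streetlight Issue", 35),
]
outliers = flag_outliers_by_group(sample)
```

In this made-up sample, 30 days is flagged for Graffiti Removal but not for Single Streetlight Issue, which is the point of using per-group fences instead of one global threshold.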
@ShikaZzz Please add update using this template (even if you have a pull request)
Progress: "What is the current status of your project? What have you completed and what is left to do?" Blockers: "Difficulties or errors encountered." Availability: "How much time will you have this week to work on this issue?" ETA: "When do you expect this issue to be completed?" Pictures: "Add any pictures of the visual changes made to the site so far."
If you need help, be sure to either:
Updated the links for EDA and data cleaning to point to the Colab notebooks in the access the data drive
Overview
Analyze MyLA311 Service Request Data 2020, a public dataset available for preview and download (1,491,773 records, 34 feature columns), to understand what kinds of stories (insights) we could tell based on it
Action Items
possible next step analysis: focus on graffiti (NEW)
put the data into context and dig in to understand graffiti through in-depth (data literacy) analysis (need to find other datasets)
Analysis & Findings
ServiceDate and ClosedDate have NA values
ServiceDate and ClosedDate (year=3020)
------- (Analysis below is based on requests (94.96%) with valid CreatedDate, UpdatedDate, ServiceDate and ClosedDate) -------
CreatedDate, UpdatedDate, ServiceDate and ClosedDate by hours
ServiceDate during 00:00 to 00:59
CreatedDate and each of UpdatedDate, ServiceDate and ClosedDate
ServiceDate on the same day as CreatedDate in total requests of a type
ServiceDate later than CreatedDate
RequestType, RequestSource and Status
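As a rough illustration of two of the timestamp checks listed above (the share of requests of a type whose ServiceDate is on the same day as CreatedDate, and ServiceDate values falling in 00:00 to 00:59), here is a minimal plain-Python sketch. It assumes the timestamps are already parsed into datetime objects; the function names and the sample rows are hypothetical and not taken from the notebook.

```python
from collections import Counter
from datetime import datetime


def same_day_share(requests):
    """Per RequestType, the share of requests whose ServiceDate is on
    the same calendar day as CreatedDate."""
    total = Counter()
    same_day = Counter()
    for r in requests:
        total[r["RequestType"]] += 1
        if r["ServiceDate"].date() == r["CreatedDate"].date():
            same_day[r["RequestType"]] += 1
    return {t: same_day[t] / total[t] for t in total}


def midnight_service_count(requests):
    """Number of requests whose ServiceDate falls in 00:00 to 00:59."""
    return sum(1 for r in requests if r["ServiceDate"].hour == 0)


# Made-up rows using column names from the dataset
sample = [
    {"RequestType": "Graffiti Removal",
     "CreatedDate": datetime(2020, 3, 1, 9, 0),
     "ServiceDate": datetime(2020, 3, 1, 15, 0)},
    {"RequestType": "Graffiti Removal",
     "CreatedDate": datetime(2020, 3, 1, 10, 0),
     "ServiceDate": datetime(2020, 3, 3, 0, 0)},
    {"RequestType": "Single Streetlight Issue",
     "CreatedDate": datetime(2020, 3, 2, 8, 0),
     "ServiceDate": datetime(2020, 3, 9, 0, 0)},
]
shares = same_day_share(sample)
midnight = midnight_service_count(sample)
```

A large midnight count may simply mean the time part of ServiceDate was not recorded, so it is worth checking before reading anything into hour-of-day patterns for that column.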
Stories
Suggestions for the database (based merely on checking whether the values of certain features make sense):
RequestSource: Driver Self Report, Voicemail
RequestType: Graffiti Removal, Multiple Streetlight Issue, Single Streetlight Issue
Status: all
PolicePrecinct: CENTRAL, NEWTON
AssignTo:
Owner: BSL, OCB
Graffiti Removal Processing Days:
------------------------------------UPDATE------------------------------------
Data cleaning
each record has 4 timestamp columns: CreatedDate, UpdatedDate, ServiceDate, ClosedDate
valid data: all the timestamp columns are in 2020, ServiceDate > CreatedDate, and ClosedDate > CreatedDate
NA data: records with NA (missing values)
NC: some NCs are outside the LA NC area but still appear in the data, and 1 NC is misclassified; remove the NCs outside the LA NC area and correct the misclassified one
Analytical concepts:
Google Colab Notebook
data cleaning EDA
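For readers without access to the notebooks, the valid / NA split described under Data cleaning can be sketched roughly as below. This is plain Python under stated assumptions, not the notebook code: timestamps are assumed to be parsed datetime objects, NA is checked on the four timestamp columns only, and the classify function and the "invalid" label are hypothetical names for illustration.

```python
from datetime import datetime

TIMESTAMP_COLUMNS = ("CreatedDate", "UpdatedDate", "ServiceDate", "ClosedDate")


def classify(record, year=2020):
    """Label a record 'NA', 'invalid' or 'valid' using the cleaning rules:
    no missing timestamp, all timestamps in `year`, and both ServiceDate
    and ClosedDate strictly later than CreatedDate."""
    values = [record.get(column) for column in TIMESTAMP_COLUMNS]
    if any(value is None for value in values):
        return "NA"
    if any(value.year != year for value in values):
        return "invalid"
    created = record["CreatedDate"]
    if record["ServiceDate"] <= created or record["ClosedDate"] <= created:
        return "invalid"
    return "valid"


# Made-up records
ok = {
    "CreatedDate": datetime(2020, 5, 1, 9, 0),
    "UpdatedDate": datetime(2020, 5, 2, 9, 0),
    "ServiceDate": datetime(2020, 5, 3, 0, 0),
    "ClosedDate": datetime(2020, 5, 3, 12, 0),
}
missing_closed = dict(ok, ClosedDate=None)
far_future = dict(ok, ServiceDate=datetime(3020, 5, 3, 0, 0))
service_before_created = dict(ok, ServiceDate=datetime(2020, 4, 30, 0, 0))

labels = [classify(r) for r in
          (ok, missing_closed, far_future, service_before_created)]
```

Keeping the "NA" and "invalid" labels separate makes it easy to report how many records each rule removes before computing shares such as the 94.96% of valid requests.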