EchoProject commented 2 years ago

Overview

We need to create a first-draft Reports Dashboard with summary statistics and basic analytics so that we can create an active version using Plotly Dash.

Additional note: Then we will show users how they can derive insights to create Service Request-based initiatives. Finally, we can get feedback on improving the dashboard feature set.

Action Items

[x] Draft a "wireframe" model using BI Tool
[x] Gather feedback from 311-Data Team
[x] Gather feedback from Data Science Community of practice
[ ] Gather feedback from Seymour Liao
[ ] Implement the actual v2 Dashboard with 1 month of data for sample purposes
[ ] Beta-test dashboard with Neighborhood Council

Features

Filter by NC, Districts, Request Types
Summary statistics and visualizations by NC
Comparison statistics and visualizations by NC (e.g., request share, time distributions by NC, completed vs. closed request counts, etc.)
Provide background context for all statistics, visualizations, and data involved

Resources/Instructions

Dev Version Dashboard: https://dev.311-data.org/reports/dashboards/overview
1378

nichhk commented 2 years ago

Thanks for working on this Josh!

I think it would be interesting to show the distribution of time-to-close for each request type. E.g., if the distribution is bimodal, this can mean that a certain request type may have two subtypes, where one is easier to fix than the other. It might also capture that issue Bonnie mentioned where certain issues are closed after a while without actually being resolved (?). It might also be informative to overlap distributions per NC, to see how quickly issues are resolved in different NCs.
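To make the per-type time-to-close idea concrete, here is a minimal pandas sketch. It assumes a DataFrame with `createDate` and `closeDate` columns (names used later in this thread) plus a hypothetical `requestType` column, and summarizes time-to-close in days per request type. A bimodal shape would really only show up in a histogram of each group, so treat this summary as a first look rather than a test for bimodality.

```python
import pandas as pd


def time_to_close_days(df: pd.DataFrame) -> pd.Series:
    """Days from createDate to closeDate; NaN when either date is missing."""
    created = pd.to_datetime(df["createDate"])
    closed = pd.to_datetime(df["closeDate"])
    return (closed - created).dt.total_seconds() / 86400


def time_to_close_by_type(df: pd.DataFrame) -> pd.DataFrame:
    """Per-request-type summary (count, mean, std, quartiles) of time-to-close."""
    with_ttc = df.assign(timeToClose=time_to_close_days(df))
    return with_ttc.groupby("requestType")["timeToClose"].describe()


# Made-up sample rows; "requestType" is a hypothetical column name.
requests = pd.DataFrame({
    "requestType": ["Bulky Items", "Bulky Items", "Graffiti"],
    "createDate": ["2021-01-01 00:00", "2021-01-01 00:00", "2021-01-01 00:00"],
    "closeDate": ["2021-01-02 00:00", "2021-01-04 00:00", "2021-01-01 12:00"],
})
summary = time_to_close_by_type(requests)
print(summary[["count", "mean", "50%"]])
```

Overlapping the same per-type series for different NCs would give the per-NC comparison mentioned above.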

joshuayhwu commented 2 years ago

Thanks for the suggestion Nich!

I think adding comparison would be extremely helpful - will put that as a feature to incorporate!

joshuayhwu commented 2 years ago

"Wireframe" with Power BI to gather feedback

1) Neighborhood Council Summary
Filter by name, date-range, and request type
Indicator visuals for total number of requests, average time-to-close requests (days), maximum time-to-close requests (days)
Pie chart for share of request type
Distribution of requests throughout the day by hour

2) Data by Police Precinct

plotly_mvp2

Filter by police precinct and request create date
Indicator visual for total number of requests, average time-to-close requests (days)
Bar chart for request sources
Request type frequency

joshuayhwu commented 2 years ago

Feedback from 311-Data Team

1) Incorporate the distribution of time-to-close for each request type
2) Flag requests that have absurd time-to-close values (Ignored Requests?)
3) Include pages / tabs with summaries and comparisons between NCs

joshuayhwu commented 2 years ago

Feedback from DS Community of Practice

Incorporate text-based descriptions (i.e., context) for each statistic
Since pandas won't scale, use Dask instead
Use only one month of data to prototype what the dashboard would look like
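A minimal pandas sketch of the "one month only" prototyping suggestion, assuming a `createDate` column. Dask's dataframe API mirrors much of pandas, so a similar filter could likely be ported there, but that port is not shown here.

```python
import pandas as pd


def one_month_sample(df: pd.DataFrame, year: int, month: int) -> pd.DataFrame:
    """Keep only the requests created in the given calendar month."""
    created = pd.to_datetime(df["createDate"])
    in_month = (created.dt.year == year) & (created.dt.month == month)
    return df.loc[in_month]


# Made-up sample rows for illustration.
requests = pd.DataFrame({
    "createDate": ["2021-01-15", "2021-02-03", "2021-02-28", "2021-03-01"],
    "requestType": ["Graffiti", "Bulky Items", "Graffiti", "Bulky Items"],
})
feb = one_month_sample(requests, 2021, 2)
print(feb)
```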

joshuayhwu commented 2 years ago

Plotly MVP Dashboard v1.1

plotly_mvp3

Filter by NC
Time series of 311 requests over time
Pie chart for share of request type
Histogram for distribution of request time-to-close

plotly_mvp4

Filter by NCs (Side-by-side comparison)
Indicator visual for total number of requests and ignored requests (requests completed in < 1 day)
Bar chart for number of requests by source
Overlaid Time series for time-to-close (unfinished)

Todo: High Priority:

Add exclusion filter (exclude particular request types)
Try to add a filter that excludes data with data quality issues
Clean up the user interface of the Plotly dashboard

Low Priority

Add Request Type filter
Add more Divs so that visuals are not as stretched horizontally
Add external style sheet to style titles and other html components
Complete overlay time series for time-to-close, or work on alternative visual
Incorporate text / annotations on visuals to provide context
Consult 311-Data team and Data Science CoP for additional features / comments, revise for new version

nichhk commented 2 years ago

The team took a look at Josh's updates on Thursday, here's what I remember discussing for the record:

being able to select arbitrary sets of request types, like you can on the site, would be useful (e.g., "bulky items" is almost half of the data, but "bulky items" is generally not a quality-of-life issue, so NC members might want to filter those out)
"Ignored Requests" might not be worth surfacing to users. It's hard to find an accurate name for this; "ignored" kind of suggests that the city just ignored these requests, and I can't really think of anything better. It might be better to just have a small question mark on the "Total number of requests" box where we can show this info to very curious users.
There are two dimensions in which we can analyze this data: 1) use it to understand quality-of-life issues in different NCs; 2) use it to identify issues in bookkeeping and data management by the city teams that are handling the requests. For 1, it's not particularly useful to see requests that have data quality issues (i.e., time-to-close is super short or super long). For 2, it is. So we can implement a toggle that filters out requests with data quality issues (it would default to "on").

joshuayhwu commented 2 years ago

Plotly MVP Dashboard v1.2

Summary Dashboard

Visuals:

1) Line Chart: Total number of 311 requests over the time range defined by the earliest and latest request create dates. This shows which time ranges have the most requests.
2) Pie Chart: Share of each request type in the available data. This shows which kinds of requests have the highest/lowest demand in a particular neighborhood council.
3) Histogram: Distribution of request time-to-close. This shows how long each request takes to complete (proxied by the request close date).

Features:

1) Selecting an individual neighborhood council
2) Removing particular request types
3) Data Quality Toggle to filter out data with quality issues (where the time to close is less than 1 day or longer than 100 days)
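The line chart and pie chart above boil down to two aggregations. A minimal pandas sketch, assuming a `createDate` column and a hypothetical `requestType` column; resampling by day keeps days with zero requests, so the line chart gets a continuous time axis.

```python
import pandas as pd


def daily_request_counts(df: pd.DataFrame) -> pd.Series:
    """Requests per calendar day, including days with zero requests (line chart)."""
    created = pd.DatetimeIndex(pd.to_datetime(df["createDate"]))
    return pd.Series(1, index=created).resample("D").sum()


def request_type_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of requests per type (pie chart)."""
    return df["requestType"].value_counts(normalize=True)


# Made-up sample rows; note there are no requests on 2021-01-02.
requests = pd.DataFrame({
    "createDate": ["2021-01-01 08:00", "2021-01-01 17:30",
                   "2021-01-03 09:15", "2021-01-03 10:00"],
    "requestType": ["Graffiti", "Bulky Items", "Bulky Items", "Bulky Items"],
})
counts = daily_request_counts(requests)
share = request_type_share(requests)
print(counts)
print(share)
```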

Comparison Dashboard

Visuals:

1) Indicator Visuals: Total number of requests and the number of days of data available
2) Bar Chart: Number of requests by source. This shows the variety of channels through which individuals make requests.
3) Line Chart: Comparison of the total number of 311 requests

Changes from before:

1) Added an exclusion filter with the following functionality:

Remove one or more request types from the summary dashboard
Exclusion filter request type options depend on the NC selected; otherwise, all request types in the entire dataset are assumed
The exclusion filter "freezes" on the last remaining request type, i.e., the dashboard prevents the user from removing all request types from display

2) UI

Chose 'Open Sans' as the default font to stay consistent with the Plotly visuals
Adjusted font size to fit the text-to-div ratio
Added spacing between different divs

3) Comparison plots

Added overlapping line charts for the number of requests throughout the day

4) Data Quality Toggle

Added a data quality toggle to filter out data considered "bad" (request timeToClose less than 1 day or longer than 100 days)
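A minimal sketch of the exclusion "freeze" and the data quality toggle as plain pandas functions, separate from any Dash callback wiring. It assumes a hypothetical `requestType` column and a `timeToClose` column in days; the function names are illustrative, not the prototype's actual code.

```python
import pandas as pd

TTC_MIN_DAYS = 1    # thresholds from the toggle description: < 1 day is "bad"
TTC_MAX_DAYS = 100  # ... and > 100 days is "bad"


def update_exclusions(available, previous_excluded, new_excluded):
    """Return the exclusions to apply.

    If the new exclusions would hide every available request type, keep the
    previous exclusions ("freeze") so at least one type stays on display.
    """
    if set(available) <= set(new_excluded):
        return list(previous_excluded)
    return list(new_excluded)


def apply_filters(df, excluded_types, quality_toggle=True):
    """Drop excluded request types and, when the toggle is on, "bad" rows.

    Rows with a missing timeToClose also fail the range check and are dropped
    when the toggle is on.
    """
    out = df[~df["requestType"].isin(excluded_types)]
    if quality_toggle:
        ttc = out["timeToClose"]
        out = out[(ttc >= TTC_MIN_DAYS) & (ttc <= TTC_MAX_DAYS)]
    return out


types = ["Bulky Items", "Graffiti"]
requests = pd.DataFrame({
    "requestType": ["Bulky Items", "Bulky Items", "Graffiti", "Bulky Items"],
    "timeToClose": [0.2, 5.0, 3.0, 250.0],
})
print(update_exclusions(types, ["Graffiti"], ["Graffiti", "Bulky Items"]))
print(apply_filters(requests, ["Graffiti"]))
```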

nichhk commented 2 years ago

Thanks for these updates Josh!

Re: remove one or more request types: I think it might be more intuitive to make this the opposite, i.e., select one or more request types. This will better align with the map functionality as well.

Re: Data Quality Toggle: This looks super useful! May I ask how you chose the thresholds for "bad"? This might be a situation where we might have to combine some statistical analysis and also get input from City folks.

In terms of statistical analysis, I think there are several ways to detect outliers. One way is applying something like a z-score range.

But I think we also need help from the City to understand what acceptable timeToCloses are. It might be perfectly ok for a timeToClose to be like, 10min, for example, if it's a duplicate of another request.
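A minimal sketch of the z-score idea, assuming time-to-close is a pandas Series in days; the threshold of 3 standard deviations is an assumed, commonly used default, not a value agreed on here. One caveat: the mean and standard deviation are themselves pulled by extreme values, so on heavily skewed raw data this can miss outliers.

```python
import pandas as pd


def zscore_outliers(ttc: pd.Series, threshold: float = 3.0) -> pd.Series:
    """True where a value lies more than `threshold` sample standard
    deviations from the mean."""
    z = (ttc - ttc.mean()) / ttc.std()
    return z.abs() > threshold


# Made-up data: nineteen 1-day closes and one 100-day close.
ttc = pd.Series([1.0] * 19 + [100.0])
flags = zscore_outliers(ttc)
print(ttc[flags])
```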

joshuayhwu commented 2 years ago

Thanks for the feedback Nich!

I have implemented the select-by-request-type functionality, but unfortunately I discovered another bug. After implementing the dependent dropdown (the type dropdown only shows the types available in a particular NC), the visuals wouldn't update when only the NC dropdown was selected. Surprisingly, the visuals do update when only the type dropdown is selected. This part is still under investigation.

EDIT: Fixed this bug; it was sloppy logic on my part. But another error occurred (a minimum series length error).
EDIT2: Figured out what was happening. My filtering logic filters rows by the current selection, and in some cases it removes all rows from the dataset. That caused the error because I hadn't specified what should happen to the visuals when there is no data.
EDIT3: Resolved by raising a PreventUpdate() exception when there is no data.
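A minimal sketch of the dependent dropdown options and the empty-data guard, written as plain functions that a Dash callback could call. It assumes hypothetical `nc` and `requestType` columns. `dash.exceptions.PreventUpdate` is Dash's real exception for leaving a callback's outputs unchanged; the fallback class only exists so the sketch runs without Dash installed.

```python
import pandas as pd

try:
    from dash.exceptions import PreventUpdate
except ImportError:  # fallback so this sketch runs without Dash installed
    class PreventUpdate(Exception):
        pass


def type_options_for_nc(df, nc):
    """Dropdown options limited to the request types present in one NC."""
    types = sorted(df.loc[df["nc"] == nc, "requestType"].unique())
    return [{"label": t, "value": t} for t in types]


def filtered_or_prevent(df, nc, selected_types):
    """Filter by NC and types; raise PreventUpdate instead of plotting nothing."""
    out = df[(df["nc"] == nc) & (df["requestType"].isin(selected_types))]
    if out.empty:
        raise PreventUpdate
    return out


# Made-up sample rows with placeholder NC names.
requests = pd.DataFrame({
    "nc": ["NC A", "NC A", "NC B"],
    "requestType": ["Graffiti", "Bulky Items", "Graffiti"],
})
print(type_options_for_nc(requests, "NC B"))
```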

In terms of data quality, I originally eyeballed the thresholds based on the visualization. I have now defined outliers by first applying a log transform and then taking the median ± 1.5*IQR, since the data is skewed (both before and after the log transform).
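A minimal sketch of the rule as stated (median ± 1.5*IQR on log-transformed time-to-close), assuming a pandas Series in days. The exact log transform isn't specified, so `np.log1p` is an assumption that keeps zero-day values finite; negative values are left out of the bound computation and end up flagged. Note that Tukey's usual fences use Q1 − 1.5*IQR and Q3 + 1.5*IQR, which are wider than median ± 1.5*IQR.

```python
import numpy as np
import pandas as pd


def log_iqr_bounds(ttc: pd.Series, k: float = 1.5):
    """(low, high) bounds in days: median ± k * IQR computed on log1p(ttc)."""
    logged = np.log1p(ttc[ttc >= 0])  # negatives excluded from the fit
    q1, median, q3 = logged.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    return np.expm1(median - k * iqr), np.expm1(median + k * iqr)


def is_outlier(ttc: pd.Series, k: float = 1.5) -> pd.Series:
    """True outside the bounds; missing values also count as outliers."""
    low, high = log_iqr_bounds(ttc, k)
    return ~ttc.between(low, high)


# Made-up time-to-close values in days.
ttc = pd.Series([0.5, 1, 2, 3, 4, 5, 6, 300, -2])
print(log_iqr_bounds(ttc))
print(is_outlier(ttc))
```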

Raw distribution

raw_timeToClose

After removing outliers based on the rule above

filter_timeToClose

I agree we need to talk to the City if possible. One thing I noticed is that some rows have a missing createDate / closeDate, causing timeToClose to be empty (which I replace with 0). There are also some rows with a negative timeToClose, which are definitely data quality issues we need to investigate and constrain upstream.
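A minimal sketch that flags both problems without replacing missing values with 0, assuming `createDate` and `closeDate` columns. Keeping missing time-to-close as NaN avoids mixing these rows up with requests that really were closed instantly.

```python
import pandas as pd


def quality_flags(df: pd.DataFrame) -> pd.DataFrame:
    """Boolean flags for missing dates and negative time-to-close."""
    created = pd.to_datetime(df["createDate"])
    closed = pd.to_datetime(df["closeDate"])
    ttc_days = (closed - created).dt.total_seconds() / 86400  # NaN if a date is missing
    return pd.DataFrame({
        "missing_date": created.isna() | closed.isna(),
        "negative_ttc": ttc_days < 0,  # NaN compares False, so missing rows aren't double-flagged
    })


# Made-up rows: one normal, one missing closeDate, one closed before it was created.
requests = pd.DataFrame({
    "createDate": ["2021-01-01", "2021-01-05", "2021-01-05"],
    "closeDate": ["2021-01-03", None, "2021-01-02"],
})
flags = quality_flags(requests)
print(flags)
```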

joshuayhwu commented 2 years ago

@ExperimentsInHonesty thanks for clarifying the context for the dashboards MVP last week. Would love some feedback from you on this version. Please note the following:

Plotly Dash doesn't support pre-defined groups in the dropdown lists (i.e., you cannot select 1 region, but must select the individual NCs in that region). It is possible to select multiple NCs at the same time, but for now I'm keeping things simple.
Descriptions on the visualizations will be a later feature. Visualizations are designed to be as simple as possible, and I don't want to assume data illiteracy.
You mentioned some requests being closed ridiculously early (i.e., in less than 10 minutes or a day). Nich and I discussed this issue and thought it is best to treat it as a data quality issue rather than immediately flag it as problematic; we need to talk to the people generating this data before drawing a conclusion.

NC Summary

plotly_mvp_v1 3_1

Visuals:

1) Line Chart: Total number of 311 requests over the time range defined by the earliest and latest request create dates. This shows which time ranges have the most requests.
2) Pie Chart: Share of each request type in the available data. This shows which kinds of requests have the highest/lowest demand in a particular neighborhood council.
3) Histogram: Distribution of request time-to-close. This shows how long each request takes to complete (proxied by the request close date).

Features:

1) Selecting an individual neighborhood council
2) Selecting one or more request types
3) Data Quality Toggle to filter out data with quality issues (i.e., keep only requests whose time to close is not an outlier)

NC Comparison

plotly_mvp_v1 3_2

Visuals:

1) Indicator Visuals: Total number of requests and the number of days of data available
2) Bar Chart: Number of requests by source. This shows the variety of channels through which individuals make requests.
3) Line Chart: Comparison of the total number of 311 requests

Features:

1) Compare the total number of requests and the date range between NCs
2) Compare how individuals make 311 requests in the two NCs
3) Compare the number of requests throughout the day for both NCs
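A minimal pandas sketch of the numbers behind the comparison view, assuming a hypothetical `nc` column plus `createDate` and `requestSource`. "Days of data available" is interpreted here as the inclusive calendar-day span between the earliest and latest request, which is an assumption.

```python
import pandas as pd


def compare_ncs(df: pd.DataFrame, nc_a: str, nc_b: str) -> dict:
    """Totals, day span, requests by source, and requests by hour for two NCs."""
    sub = df[df["nc"].isin([nc_a, nc_b])].copy()
    sub["created"] = pd.to_datetime(sub["createDate"])
    return {
        "total": sub.groupby("nc").size(),
        "days": sub.groupby("nc")["created"].agg(
            lambda s: (s.max().normalize() - s.min().normalize()).days + 1),
        "by_source": pd.crosstab(sub["requestSource"], sub["nc"]),
        "by_hour": pd.crosstab(sub["created"].dt.hour, sub["nc"]),
    }


# Made-up sample rows with placeholder NC names.
requests = pd.DataFrame({
    "nc": ["NC A", "NC A", "NC A", "NC B", "NC B"],
    "createDate": ["2021-01-01 08:00", "2021-01-03 08:30", "2021-01-03 17:00",
                   "2021-01-02 08:00", "2021-01-02 09:00"],
    "requestSource": ["Mobile App", "Call", "Mobile App", "Call", "Call"],
})
result = compare_ncs(requests, "NC A", "NC B")
print(result["by_source"])
```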

joshuayhwu commented 2 years ago

See my public repo for the integrated dashboard file and instructions for running it locally. I only used the new versions of Dash and Docker.

Currently looking to integrate the code into the 311 Data code base. There are some issues with the newer versions of the Dash / gunicorn interface: none of the callback functions work with the newer Dash / gunicorn / Docker combination, i.e., there is no response to any interaction on the dashboards. Will try to figure this out in the next few days.

nichhk commented 2 years ago

Thanks Josh! In your repo, can you put in the unzipped files instead of the zip so that people can browse the code without downloading? Let me know if you need help with debugging the interaction issue.

joshuayhwu commented 2 years ago

Thanks, I put in the unzipped files in the public repo. Would appreciate some help whenever you're available, but I'll continue working on it and see if I could replicate error.

EDIT: It turns out the problem is resolved just by adding Flask. It seems Gunicorn does not play well with Dash.

joshuayhwu commented 2 years ago

I have summarized some of the feedback I received on the current version of the dashboard:

1) Regarding the data quality issue, Bonnie had a wonderful insight for one possible decision rule. When the requestSource is Driver Self Report and the time-to-close of the request is 0, it is likely that the driver simply closed the request instantly and then proceeded to work on it (or not). This is one potential decision rule we could implement.

2) The current colors of the Plotly Dash dashboards are unsuitable for neighborhood councils' formal publications (e.g., newsletters). Will use the default Dash colors for the Plotly dashboards from now on.

3) The current Plotly dashboards don't take into account how our end users will use them, i.e., downloading individual visualizations and printing the dashboard page as a whole. Ideally, each visualization should have a title, corresponding axis labels, and the correct scale. Each dashboard should also be optimized for the "printed" layout.

4) To be confirmed again at the next meeting: consistent with Nich's comments on combining dashboards, I propose combining the recent dashboards with the overall dashboards. More specifically, the neighborhood dashboard could be combined with neighborhood_recent; the overview and recent dashboards could be deprecated because they are redundant with the current prototype; types_map could be combined with another dashboard (perhaps the one Piero is working on?); and this current prototype would be the final one. That would leave 3 dashboards in total: neighborhood, the current prototype, and types_map.
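A minimal sketch of Bonnie's decision rule from item 1, assuming `createDate`, `closeDate`, and `requestSource` columns. The exact source label `"Driver Self Report"` is an assumption and should be checked against the actual requestSource values; `tol_minutes` is a hypothetical parameter for treating near-zero closes as instant.

```python
import pandas as pd

SELF_REPORT = "Driver Self Report"  # assumed label; verify against the real data


def self_report_instant_close(df: pd.DataFrame, tol_minutes: float = 0) -> pd.Series:
    """True for driver self-reported requests closed within tol_minutes of creation."""
    created = pd.to_datetime(df["createDate"])
    closed = pd.to_datetime(df["closeDate"])
    ttc_minutes = (closed - created).dt.total_seconds() / 60
    # between() is inclusive, so tol_minutes=0 matches exactly zero time-to-close
    return (df["requestSource"] == SELF_REPORT) & ttc_minutes.between(0, tol_minutes)


# Made-up sample rows.
requests = pd.DataFrame({
    "requestSource": ["Driver Self Report", "Driver Self Report", "Mobile App"],
    "createDate": ["2021-01-01 08:00"] * 3,
    "closeDate": ["2021-01-01 08:00", "2021-01-02 08:00", "2021-01-01 08:00"],
})
flags = self_report_instant_close(requests)
print(flags)
```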

joshuayhwu commented 2 years ago

Updated Overview Dashboard Pt 1

PlotlyCombined1

Updated Overview Dashboard Pt 2

PlotlyCombined2

hackforla / 311-data

v2 Reports Dashboard (First Draft) #1197
