hackforla / 311-data

Empowering Neighborhood Associations to improve the analysis of their initiatives using 311 data
https://hackforla.github.io/311-data/
GNU General Public License v3.0

v2 Reports Dashboard (First Draft) #1197

Closed EchoProject closed 2 years ago

EchoProject commented 2 years ago

Overview

We need to create a first draft Reports Dashboard with summary statistics and basic analytics so that we can create an active version using Plotly Dash.

Additional note: Then we will show users how they can derive insights to create Service Request-based initiatives. Finally, we can get feedback on improving the dashboard feature set.

Action Items

Features

Resources/Instructions

nichhk commented 2 years ago

Thanks for working on this Josh!

I think it would be interesting to show the distribution of time-to-close for each request type. E.g., if the distribution is bimodal, this can mean that a certain request type may have two subtypes, where one is easier to fix than the other. It might also capture the issue Bonnie mentioned, where certain issues are closed after a while without actually being resolved (?). It might also be informative to overlay the distributions per NC, to see how quickly issues are resolved in different NCs.
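As a rough illustration of the per-type distribution idea, here is a minimal, dependency-free sketch that bins time-to-close into week-wide buckets for each request type. The sample rows, the field layout, the 7-day bin width, and the function names are all assumptions made for the example, not the project's actual schema or code.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows: (request_type, created_date, closed_date).
ROWS = [
    ("Graffiti Removal", "2021-03-01", "2021-03-02"),
    ("Graffiti Removal", "2021-03-01", "2021-03-03"),
    ("Graffiti Removal", "2021-03-05", "2021-03-25"),
    ("Bulky Items", "2021-03-02", "2021-03-09"),
    ("Bulky Items", "2021-03-04", "2021-03-10"),
]


def days_to_close(created: str, closed: str) -> int:
    """Whole days between the created and closed dates."""
    fmt = "%Y-%m-%d"
    return (datetime.strptime(closed, fmt) - datetime.strptime(created, fmt)).days


def histogram_by_type(rows, bin_width_days=7):
    """Map each request type to {bin start (in days): number of requests}."""
    hist = defaultdict(lambda: defaultdict(int))
    for request_type, created, closed in rows:
        days = days_to_close(created, closed)
        bin_start = (days // bin_width_days) * bin_width_days
        hist[request_type][bin_start] += 1
    return {rt: dict(sorted(bins.items())) for rt, bins in hist.items()}


histograms = histogram_by_type(ROWS)
```

Two well-separated bins with counts for the same request type would be a first hint of the bimodal shape described above. The same function could be run once per NC to overlay the results.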

joshuayhwu commented 2 years ago

Thanks for the suggestion Nich!

I think adding comparison would be extremely helpful - will put that as a feature to incorporate!

joshuayhwu commented 2 years ago

2) Data by Police Precinct plotly_mvp2

joshuayhwu commented 2 years ago

1) Incorporate distribution of time-to-close for each request type
2) Flag requests that have absurd time-to-close (Ignored Requests?)
3) Include page / tabs with summary and comparison between NC

joshuayhwu commented 2 years ago
  1. Incorporate text-based descriptions (i.e. context) for each statistic
  2. As pandas won't scale, use dask instead
  3. Use monthly data only to prototype what the dashboard would look like
joshuayhwu commented 2 years ago

plotly_mvp3

plotly_mvp4

Todo: High Priority:

Low Priority

nichhk commented 2 years ago

The team took a look at Josh's updates on Thursday. Here's what I remember discussing, for the record:

joshuayhwu commented 2 years ago

Plotly MVP Dashboard v1.2

Visuals:

1) Line Chart: Total number of 311 requests over the time range defined by the earliest and latest request creation dates. This shows which time ranges have the most requests.
2) Pie Chart: Share of each request type in the available data. This shows which request types have the highest and lowest demand in a particular neighborhood council.
3) Histogram: Distribution of request time-to-close. This shows how long requests take to complete (using the close date as a proxy for completion).

Features:

1) Selecting an individual neighborhood council
2) Removing particular request types
3) Data Quality Toggle to filter out data with quality issues (where the time to close is less than 1 day or longer than 100 days)
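The Data Quality Toggle rule can be sketched as a simple threshold check. The 1-day and 100-day bounds come from the description above. The function names and the sample values are hypothetical, and whether the bounds themselves count as acceptable is an assumption (they are kept here).

```python
def passes_quality_filter(days_to_close, min_days=1.0, max_days=100.0):
    """True when time-to-close falls inside the accepted range (bounds inclusive)."""
    return min_days <= days_to_close <= max_days


def apply_toggle(values, toggle_on):
    """Return all values when the toggle is off, only accepted ones when it is on."""
    if not toggle_on:
        return list(values)
    return [v for v in values if passes_quality_filter(v)]


# Hypothetical time-to-close values, in days.
sample = [0.0, 0.5, 1.0, 30.0, 100.0, 250.0]
filtered = apply_toggle(sample, toggle_on=True)
unfiltered = apply_toggle(sample, toggle_on=False)
```

Keeping the thresholds as parameters makes it easy to swap in different bounds later, for example ones agreed with the City.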

Visuals:

1) Indicator Visuals: Total number of requests and the number of days of data available
2) Bar Chart: Number of requests by source. This indicator shows the variety of channels through which individuals make requests.
3) Line Chart: Comparison of the total number of 311 requests

Changes from before: 1) Added exclusion filter that achieves the following functionalities:

nichhk commented 2 years ago

Thanks for these updates Josh!

Re: remove one or more request types: I think it might be more intuitive to make this the opposite, i.e., select one or more request types. This will better align with the map functionality as well.

Re: Data Quality Toggle: This looks super useful! May I ask how you chose the thresholds for "bad"? This might be a situation where we might have to combine some statistical analysis and also get input from City folks.

In terms of statistical analysis, I think there are several ways to detect outliers. One way is applying something like a z-score range.

But I think we also need help from the City to understand what acceptable timeToCloses are. It might be perfectly ok for a timeToClose to be like, 10min, for example, if it's a duplicate of another request.
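To make the z-score suggestion concrete, here is a minimal sketch using only the standard library. The cutoff of 2.0 and the sample values are assumptions chosen for illustration. Time-to-close data is often heavily skewed, so a plain z-score on raw values is only a crude first pass, not a recommendation.

```python
from statistics import mean, pstdev


def zscore_filter(values, threshold=2.0):
    """Keep values whose absolute z-score is at most `threshold`.

    `threshold` is an assumed cutoff, not one taken from the project.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return list(values)
    return [v for v in values if abs((v - mu) / sigma) <= threshold]


# Hypothetical time-to-close values in days, with one extreme entry.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 90]
kept = zscore_filter(sample)
```

With this sample the 90-day value has a z-score just under 3, so a cutoff of 3.0 would not remove it. That sensitivity to the chosen cutoff is one reason City input on acceptable values would still be needed.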

joshuayhwu commented 2 years ago

Thanks for the feedback Nich!

EDIT: Fixed this bug; it was sloppy logic on my part. But another error, about minimum series length, occurred.

EDIT2: Figured out what was happening. My filtering logic filters rows by selection, and in some cases it removes all rows from the dataset. That caused an error because I hadn't specified what the visuals should do when there is no data.

EDIT3: Fixed by raising a PreventUpdate exception.

Raw Distribution raw_timeToClose

After removing outliers based on rule above filter_timeToClose

joshuayhwu commented 2 years ago

@ExperimentsInHonesty thanks for clarifying the context for the dashboards MVP last week. Would love some feedback from you on this version. Please note the following:

NC Summary plotly_mvp_v1 3_1

Visuals:

1) Line Chart: Total number of 311 requests over the time range defined by the earliest and latest request creation dates. This shows which time ranges have the most requests.
2) Pie Chart: Share of each request type in the available data. This shows which request types have the highest and lowest demand in a particular neighborhood council.
3) Histogram: Distribution of request time-to-close. This shows how long requests take to complete (using the close date as a proxy for completion).

Features:

1) Selecting an individual neighborhood council
2) Selecting one or more request types
3) Data Quality Toggle to filter out data with quality issues (where the time to close is an outlier)

NC Comparison plotly_mvp_v1 3_2

Visuals:

1) Indicator Visuals: Total number of requests and the number of days of data available
2) Bar Chart: Number of requests by source. This indicator shows the variety of channels through which individuals make requests.
3) Line Chart: Comparison of the total number of 311 requests

Features:

1) Compare the total number of requests and date range between NCs
2) Compare how individuals make 311 requests between the two NCs
3) Compare the number of requests throughout the day for both NCs
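For the third comparison, the per-hour counts behind such a line chart could be computed along these lines. This is a sketch that assumes creation timestamps are available as ISO 8601 strings. The sample data and the function name are hypothetical.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(created_times):
    """Return a 24-element list: number of requests created in each hour of the day."""
    counts = Counter(datetime.fromisoformat(t).hour for t in created_times)
    return [counts.get(hour, 0) for hour in range(24)]


# Hypothetical creation timestamps for two NCs.
nc_a = ["2021-03-01T08:15:00", "2021-03-01T08:45:00", "2021-03-02T17:05:00"]
nc_b = ["2021-03-01T12:00:00"]

series_a = hourly_counts(nc_a)
series_b = hourly_counts(nc_b)
```

Because both series always have 24 entries, they can be plotted on a shared x-axis even when one NC has no requests in a given hour.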

joshuayhwu commented 2 years ago

See my public repo for the integrated dashboard file and instructions for running it locally. I only used the newer versions of Dash and Docker.

Currently looking to integrate the code into the 311 Data codebase. There is an issue with the interaction between the newer versions of Dash, gunicorn, and Docker: none of the callback functions work, i.e. the dashboards do not respond to any interaction. Will try to figure this out in the next few days.

nichhk commented 2 years ago

Thanks Josh! In your repo, can you put in the unzipped files instead of the zip so that people can browse the code without downloading? Let me know if you need help with debugging the interaction issue.

joshuayhwu commented 2 years ago

Thanks, I put the unzipped files in the public repo. Would appreciate some help whenever you're available, but I'll continue working on it and see if I can replicate the error.

EDIT: Turns out the problem is resolved just by adding Flask. It seems Gunicorn does not go well with Dash.

joshuayhwu commented 2 years ago

I have summarized some of the feedback I received on the current version of the dashboard:

1) Regarding the data quality issue, Bonnie had a wonderful insight for one possible decision rule. When the requestSource is driver self report and the time-to-close of the request is 0, it is likely that the driver simply closed the request instantly and then proceeded to work on it (or not). This is one potential decision rule we could implement.

2) The current colors of the Plotly Dash dashboards are unsuitable for neighborhood councils' formal publications (e.g. newsletters). Will use the default Dash colors for the Plotly dashboards from now on.

3) The current Plotly dashboards don't take into account how our end users will use them, i.e. downloading individual visualizations and printing the dashboard page as a whole. Ideally, each visualization should have a title, corresponding axis labels, and a correct scale. Each dashboard should also be optimized for the printed layout.

4) Will confirm the following again in the next meeting: consistent with Nich's comments on combining dashboards, I propose to combine the recent dashboards with the overall dashboards. More specifically, the neighborhood dashboard could be combined with neighborhood_recent; the overview and recent dashboards could be deprecated due to redundancy with the current prototype; types_map could be combined with another dashboard (perhaps the one Piero is working on?); and this current prototype will be the final one. That means there would be 3 dashboards in total: neighborhood, current prototype, and types_map.
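The decision rule from point 1 could be expressed as a small predicate like the one below. The exact source label ("Driver Self Report") and the function name are assumptions; the real `requestSource` values in the dataset may be spelled differently.

```python
# Assumed label for the driver self-report source; check the actual data.
DRIVER_SELF_REPORT = "Driver Self Report"


def is_suspect_instant_close(request_source, days_to_close):
    """Flag requests a driver self-reported and closed with zero time-to-close."""
    return request_source == DRIVER_SELF_REPORT and days_to_close == 0


# Hypothetical (requestSource, time-to-close in days) pairs.
requests = [
    ("Driver Self Report", 0),
    ("Driver Self Report", 3),
    ("Mobile App", 0),
]
flags = [is_suspect_instant_close(source, days) for source, days in requests]
```

Flagging rather than dropping these rows keeps the option open to show them separately on the dashboard.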

joshuayhwu commented 2 years ago

Updated Overview Dashboard Pt 1

PlotlyCombined1

Updated Overview Dashboard Pt 2

PlotlyCombined2