openoakland / woeip

A platform for impacted communities to understand their local air quality and advocate for environmental justice.
https://woaq.org
MIT License

As a user looking at a single collection session, I would like to see the appropriate amount of data points depending on the zoom level I've selected #385

Closed: kwonangela7 closed this issue 2 years ago.

kwonangela7 commented 2 years ago

Description

The user would like to see more of the individual data points when they zoom into the map, and fewer of them when they zoom out.

Strategy

Shape of the data: Sample data collection

Problem: The number of data points may overwhelm the user. We are interested in showing only the highest pollutant reading within every n points. For example, if there are 3000 data points, we would split them into buckets of roughly 100 points each. The buckets are based on the timestamps when the readings were taken: readings 0-99 go into the first bucket, 100-199 into the next, and so on. However, the buckets would only hold 100 points at the most zoomed-out level. At the next zoom level, each bucket would hold 50 readings, and again we would display only the highest value from each bucket.
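A minimal sketch of the bucketing described above, written in Python for readability (the issue proposes doing this in the frontend, so a real implementation would be JavaScript). The `Reading` type with `timestamp` and `value` fields and the `bucket_max` helper are hypothetical names for illustration, not the actual schema returned by the API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reading:
    timestamp: float  # hypothetical field: seconds since the session started
    value: float      # hypothetical field: pollutant reading, e.g. PM2.5


def bucket_max(readings: List[Reading], bucket_size: int) -> List[Reading]:
    """Keep only the highest reading in each run of `bucket_size` consecutive readings."""
    if bucket_size < 1:
        raise ValueError("bucket_size must be >= 1")
    # Sort by time so consecutive readings are also (roughly) neighbours on the walked route.
    ordered = sorted(readings, key=lambda r: r.timestamp)
    result = []
    for start in range(0, len(ordered), bucket_size):
        bucket = ordered[start:start + bucket_size]
        # max() returns the first maximal element, so ties go to the earliest reading.
        result.append(max(bucket, key=lambda r: r.value))
    return result
```

With 3000 readings, `bucket_max(readings, 100)` keeps 30 points and `bucket_max(readings, 50)` keeps 60. Sorting by timestamp first is what lets each bucket stand in for one stretch of the route.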

Background: The data are collected by two citizen scientists walking a route around a West Oakland neighborhood with a GPS device and a pollutant reader. This means there is a strong correlation between the time a reading was taken and the geographic "order" of the points. By organizing the points by time, we get a cheap and fast way to organize them by the correlated "geo".

Optimization need: Finding the maximum reading within each bucket requires a pass over every data point, so recalculating the displayed points on every zoom may cause disruptive loading behavior and undue stress on the browser. We can do better than this.

Hypothetically, the order in which data points are added to the map after each zoom should be deterministic: we can calculate it once, store it in a data structure, and then perform O(1) lookups on that data structure. The data structure would preferably be an array (list), since the data layers are arrays. We need an algorithm to create this sorted array.
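One way to build such an array, sketched under the assumption that each zoom level's bucket size is an exact multiple of the next finer level's (for example 100, 50, 10, 5, 1). With nested buckets and ties broken toward the earliest reading, every point shown at a coarser level is also shown at every finer level, so a single array ordered by "first level at which the point appears" suffices: each level draws a prefix of it. The function names and bucket sizes are hypothetical, chosen for illustration.

```python
from typing import List, Sequence, Tuple


def bucket_max_indices(values: Sequence[float], bucket_size: int) -> List[int]:
    """Index of the highest value in each run of `bucket_size` time-ordered readings."""
    return [
        # max() over a range returns the first maximal index, so ties go to the earliest reading.
        max(range(start, min(start + bucket_size, len(values))), key=lambda i: values[i])
        for start in range(0, len(values), bucket_size)
    ]


def build_display_order(
    values: Sequence[float], bucket_sizes: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """
    Compute, once, the order in which readings are added to the map.

    `values` must already be sorted by timestamp. `bucket_sizes` runs from the most
    zoomed-out level to the most zoomed-in one. Returns (order, counts): level k
    shows the readings at indices order[:counts[k]].
    """
    for coarse, fine in zip(bucket_sizes, bucket_sizes[1:]):
        if coarse % fine != 0:
            raise ValueError("each bucket size must be a multiple of the next one")

    order: List[int] = []
    seen = set()
    counts: List[int] = []
    for size in bucket_sizes:
        for i in bucket_max_indices(values, size):
            if i not in seen:  # skip points already visible at a coarser level
                seen.add(i)
                order.append(i)
        counts.append(len(order))
    return order, counts
```

At render time, looking up `counts[level]` is O(1) and the layer for that level is `order[:counts[level]]`, so nothing is recomputed on zoom. The cost moves to a single pass per zoom level when the session loads.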

Sample data
API Swagger Docs

Caution: Computer may experience slowdown while loading
Example Collection of Data

Code placement: It will be simpler to implement this in the frontend rather than in the API, because:
1) The API will not need to be updated.
2) The frontend will not need to make new, more complex API calls.
3) The computational effort is offloaded to users' browsers, allowing us to keep our server overhead low.

Acceptance Criteria

Related Issues

#315: It's important to understand #315 (n is defined there). Based on the zoom level, the developer can pass in a different query parameter. For example, if the zoom level is 0%, then n = 1 (all data points would be displayed).

mnorelli commented 2 years ago

I wonder if heatmapping is the way to go here, to show the general character of clusters of points without having to show each and every point. It's pretty neat that in this Mapbox example, "Create a heatmap layer", you can actually zoom all the way in until the constituent points of the heatmap can be seen. Maybe this leads to a larger discussion of how people actually want to experience the data. Is resolving a single point at any zoom level necessary? If not, at what zoom level would someone want to query an individual point?

theecrit commented 2 years ago

Let's think about the actual use case. For our current priority audience (volunteer data collectors), they are learning generally about the connection between emissions sources in the environment and the air pollution that results. So the representation of that particulate reading on the geopoint is kind of the bridging of that connection. I worry that a heat map might pull them out of the learning mode of understanding those connections and refocus them more broadly on general readings.

While the heat map may be helpful for a general user, I wouldn't want to overlook the basics that need to happen for the volunteer learner.

Thoughts on this? I might be overthinking it.

This zoom thing has to be a solved problem already somewhere. 🤔

exchrotek commented 2 years ago

I am a big fan of heatmaps, broadly speaking, because they convey trends and information visually quite well. However, in the interest of having volunteers better understand the data they are collecting, I could see how a heat map could abstract some of that understanding away.

A specific scenario that comes to mind: if volunteers walk by a smoker, the air quality readings suddenly get significantly worse in a very localized area. If you used a heat map without some sort of histogram or a maximum PM 2.5 reading for a given area, and you were zoomed out far enough, the volunteer might not make the connection that smoking can affect PM 2.5 in a very localized region. And isn't part of this effort, as I understand it, to really help people understand the key contributors to poor air quality and how spatially widespread their effects are?

But it does seem like the potential problem I've surmised could ultimately be solved if the map resolves into individual points at a predetermined zoom level.

At the end of the day, it kind of sounds like a question of "how much averaging should our data visualization have?"

(In reply to Jess Sand's comment of Mon, Feb 14, 2022, 7:05 PM.)

theecrit commented 2 years ago

I haven't read this thoroughly, but is there a solution here?

https://docs.mapbox.com/help/troubleshooting/working-with-large-geojson-data/

Again, we're not the first ones to tackle this challenge, so isn't it a matter of seeing how others have done this?

mnorelli commented 2 years ago

Additional resource

Mapbox point clustering

theecrit commented 2 years ago

Won't fix.