owid / owid-grapher

A platform for creating interactive data visualizations
https://ourworldindata.org
MIT License
1.35k stars, 227 forks

Improve memory footprint/efficiency of grapher #3723

Open marcelgerber opened 1 week ago

marcelgerber commented 1 week ago

Core problem

Currently, grapher consumes a lot of memory for large datasets - especially (daily) Covid data.

This is not great either way, but it also means that our CF Worker for thumbnail rendering will fail for most of these (try searching for covid nigeria, for example).

Proposed solution

Much of this memory use comes from table transforms, because every transformed table holds onto its parent table. It is definitely worth measuring how much memory we already save if transformed tables no longer hold onto their parent.
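To illustrate why holding onto the parent matters, here is a minimal sketch. The classes `LinkedTable` and `DetachedTable` and the `Row` type are hypothetical stand-ins made up for this example, not grapher's actual table API. The first variant keeps a reference to the table it was derived from, so a chain of transforms keeps every intermediate table reachable; the second keeps only a short textual record of how it was produced.

```typescript
// Hypothetical row and table types for illustration only; not grapher's real API.
type Row = { entity: string; day: number; value: number }

class LinkedTable {
    readonly rows: Row[]
    // Strong reference to the table this one was derived from.
    readonly parent: LinkedTable | undefined

    constructor(rows: Row[], parent?: LinkedTable) {
        this.rows = rows
        this.parent = parent
    }

    filter(predicate: (row: Row) => boolean): LinkedTable {
        // The new table points back at `this`, so `this` cannot be collected
        // for as long as the new table is alive.
        return new LinkedTable(this.rows.filter(predicate), this)
    }

    // Number of tables kept reachable through the parent chain (itself included).
    chainLength(): number {
        return 1 + (this.parent ? this.parent.chainLength() : 0)
    }
}

class DetachedTable {
    readonly rows: Row[]
    // Only a cheap description of the transforms applied so far.
    readonly lineage: string[]

    constructor(rows: Row[], lineage: string[] = []) {
        this.rows = rows
        this.lineage = lineage
    }

    filter(predicate: (row: Row) => boolean, label: string): DetachedTable {
        // No reference to `this` is stored, so intermediate tables can be
        // garbage collected once nothing else refers to them.
        return new DetachedTable(this.rows.filter(predicate), [...this.lineage, label])
    }
}

// Made-up sample data: 1000 rows alternating between two entities.
const sampleRows: Row[] = Array.from({ length: 1000 }, (_, i) => ({
    entity: i % 2 === 0 ? "Nigeria" : "Kenya",
    day: i,
    value: i,
}))

const linked = new LinkedTable(sampleRows)
    .filter((r) => r.entity === "Nigeria")
    .filter((r) => r.day >= 500)

const detached = new DetachedTable(sampleRows)
    .filter((r) => r.entity === "Nigeria", "entity = Nigeria")
    .filter((r) => r.day >= 500, "day >= 500")

console.log(linked.chainLength(), detached.lineage)
```

Both variants end up with the same rows. The difference is that the linked result still pins the unfiltered table and the intermediate one, whereas the detached result pins only its own rows. Whether any grapher code relies on walking back up to the parent would need checking before making this change.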

Context

See https://docs.google.com/spreadsheets/d/1Wzz76EYyOT0vo6uqQVMvvAQh0tOJyxcyJuUFfowpzbA/edit#gid=2114577793 for November 2023 stats of memory used in the SVG tester (see #2915).

Reducing the footprint would help us move toward a more dynamic, less statically rendered publication. It would also help on memory-constrained devices, e.g. phones. The COVID country profile consumes close to 3 GB of heap space on my machine.

marcelgerber commented 1 week ago

Just updated the SVG Tester performance sheet here: Grapher performance metrics (Google Sheets). The data comes from a staging site on foundation-1, run using node itsJustJavascript/devTools/svgTester/export-graphs.js --isolate. It took 90 min to run.

toni-sharpe commented 6 days ago

Thinking from the outside

(these are my thoughts, delete if inappropriate)

Why Covid? Is that because the data is recent, rich and daily, whereas a lot of graphs are year by year and historical? My guess is that Covid is the first time you've hit a real-time, global, OWID-style problem full-on rather than an ongoing one, and the first time that has happened in today's era of data everywhere.

Is there value in looking at the problem from the Covid data angle as well as the general perf. problem angle? I.e. can the Covid data itself be optimised?

As for the daily ones: 5 years of daily data means 1826-7 selections with the slider. Then factor in that reporting from 2020 onwards is so much more solid than it was 200 years ago (every country now does it). That makes 300-400k, when most of the graphs here deal with 5k, maybe 10k. (Note that "daily" and "day" also show a lot of green at the top of the Google doc.)
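The figures above can be sanity-checked with a rough calculation. This is only one plausible reading of the 300-400k number: the entity count of 200 is an assumption made up for the example, not a number from this thread, and the date range is picked only to show a 5-year span containing two leap days.

```typescript
// Whole days between two ISO dates (date-only ISO strings are parsed as UTC).
function daysBetween(startIso: string, endIso: string): number {
    const msPerDay = 24 * 60 * 60 * 1000
    return Math.round((Date.parse(endIso) - Date.parse(startIso)) / msPerDay)
}

// A 5-year span that includes two leap days (2020 and 2024).
const daysWithTwoLeapDays = daysBetween("2020-01-01", "2025-01-01")
// A 5-year span that includes only one leap day (2024).
const daysWithOneLeapDay = daysBetween("2021-01-01", "2026-01-01")

// ASSUMPTION: roughly 200 reporting entities; illustrative only.
const assumedEntityCount = 200
const estimatedValues = daysWithTwoLeapDays * assumedEntityCount

console.log(daysWithOneLeapDay, daysWithTwoLeapDays, estimatedValues)
```

Under that assumption the estimate lands inside the 300-400k range, which is consistent with one value per entity per day.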

Could a solution be to apply a scale that gets grainier the further back you go, then allow expansion on user request?

i.e. on load:

T-0   - T-90 days  : daily data
T-90  - T-365 days : weekly data
T-365 - T-730 days : monthly data
T-    ....         : yearly data
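The tiers above could be sketched roughly as follows. Everything here is hypothetical: `Point`, `granularityForAge` and `downsample` are names invented for the example, months and years are approximated as fixed 30-day and 365-day buckets, and each bucket is reduced to its mean, which may not be the right aggregate for every indicator.

```typescript
// Hypothetical data point: `day` counts days since an arbitrary start date.
type Point = { day: number; value: number }

type Granularity = "daily" | "weekly" | "monthly" | "yearly"

// Map the age of a point (in days before "today") to the tiers listed above.
function granularityForAge(ageInDays: number): Granularity {
    if (ageInDays <= 90) return "daily"
    if (ageInDays <= 365) return "weekly"
    if (ageInDays <= 730) return "monthly"
    return "yearly"
}

// SIMPLIFICATION: fixed bucket widths instead of calendar months and years.
const BUCKET_DAYS: Record<Granularity, number> = {
    daily: 1,
    weekly: 7,
    monthly: 30,
    yearly: 365,
}

// Collapse points into one averaged point per bucket, coarser for older data.
function downsample(points: Point[], today: number): Point[] {
    const buckets = new Map<string, { day: number; sum: number; count: number }>()
    for (const point of points) {
        const age = today - point.day
        const granularity = granularityForAge(age)
        const key = `${granularity}:${Math.floor(age / BUCKET_DAYS[granularity])}`
        const bucket = buckets.get(key)
        if (bucket) {
            bucket.sum += point.value
            bucket.count += 1
            // Label the bucket with its earliest day.
            bucket.day = Math.min(bucket.day, point.day)
        } else {
            buckets.set(key, { day: point.day, sum: point.value, count: 1 })
        }
    }
    return Array.from(buckets.values())
        .map((b) => ({ day: b.day, value: b.sum / b.count }))
        .sort((a, b) => a.day - b.day)
}

// Made-up series: 1826 daily values (about 5 years) for a single entity.
const dailySeries: Point[] = Array.from({ length: 1826 }, (_, i) => ({ day: i, value: i }))
const reducedSeries = downsample(dailySeries, 1825)

console.log(dailySeries.length, "->", reducedSeries.length)
```

Expanding on user request would then mean re-running the reduction for the selected time range with a finer tier, or fetching the full-resolution data for that range only.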

Speaking as a user

The graphs that load everything like this are not really usable. If a graph takes 5s to load on my machine, then confronts me with so many lines that they form a single block of colour, and updating also snags when I move something, then I'm not going to use this page much. Maybe the problem is as much "data in" as it is "data display technique"? IMO this would be more useful if I started from a blank chart, or was guided to a selection (by an article).
