owid / owid-grapher

A platform for creating interactive data visualizations
https://ourworldindata.org
MIT License

Missing data strategy does not hide entities with missing data #2867

Open pabloarosado opened 10 months ago

pabloarosado commented 10 months ago

Description

The option to "hide entities with missing data" does not hide all points with missing data.

Expected behaviour

The option to "hide entities with missing data" seems to hide missing data on the edges of the chart (at least that's what I think is happening). But if the missing data is in the middle (surrounded by non-missing data), nothing is hidden. The expected behaviour would be to hide all points where any indicator has missing data.

It's unclear exactly how those middle points should be hidden in a visually acceptable way, but the simplest would be to treat them as if there was no data at all on those points for any indicator.
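
A minimal sketch of that "simplest" option, assuming a hypothetical row shape in which each year maps indicator names to a value or `undefined` (the `Row` type and `dropYearsWithAnyMissing` are illustrative names, not Grapher's data model or API). A year is kept only if every indicator has a value, wherever in the series the gap falls:

```typescript
// Hypothetical row shape for illustration; not Grapher's internal table type.
type Row = { year: number; values: Record<string, number | undefined> }

// Keep a year only if every listed indicator has a value for it.
function dropYearsWithAnyMissing(rows: Row[], indicators: string[]): Row[] {
    return rows.filter((row) =>
        indicators.every((name) => row.values[name] !== undefined)
    )
}

// Made-up numbers: "nuclear" is missing in the middle year.
const rows: Row[] = [
    { year: 1999, values: { coal: 5, nuclear: 1 } },
    { year: 2000, values: { coal: 6, nuclear: undefined } },
    { year: 2001, values: { coal: 7, nuclear: 2 } },
]

// Only 1999 and 2001 remain; 2000 is dropped for every indicator.
const keptYears = dropYearsWithAnyMissing(rows, ["coal", "nuclear"]).map(
    (row) => row.year
)
console.log(keptYears)
```

How the resulting gap should be drawn (a break in the stacked area, for example) is left open here, as it is in the description above.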

Steps to reproduce

Steps to reproduce the behavior:

  1. Go to this URL https://ourworldindata.org/grapher/energy-consumption-by-source-and-country?country=~BLR
  2. See that, even though nuclear energy is missing between 2000 and 2019, and even if you set the missing data strategy to "hide", we still see data for those years.

sophiamersmann commented 10 months ago

As far as I know, "Hide missing data" currently means: Hide this entity if any of the indicators have no data at all for that entity

pabloarosado commented 10 months ago

> As far as I know, "Hide missing data" currently means: Hide this entity if any of the indicators have no data at all for that entity

You are right: if any indicator has no data at all for that entity (country), no data is visualized. You can check this by taking this chart, choosing Algeria, and selecting "hide". Given that there's no biofuels data for Algeria, you see no data at all. I don't know if this behaviour is very useful: as soon as each indicator had just one data point, "hide" would do absolutely nothing.
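
The entity-level rule described in this comment can be sketched roughly as follows. The `IndicatorSeries` type, the `isEntityHidden` function, and the numbers are all assumptions made for illustration; this is not Grapher's implementation:

```typescript
// Hypothetical shape for one entity's data; not Grapher's internal types.
type IndicatorSeries = { indicator: string; values: Array<number | undefined> }

// Entity-level rule: hide the entity if at least one indicator
// has no value in any year.
function isEntityHidden(series: IndicatorSeries[]): boolean {
    return series.some((s) => s.values.every((v) => v === undefined))
}

// Made-up numbers for illustration only.
const noBiofuelData: IndicatorSeries[] = [
    { indicator: "fossil", values: [10, 11, 12] },
    { indicator: "biofuels", values: [undefined, undefined, undefined] },
]
const oneBiofuelPoint: IndicatorSeries[] = [
    { indicator: "fossil", values: [10, 11, 12] },
    { indicator: "biofuels", values: [undefined, 0.2, undefined] },
]

const hiddenWithoutData = isEntityHidden(noBiofuelData) // true
const hiddenWithOnePoint = isEntityHidden(oneBiofuelPoint) // false
```

A single data point for the otherwise empty indicator is enough to stop this rule from hiding the entity.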

So, what you said is true. However, that's not the only thing that the "hide" strategy does.

Take this chart and choose Algeria. There's fossil fuel data only from 2000 to 2021. If you select "show", you see all years (as if fossil fuels were zero in, e.g., 2022). But if you choose "hide", you only see data from 2000 to 2021. To me, this behaviour means "hide data points if any indicator is missing". And I actually like this behaviour, and I think it's necessary.

Now, the problem with this behaviour is that, if any indicator has missing data in intermediate years, it seems like "hide" does nothing. So "hide" only hides points on the edges of the chart. At least that was my interpretation.

sophiamersmann commented 10 months ago

You're right. Looking at the code, "hide data points if any indicator is missing" is a better description. Data points in the middle not being dropped seems to be another side effect of linear interpolation, since the missing data transforms run after interpolation. Missing data on the edges can't be extrapolated, so those points are still missing when the transform runs and are dropped; missing data in the middle has already been filled with interpolated values, so those points stay.
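
To illustrate the ordering effect described above, here is a small sketch that assumes evenly spaced years and a plain linear fill between known neighbours. The function names and the data are hypothetical; this is not the Grapher code itself:

```typescript
type Value = number | undefined

// Linearly fill gaps that have a known value on both sides.
// Leading and trailing gaps cannot be filled and stay undefined.
function interpolateInterior(values: Value[]): Value[] {
    const result = values.slice()
    let lastIndex = -1
    let lastValue = 0
    for (let i = 0; i < values.length; i++) {
        const current = values[i]
        if (current === undefined) continue
        if (lastIndex >= 0) {
            const step = (current - lastValue) / (i - lastIndex)
            for (let j = lastIndex + 1; j < i; j++) {
                result[j] = lastValue + step * (j - lastIndex)
            }
        }
        lastIndex = i
        lastValue = current
    }
    return result
}

// "Hide" step: keep only the years where every series has a value.
function yearsWithAllPresent(years: number[], series: Value[][]): number[] {
    return years.filter((_, i) => series.every((s) => s[i] !== undefined))
}

// Made-up numbers: "nuclear" has a leading, an interior, and a trailing gap.
const years = [1998, 1999, 2000, 2001, 2002]
const coal: Value[] = [4, 5, 6, 7, 8]
const nuclear: Value[] = [undefined, 1, undefined, 3, undefined]

// Interpolating first fills 2000, so only the edge years are dropped.
const interpolateThenHide = yearsWithAllPresent(years, [
    interpolateInterior(coal),
    interpolateInterior(nuclear),
])

// Hiding on the raw data drops the interior gap as well.
const hideOnRawData = yearsWithAllPresent(years, [coal, nuclear])
```

With these inputs, `interpolateThenHide` keeps 1999, 2000 and 2001, while `hideOnRawData` keeps only 1999 and 2001, which is the difference between the observed and the expected behaviour in this issue.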

danyx23 commented 7 months ago

We're worrying a bit about the complexity of this one. Do you have a sense of whether this comes up in other data as well, or is this specific to the energy data?