rempsyc / busara_dashboard

The Missing Majority in Behavioural Science Dashboard
https://remi-theriault.com/dashboards/missing_majority

Fix missing continents / group labels in waffle plots #39

Closed rempsyc closed 3 months ago

rempsyc commented 5 months ago

Patrick noted in #31,

There are a couple miscellaneous graphical issues that I noticed. [...] Africa missing on the Continent, by Journal plot. See below

[Screenshot 2024-03-18 at 00 26 57]

I answered,

I believe the missing continent is a known issue with waffle plots for categories with little data. That said, it would be worth a more thorough investigation to document this behaviour properly.

We could try to fix this issue here.

rempsyc commented 4 months ago

The reason for this bug is that each square in a waffle plot represents 1%, but Africa here represents less than 1%, and categories below 1% get no square. The post below suggests some fixes that could be explored:

https://stackoverflow.com/questions/64607021/waffle-plot-does-not-show-one-group

Also seems to be intentional behaviour for small values:

The purpose of a waffle plot is that it shows 1 square or icon per observation. Trying to split squares kind of defeats that purpose.

https://stackoverflow.com/questions/50880025/waffle-plot-with-decimal-numbers

However if you use a row number that the sum of your vals is not divisible by you will get some ‘overhang’ with filler tiles added in, to avoid this behavior specify the colors for your categories. [source]

Explanation: if Europe = 50.3, it is represented by exactly 50 squares, and if Asia = 6.21, it is represented by exactly 6 squares. That leaves an unaccounted leftover of 0.30 + 0.21 = 0.51 that has no tile of its own. That is why there is an empty tile in the end: some space is needed to represent the leftover fractions.
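
The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not the waffle package's actual code: it assumes each value is simply truncated to whole tiles, and it uses only the two values quoted in the explanation.

```python
import math

# Shares quoted in the explanation above (percent of the total).
shares = {"Europe": 50.3, "Asia": 6.21}

# Assumption: one tile per whole percent, fractional parts are dropped.
tiles = {name: math.floor(value) for name, value in shares.items()}

# The dropped fractions add up to a leftover that no category owns.
leftover = sum(value - tiles[name] for name, value in shares.items())

print(tiles)               # whole tiles per category
print(round(leftover, 2))  # leftover share with no tile of its own
```

Under that assumption, a category below 1% (such as Africa here) would be truncated to zero tiles, which matches the symptom reported in this issue.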

rempsyc commented 4 months ago

@psforscher since the issue is due to continents or countries having less than 1% representation, one possible fix is to replace any value smaller than 1% with exactly 1%, so that it shows both on the plot and in the legend. This would bring two problems: the total would no longer amount to 100% (it would be slightly greater), and it would overestimate representation, e.g., something like 0.0001% would still be read as 1% by the public. What do you recommend here? It's not possible to draw 0.0001% of a square, but another option would be to add the continents manually to the legend but not to the plot (so there would still be empty squares, and no squares representing Africa, for example).
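
To make the trade-off concrete, here is a rough Python sketch of the "raise to 1%" option. The shares are hypothetical placeholders rather than dashboard data, and `bump_small_values` is an illustrative helper name, not part of any package.

```python
def bump_small_values(shares, minimum=1.0):
    """Raise any positive share below `minimum` to exactly `minimum`."""
    return {
        name: (minimum if 0 < value < minimum else value)
        for name, value in shares.items()
    }

# Hypothetical shares in percent (they sum to 100); not real dashboard data.
shares = {"Europe": 60.0, "North America": 39.5, "Africa": 0.5}

bumped = bump_small_values(shares)
total = sum(bumped.values())

print(bumped["Africa"])  # Africa now gets a full tile
print(total)             # the total is now slightly above 100
```

The overshoot grows with the number of categories that fall below 1%, which is the main cost of this option.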

psforscher commented 4 months ago

i think i would favor adding countries manually on the legend and not the plot

rempsyc commented 3 months ago

Alas, the provided solutions do not work for our use case. I opened a question on Stack Overflow and it received a response, but a rather unsatisfactory one. So I have submitted a feature request to the package developer on GitHub (hrbrmstr/waffle#99). For now, I think the easiest fix is a rule whereby any value < 1 is bumped to 1, and all other values are left unchanged. If we also add the exact percentages to the legend (with a single decimal to avoid visual overwhelm), I think that would be acceptable.
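
A minimal sketch of that rule, written in Python for illustration only (the dashboard itself may implement it differently). The shares are hypothetical, `tiles_and_labels` is a made-up helper name, and leaving zero-valued categories at zero is an assumption that the comment above does not spell out.

```python
def tiles_and_labels(shares):
    """Return plotted values and legend labels for each category.

    Plotted values below 1 are bumped to 1; legend labels keep the
    exact percentage, formatted with a single decimal.
    """
    tiles = {}
    labels = {}
    for name, value in shares.items():
        # Assumption: only positive values below 1 are bumped.
        tiles[name] = 1 if 0 < value < 1 else value
        labels[name] = f"{name} ({value:.1f}%)"
    return tiles, labels

# Hypothetical shares in percent; not real dashboard data.
shares = {"Europe": 60.0, "North America": 39.6, "Africa": 0.4}

tiles, labels = tiles_and_labels(shares)
print(tiles)
print(labels)
```

Keeping the exact percentage in the legend lets readers see that a bumped category is really below 1%, even though it occupies a full tile.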