ibecav / CGPfunctions

Powell Miscellaneous Functions for Teaching and Learning Statistics
Other

Collision of numerical labels on slope graphs #39

Closed AndrewMarritt closed 4 years ago

AndrewMarritt commented 4 years ago

First, I'd like to thank you for such a fantastic implementation of slope graphs.

This is less a bug and more a refinement.

On the example attached we're getting numerical labels colliding on each axis. Given the distribution of our data this happens on most plots.

You've obviously been able to avoid the category labels colliding. Is there something that can be done on these numerical labels?

slopegraphGroup.pdf

ibecav commented 4 years ago

Hi,

Glad you find it useful. No magic bullet for this. Don't know if you've looked at the relevant vignette where I write this:

Finally, let me take a moment to talk about crowding and labeling. I’ve made every effort to deconflict the labels on the left and right axes (in this example the Country), and that should work automatically as you resize your plot dimensions. Pro tip: if you use RStudio you can press the zoom icon and then rescale the window to find the best dimensions.

But the numbers (GDP) are a different matter and there’s no easy way to ensure separation in a case like this data. There’s a decent total spread from 57.4 to 20.7 and some really close measurements like France, Belgium, and Germany on the right side. My suggestion is in a case like this one you create a new column in your dataframe with two significant places. So specifically it would be newgdp$rGDP <- signif(newgdp$GDP, 2). In my testing, at least, I’ve found this helps without creating inaccuracy and not causing you to try and “stretch” vertically to disambiguate the numbers. This time I’ll also use LineColor to highlight how Canada, Finland and Belgium fare from 1970 to 1979.

So after a quick look at the PDF file you enclosed, you could:

  1. Change the overall dimensions of the plot to make it taller and less wide (that's what I've done in the Rmd file for the vignette).
  2. Try my rounding trick as described in the vignette.
  3. Go to the top 10 or 15 themes, or make it two pages (the bottom ones will spread out as you remove the top).
  4. I recently added a new argument, Data.label: an optional column inside the dataframe that will be used as the label for the data points plotted. It can contain complex strings and NA values but must be of class chr. By default, Measurement is converted to chr and used. So, for example, for all those items under 4.0% you could create a custom label that reads "<4%".

Some easy examples of how to use it

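library(CGPfunctions)  # newggslopegraph() and the newcancer data set
library(dplyr)         # %>%, mutate() and case_when()

# Label every point with its value plus a percent sign
newcancer$datalabel <- paste0(newcancer$Survival, "%")
newggslopegraph(newcancer, Year, Survival, Type, Data.label = datalabel)

# Collapse all values at or below 4 into one shared label
newcancer <-
   newcancer %>%
   mutate(datalabel = case_when(Survival <= 4 ~ "< 4.0%",
                                TRUE ~ paste0(Survival, "%")))
newggslopegraph(newcancer, Year, Survival, Type, Data.label = datalabel)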
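
Options 1 to 3 from the list above could be sketched roughly as follows. This is only an illustration under stated assumptions: the scores data frame, its column names (Year, Theme, Pct), the cutoff of two themes, and the 5 x 9 inch page size are all hypothetical stand-ins, not the data or settings behind the attached PDF. The plotting step runs only if CGPfunctions is installed.

```r
# Hypothetical data for illustration only (not the data from the attached PDF)
scores <- data.frame(
  Year  = rep(c("2019", "2020"), each = 4),
  Theme = rep(c("A", "B", "C", "D"), times = 2),
  Pct   = c(41.26, 40.94, 12.31, 3.42, 44.87, 45.13, 11.78, 3.61),
  stringsAsFactors = FALSE
)

# Option 2: round to two significant digits so near-identical values
# collapse onto the same plotted value and label
scores$rPct <- signif(scores$Pct, 2)

# Option 3: keep only the top themes, ranked by their value in the latest year
top_n_themes <- 2  # hypothetical cutoff; the thread suggests 10 or 15
latest <- scores[scores$Year == max(scores$Year), ]
keep <- head(latest$Theme[order(latest$Pct, decreasing = TRUE)], top_n_themes)
top_scores <- scores[scores$Theme %in% keep, ]

# Option 1: save the plot taller than it is wide to give labels vertical room
if (requireNamespace("CGPfunctions", quietly = TRUE)) {
  p <- CGPfunctions::newggslopegraph(top_scores, Year, rPct, Theme)
  ggplot2::ggsave(tempfile(fileext = ".pdf"), plot = p, width = 5, height = 9)
}
```

The three options are independent, so any one of them can be dropped. Rounding and filtering happen on the data frame before plotting, which means they can be combined with a custom Data.label column as well.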