pverspeelt / shifterator

Interpretable data visualizations for understanding how texts differ at the word level
https://pverspeelt.github.io/shifterator/
Other
2 stars 0 forks source link

ggplot2 error when plotting entropy shift #1

Closed jstonge closed 2 years ago

jstonge commented 2 years ago

Hi! Congrats on the package. I am writing a shiny app using wordshifts, and you saved me a bunch of time with your shifterator package.

I stumbled on a small bug while testing the package with my data, e.g.
1920-boys.csv 1930-boys.csv

type2freq1 <- read.csv("1920-boys.csv")
type2freq2 <- read.csv("1930-boys.csv")
wordshift <- shifterator::entropy_shift(type2freq1, type2freq2)
shifterator::get_shift_graphs(word
#> Warning: Removed 2 rows containing missing values (position_stack).
#> Error: Aesthetics must be either length 1 or the same as the data (1): fill

I think this caused by the following lines (75-79) in R/total_contributions_plot.R:

...
  ggplot2::ggplot(plotting_data, 
                  ggplot2::aes(x = x_label , y = value)) +
    ggplot2::geom_bar(stat = "identity", fill = colour) +
    ggplot2::geom_text(ggplot2::aes(label = labels), size = 3, hjust = c("outward")) + 
    ggplot2::scale_y_continuous(label = scales::percent, limits = y_limits) +  
...

where fill=colour is somehow wrong because, I guess, colour does not have the same length than plotting_data in some cases?

If you are willing, I can make a branch and investigate this further. But I wanted to open an issues to let you know.

p.s. I was also wondering if you would be interested in adding the possibility to make secondary plots (cum_contribution_plot and text_size_plot) inset instead of keeping them on the side of the main plot (as per Ryan Gallagher package)? With patchwork::inset_element this is easy to do, but perhaps you had a reason to not have them as insets.

pverspeelt commented 2 years ago

Hi,

Thanks for the bug notification. I will have a look at it.

As for using insets, I think that when I was building all the elements, I was thinking of an option to turn on and of the extra graphs, but then decided against it. I'm also not a big fan of insets as they sometimes can mask some bar values. I saw some examples of that when I was testing the python package and seeing the results. But I could always built a switch that lets a user choose insets or secondary plots. I will have to think about it and see how to deal with the phi numbers.

pverspeelt commented 2 years ago

Found the bug. It is the y_limits (again). I'm need to rewrite the limits sections and hand over the y_limits selection to patchwork instead of using ggplot2. I can move the phi values to the title instead of a text grob and see if insets make life simpler or not.

But first the bug.

jstonge commented 2 years ago

Btw, I totally agree for the inset. I think in my case this is useful because I am trying to fit many plots on a single page for my shiny, e.g.

Screenshot from 2022-01-12 09-34-06

I do have a workaround in my code, so this is no problem if you prefer not include it.

pverspeelt commented 2 years ago

My test code creates something like below. This is a copy directly out of rstudio, so aspect ratios are not correct. The zoom function shows it a lot better.

image

It looks more like the original plots from the python package. I'll update the code on github tomorrow.