juliasilge / juliasilge.com

My blog, built with blogdown and Hugo
https://juliasilge.com/

Reordering and facetting for ggplot2 | Julia Silge #10

Open utterances-bot opened 3 years ago

utterances-bot commented 3 years ago

Reordering and facetting for ggplot2 | Julia Silge

I recently wrote about the release of tidytext 0.2.1, and one of the most useful new features in this release is a couple of helper functions for making plots with ggplot2.

https://juliasilge.com/blog/reorder-within/

mehtapriyesh commented 3 years ago

Hello Julia, thanks for the amazing article. It would be great if you could show how to fill by name. For example, I want all columns labeled Michael to be green and every other name to get its own unique color. Since all values are modified after applying reorder_within(), every name within every facet gets a different color when I use "fill = name" in the aesthetics.

juliasilge commented 3 years ago

I think the best thing to do is to make a new variable that will be your fill variable, before you use reorder_within(). Here is an example with the Palmer penguins:

library(tidyverse)
library(tidytext)
data("penguins", package = 'palmerpenguins')

penguin_mass <- penguins %>%
  filter(!is.na(sex)) %>%
  group_by(species, sex) %>%
  summarise(body_mass_g = median(body_mass_g, na.rm = TRUE)) %>%
  ungroup()
#> `summarise()` has grouped output by 'species'. You can override using the `.groups` argument.

penguin_mass %>%
  mutate(species_fact = species,
         species = reorder_within(species, body_mass_g, sex)) %>%
  ggplot(aes(body_mass_g, species, fill = species_fact)) +
  geom_col(show.legend = FALSE) +
  scale_y_reordered() +
  facet_wrap(~sex, scales = "free_y")

Created on 2021-03-15 by the reprex package (v1.0.0)

mehtapriyesh commented 3 years ago

Thank you for such a quick response :)

Gazellenrehlein commented 3 years ago

Hello! I am a big fan of your textbook Text Mining with R, and I'm looking for a way to order a simple bar graph of the most frequent terms in a corpus by n (the frequency of each word in the corpus). I tried the code from your book and looked up tips for reordering here and in other places, but I still can't order by n. Depending on where I put the reorder() call, I get either a bar graph sorted in reverse alphabetical order or a bar graph that seems to be in no particular order at all.

Reverse alphabetical order

tidy_text %>%
  count(word, sort = TRUE) %>%
  filter(n > 200) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  geom_col() +
  xlab(NULL) +
  coord_flip()

No visible order

tidy_text %>%
  count(word, sort = TRUE) %>%
  filter(n > 200) %>%
  ggplot(aes(reorder(word, n), n)) +
  geom_col() +
  xlab(NULL) +
  coord_flip()

If you had any thoughts on this, I'd really appreciate it. Thanks in any case!

juliasilge commented 3 years ago

@Gazellenrehlein That sounds pretty strange! Here are the kinds of results I get:

library(tidyverse)
library(tidytext)

tibble(txt = janeaustenr::prideprejudice) %>%
  unnest_tokens(word, txt) %>%
  anti_join(get_stopwords()) %>%
  count(word, sort = TRUE) %>%
  filter(n > 200) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(n, word)) +
  geom_col() +
  labs(y = NULL)
#> Joining, by = "word"

Created on 2021-05-20 by the reprex package (v2.0.0)

If you can create a reprex (a minimal reproducible example) to demonstrate your problem, that may help figure out what is going on. Once you have a reprex showing your problem, I'd recommend posting on RStudio Community. If you've never heard of a reprex before, you may want to start with the tidyverse.org help page. Good luck! 🙌
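For anyone debugging a similar ordering problem, one quick check before building a full reprex is to inspect the factor levels that reorder() produces, since ggplot2 draws discrete axes in level order. The sketch below uses made-up words and counts (toy_counts is a hypothetical data frame, not data from this thread) and only base R:

```r
# Made-up words and counts, for illustration only
toy_counts <- data.frame(
  word = c("apple", "ball", "cat", "donkey"),
  n    = c(30, 10, 40, 20)
)

# reorder() turns `word` into a factor whose levels are sorted by `n`
toy_counts$word <- reorder(toy_counts$word, toy_counts$n)

# Levels run from the smallest n to the largest n
levels(toy_counts$word)
#> [1] "ball"   "donkey" "apple"  "cat"
```

If the levels come back in alphabetical order instead, the reordering step is probably not being applied to the column that reaches ggplot().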

Gazellenrehlein commented 3 years ago

Thank you very much for your quick response! It seems like the problem with my code isn't immediately obvious (too bad). I'll do as you suggested and try to get support from the RStudio Community. Thanks again!

JonFain90 commented 2 years ago

This is awesome, Julia! Do you perhaps have advice on how to change the colors of the facets? I am doing something similar with two facets and want to use customized colors rather than the default R colors. Thanks in advance!

juliasilge commented 2 years ago

@JonFain90 You should be able to change these colors like you would any other ggplot object, like with scale_fill_manual() or another appropriate scale_fill_* function.
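As a rough sketch of how scale_fill_manual() can combine with the separate fill variable suggested earlier in this thread, the example below maps each original name to a fixed color through a named vector. The data frame toy, the column name_fill, and the color choices are all made up for illustration:

```r
library(ggplot2)
library(tidytext)

# Made-up counts, for illustration only
toy <- data.frame(
  name   = c("Michael", "David", "James", "Michael", "David", "James"),
  decade = c("1950", "1950", "1950", "1960", "1960", "1960"),
  n      = c(30, 20, 10, 15, 25, 5)
)

# Keep an untouched copy of the name for fill, then reorder within decade
toy$name_fill <- toy$name
toy$name <- reorder_within(toy$name, toy$n, toy$decade)

# Named vector: each original name always gets the same color
name_colors <- c(Michael = "green", David = "steelblue", James = "orange")

p <- ggplot(toy, aes(n, name, fill = name_fill)) +
  geom_col(show.legend = FALSE) +
  scale_y_reordered() +
  scale_fill_manual(values = name_colors) +
  facet_wrap(vars(decade), scales = "free_y")

p
```

Because the fill is driven by name_fill rather than the reordered name, Michael stays green in every facet even though the reordered factor has one level per name and decade combination.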

JonFain90 commented 2 years ago

Thanks Julia!

emmansh commented 2 years ago

This post inspired me to post a question at Stack Overflow, on how we could use reorder_within() when reordering within facets and still move specific bars manually -- within facets.

The user "StupidWolf" gave a nice, working solution, but I'm curious whether you (Julia) have a different perspective on this matter.

juliasilge commented 2 years ago

@emmansh I answered over there with another possible solution, relying on forcats functions.

emmansh commented 2 years ago

@juliasilge, wonderful! Thank you :)

phaya commented 2 years ago

Thank you very much @juliasilge, this is an amazing and extremely useful post. I was wondering whether it would be worth updating the examples in chapter 3 of Text Mining with R (https://www.tidytextmining.com/tfidf.html). The code works correctly with the datasets used in the book, but with other datasets the plots may be ordered incorrectly.

juliasilge commented 2 years ago

@phaya I added a note box to Ch 3 of our tidytext book, with a link to where it is used elsewhere in the book. You can see the new rendered section here.

phaya commented 2 years ago

Awesome! Thank you very much, Julia!

emihoe commented 2 years ago

Hi Julia, thank you for writing such a comprehensive tutorial. I am wondering if you can suggest how to order by sample name within each facet, rather than sorting by value. Thanks!

juliasilge commented 2 years ago

@emihoe I believe the default ordering within each facet would be by the observation, if I am understanding you. Maybe that's not what you mean, though. Can you create a reprex (a minimal reproducible example) for your question, and post the question on RStudio Community? The goal of a reprex is to make it easier for us to recreate your issue so that we can understand it and/or fix it.

If you've never heard of a reprex before, you may want to start with the tidyverse.org help page. You may already have reprex installed (it comes with the tidyverse package), but if not you can install it with:

install.packages("reprex")

Thanks! 🙌
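On the ordering question itself, a small base R sketch may help show what the default looks like: a character column becomes a factor with alphabetically sorted levels, and ggplot2 draws a discrete axis in level order. The sample names below are hypothetical:

```r
# Hypothetical sample names, for illustration only
samples <- c("sample_c", "sample_a", "sample_b")

# Default factor levels are sorted alphabetically
f <- factor(samples)
levels(f)
#> [1] "sample_a" "sample_b" "sample_c"

# On a vertical (y) axis the first level is drawn at the bottom,
# so reversing the levels puts sample_a at the top
f_top_down <- factor(samples, levels = rev(sort(unique(samples))))
levels(f_top_down)
#> [1] "sample_c" "sample_b" "sample_a"
```

If this alphabetical order is what is wanted, no call to reorder_within() should be needed for that variable.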

trungdangm commented 1 year ago

Thanks Julia, I just made a similar graph using your approach, and it works. I also looked at Stack Overflow, where there are other approaches (https://stackoverflow.com/questions/34001024/ggplot-order-bars-in-faceted-bar-chart-per-facet). When I copied their data and code, it worked. However, when I tried to adapt their code to my own data, it did not work at all, and I still don't understand why (my data has the same shape as theirs). For now I am using your approach, but I am still curious why the other approach failed. I also tried to sort the bars in a 100% stacked bar chart. So far that has not been successful: I used the approach above, but it does not work. Any help will be appreciated. Trung

juliasilge commented 1 year ago

@trungdangm Can you create a reprex (a minimal reproducible example) showing the problem you are running into? The goal of a reprex is to make it easier for people to recreate your problem so that we can understand it and/or fix it. If you've never heard of a reprex before, you may want to start with the tidyverse.org help page. Once you have a reprex, I recommend posting on RStudio Community, which is a great forum for getting help with these kinds of code questions. Thanks! 🙌

ebrillcass commented 1 year ago

I am so sorry, I realized what I was doing wrong already and that was my first comment ever with my GitHub so I would delete it if I even knew how. Awesome article though!

juliasilge commented 1 year ago

@ebrillcass Glad you got it working! 🙌

cujoisa commented 1 year ago

Just wanted to comment and say this was a lifesaver, thank you so much!

davidfoster24 commented 8 months ago

Hello, I came across your post and it has been very helpful in solving a challenge I am facing. However, I am looking to sort all facets in the same order, based on the frequency of the first facet. In your example above, it would be sorting all decades based on the prominence of names in the 1950s, in order to highlight the changes over time. Is this possible with the package described and/or others? Thanks!

juliasilge commented 8 months ago

@davidfoster24 You can do this with something like so, where you create the levels manually:

library(tidyverse)
library(babynames)

top_names <- babynames |>
    filter(year >= 1950,
           year < 1990) |>
    mutate(decade = (year %/% 10) * 10) |>
    group_by(decade) |>
    count(name, wt = n, sort = TRUE) |>
    ungroup()

name_levels <- top_names |> 
    filter(decade == 1950) |> 
    top_n(15) |> 
    pull(name)
#> Selecting by n

name_levels
#>  [1] "James"    "Michael"  "Robert"   "John"     "David"    "Mary"    
#>  [7] "William"  "Linda"    "Richard"  "Patricia" "Thomas"   "Susan"   
#> [13] "Deborah"  "Mark"     "Charles"

top_names |>
    group_by(decade) |>
    top_n(15) |>
    ungroup() |>
    mutate(
        decade = factor(decade),
        name = factor(name, levels = name_levels),
        name = fct_rev(name)
    ) |>
    ggplot(aes(n, name, fill = decade)) +
    geom_col(show.legend = FALSE) +
    facet_wrap(vars(decade), scales = "free_y")
#> Selecting by n

Created on 2023-10-17 with reprex v2.0.2

However, notice that in this case, many of the names in the later decades are not in the top 15 at all in the first decade and they are then converted to NA. You'll need to decide how to handle this, if it applies to your situation.
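Two possible ways to handle those NA values are sketched below in base R, using made-up vectors rather than the babynames data: either drop names that are not in the reference levels, or collect them into a single "Other" level. The names in later_names and the "Other" label are assumptions for illustration:

```r
# Reference levels taken from the first facet (shortened, made-up example)
name_levels <- c("James", "Michael", "Robert")

# Names appearing in a later facet; two are not in the reference levels
later_names <- c("Michael", "Jennifer", "James", "Jessica")

# factor() with fixed levels turns unknown names into NA
f <- factor(later_names, levels = name_levels)
is.na(f)
#> [1] FALSE  TRUE FALSE  TRUE

# Option 1: keep only names present in the reference levels
kept <- later_names[later_names %in% name_levels]

# Option 2: collect everything else into one "Other" level
lumped <- ifelse(later_names %in% name_levels, later_names, "Other")
lumped <- factor(lumped, levels = c(name_levels, "Other"))
```

Option 1 keeps every facet limited to the same set of names, while Option 2 keeps the totals but hides which new names appeared.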

davidfoster24 commented 8 months ago

@juliasilge this was exactly what I needed! THANK YOU. I think I found a new favourite blog for data in R!

bhadrayu-22 commented 1 week ago

Hi Julia,

Thank you for your helpful article.

I followed the code example to get a plot with 2 facets - each with sorted X axes. However, previously I was supplying labels/axis text manually to my unsorted facet plot by using scale_x_discrete(labels = c(...)) and with this newer maneuver, I am unable to.

In short, if I were to use the example you provided but with the names on the X-axis instead, and wanted to supply my own abbreviations like Mike, Dave, Jamie, Linds... for Michael, David, James, Linda... (from the decade of 1950) for all facets, how would you suggest I go about that?

Another example would be: if I had axis texts like Apple, Ball, Cat, Donkey, ... and because of the reorder_within() function the axis texts came out as Donkey, Ball, Cat, Apple, is there a way to supply labels like D, B, C, A so that D, B, C, A appear on the axes instead?

Thanks!

juliasilge commented 1 week ago

@bhadrayu-22 I would do that using fct_recode from forcats:

library(tidyverse)
library(tidytext)
data("penguins", package = 'palmerpenguins')

penguin_mass <- penguins |>
  filter(!is.na(sex)) |>
  group_by(species, sex) |>
  summarise(body_mass_g = median(body_mass_g, na.rm = TRUE)) |>
  ungroup()
#> `summarise()` has grouped output by 'species'. You can override using the
#> `.groups` argument.

penguin_mass |>
  mutate(
    species_fact = fct_recode(species, Apple = "Gentoo", Banana = "Chinstrap"),
    species = reorder_within(species_fact, body_mass_g, sex),
  ) |>
  ggplot(aes(body_mass_g, species, fill = species_fact)) +
  geom_col(show.legend = FALSE) +
  scale_y_reordered() +
  facet_wrap(vars(sex), scales = "free_y")

Created on 2024-06-24 with reprex v2.1.0
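To connect this to the Apple, Ball, Cat, Donkey example from the question, here is a minimal sketch of what fct_recode() does on its own, before any reordering. The factor animals and the one-letter labels are made up for illustration:

```r
library(forcats)

# Made-up axis labels in the order they appear in the data
animals <- factor(c("Donkey", "Ball", "Cat", "Apple"))

# fct_recode() takes new_label = "old_label" pairs
short <- fct_recode(animals, A = "Apple", B = "Ball", C = "Cat", D = "Donkey")

as.character(short)
#> [1] "D" "B" "C" "A"
```

Recoding first and then passing the recoded factor to reorder_within() means the abbreviations follow each bar wherever the reordering places it.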

bhadrayu-22 commented 1 week ago

Hi Julia,

Thanks a lot for your quick reply!

This worked perfectly for me.

Regards,