moderndive / moderndive

R package for tidyverse-friendly introductory linear regression
https://moderndive.github.io/moderndive/
GNU General Public License v3.0

geom_parallel_slopes but for one categorical explanatory variable models? #72

Closed wjhopper closed 4 years ago

wjhopper commented 4 years ago

One visualization tool I thought could enhance the package/book would be a function to visualize the fitted values of a regression model with one categorical explanatory variable. Even though the visualization would just be horizontal line segments, I think providing this visualization would reinforce the similarities between regression using a numeric explanatory variable and a categorical explanatory variable.

I think the final visualization would look something like this (but without the lines extending across the entire x axis).

[Image: categorical_regression_lines]

The function I'm proposing would just add the horizontal lines to the existing ggplot, the same way geom_smooth and geom_parallel_slopes do.
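As a rough illustration of the idea (not the proposed implementation), the horizontal lines can already be drawn by hand with plain ggplot2. The sketch below assumes made-up toy data (`scores`, `group`, `y`) and an arbitrary segment half-width of 0.4. It relies on the fact that, with one categorical explanatory variable, the fitted values from `lm()` are just the group means.

```r
# Toy data, invented for illustration: one categorical x, one numeric y.
scores <- data.frame(
  group = factor(c("a", "a", "b", "b", "c", "c")),
  y     = c(1, 3, 4, 6, 8, 10)
)

# With a single categorical explanatory variable, the fitted value for
# each observation equals the mean of its group.
fit <- lm(y ~ group, data = scores)
group_means <- tapply(scores$y, scores$group, mean)

# One horizontal segment per level. Discrete levels sit at x = 1, 2, 3, ...
# so each segment is centred on its level (0.4 is an arbitrary half-width).
segments <- data.frame(
  group  = factor(names(group_means), levels = levels(scores$group)),
  fitted = as.numeric(group_means)
)
segments$xmin <- as.integer(segments$group) - 0.4
segments$xmax <- as.integer(segments$group) + 0.4

# Draw the points first so the x scale is set up as discrete, then add
# the hand-built segments on top.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(scores, aes(x = group, y = y)) +
    geom_point() +
    geom_segment(
      data = segments,
      aes(x = xmin, xend = xmax, y = fitted, yend = fitted),
      inherit.aes = FALSE
    )
  print(p)
}
```

A dedicated geom would hide the bookkeeping in `segments` and leave only the `ggplot()` call plus one extra layer.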

I originally thought of this as an extension of the functionality of geom_parallel_slopes, but I realized it might be confusing to have the parallel slopes idea come in before the multiple regression situation. Perhaps this type of categorical "smoothing" should be its own function, like geom_categorical_model?

If this would be considered a useful enhancement, I would be happy to contribute it.

rudeboybert commented 4 years ago

Hey @wjhopper, sorry for the delay. Two questions:

Q1: This would work as follows

ggplot(data, aes(x = x_var, y = y_var)) +
    geom_categorical_model()

for, and only for,

  1. x_var categorical
  2. y_var numerical

correct?
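To make the two conditions concrete, here is a minimal sketch of an input check. The helper name `check_categorical_model_inputs` and its error messages are hypothetical and not part of moderndive or ggplot2. It treats factor and character columns as categorical, which is an assumption.

```r
# Hypothetical helper (not a moderndive function): stops unless x_var is
# categorical and y_var is numerical.
check_categorical_model_inputs <- function(data, x_var, y_var) {
  x <- data[[x_var]]
  y <- data[[y_var]]
  if (!(is.factor(x) || is.character(x))) {
    stop("`", x_var, "` must be categorical (factor or character).")
  }
  if (!is.numeric(y)) {
    stop("`", y_var, "` must be numerical.")
  }
  invisible(TRUE)
}

# Usage with invented data: passes for (categorical x, numeric y) ...
toy <- data.frame(x_var = c("a", "b", "a"), y_var = c(1.5, 2.0, 3.5))
check_categorical_model_inputs(toy, "x_var", "y_var")

# ... and fails when the roles are swapped.
swapped <- tryCatch(
  check_categorical_model_inputs(toy, "y_var", "x_var"),
  error = function(e) conditionMessage(e)
)
```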

Q2: If my understanding of Q1 is correct, what do you think of the following? Color the line for the baseline group mean (here "Bert") with one color, and color the lines for all other group means with another color.

wjhopper commented 4 years ago

Q1: Yes, that's what I was thinking.

Q2: What about using one color per group, but using a different linetype for the baseline? I know it would be redundant to have both color and the x axis used for the group, but grouping by color is such an automatic process that people might believe all the non-baseline levels really are somehow part of one group. And using color to distinguish groups would match up well with how color is used in later chapters.

rudeboybert commented 4 years ago

Q2: Yep! Using linetype to distinguish baseline vs non-baseline makes much more sense than color. If you're willing to code this up, we'll gladly accept a PR.
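For reference, the agreed styling (one color per group, linetype marking the baseline) could look roughly like the sketch below in plain ggplot2. The toy data, the `role` column, the 0.4 half-width, and the dashed/solid choice are all assumptions made for illustration. The baseline is taken to be the first factor level, which is what `lm()` uses under its default treatment contrasts.

```r
# Toy data, invented for illustration.
scores <- data.frame(
  group = factor(c("a", "a", "b", "b", "c", "c")),
  y     = c(1, 3, 4, 6, 8, 10)
)

# Under R's default treatment contrasts, the first level is the baseline.
baseline <- levels(scores$group)[1]

# Group means, tagged by whether the group is the baseline.
means <- aggregate(y ~ group, data = scores, FUN = mean)
means$role <- ifelse(means$group == baseline, "baseline", "non-baseline")
means$xmin <- as.integer(means$group) - 0.4
means$xmax <- as.integer(means$group) + 0.4

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(scores, aes(x = group, y = y, color = group)) +
    geom_point() +
    # color = group is inherited; linetype separates baseline from the rest.
    geom_segment(
      data = means,
      aes(x = xmin, xend = xmax, y = y, yend = y, linetype = role)
    ) +
    scale_linetype_manual(
      values = c("baseline" = "dashed", "non-baseline" = "solid")
    )
  print(p)
}
```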

wjhopper commented 4 years ago

OK, will do!