plotly / plotly.R

An interactive graphing library for R
https://plotly-r.com

Bubble sizes out of order, when using a formula for the `color`/`name` attributes #2346

Open JElchison opened 6 months ago

JElchison commented 6 months ago

Hi folks, I'm seeing a strange effect on scatter plot marker sizes, when using a formula for the color (and name) attributes.

(edited to add more test cases)

library(plotly)
#> Loading required package: ggplot2
#> 
#> Attaching package: 'plotly'
#> The following object is masked from 'package:ggplot2':
#> 
#>     last_plot
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following object is masked from 'package:graphics':
#> 
#>     layout

df <- data.frame(x = c(1, 2, 3, 4, 5),
                 y = c(1, 2, 3, 4, 5),
                 z = c(1, 2, 3, 4, 5))

# Expected output: marker sizes in order
plot_ly(df,
        x = ~x,
        y = ~y,
        type = "scatter",
        mode = "markers",
        marker = list(size = ~z,
                      sizeref = 0.1),
        color = ~z < 2,
        colors = c(I("green"), I("red")),
        text = ~paste0("z: ", z))


# df has correct data
df
#>   x y z
#> 1 1 1 1
#> 2 2 2 2
#> 3 3 3 3
#> 4 4 4 4
#> 5 5 5 5

# Buggy output: changing "color" formula threshold puts marker sizes out of order
plot_ly(df,
        x = ~x,
        y = ~y,
        type = "scatter",
        mode = "markers",
        marker = list(size = ~z,
                      sizeref = 0.1),
        color = ~z < 3,
        colors = c(I("green"), I("red")),
        text = ~paste0("z: ", z))


# df still has correct data
df
#>   x y z
#> 1 1 1 1
#> 2 2 2 2
#> 3 3 3 3
#> 4 4 4 4
#> 5 5 5 5

# Buggy output: static vector also puts marker sizes out of order
plot_ly(df,
        x = ~x,
        y = ~y,
        type = "scatter",
        mode = "markers",
        marker = list(size = ~z,
                      sizeref = 0.1),
        color = c(TRUE, TRUE, FALSE, FALSE, FALSE),
        colors = c(I("green"), I("red")),
        text = ~paste0("z: ", z))


# Buggy output: use ifelse with (TRUE, FALSE)
plot_ly(df,
        x = ~x,
        y = ~y,
        type = "scatter",
        mode = "markers",
        marker = list(size = ~z,
                      sizeref = 0.1),
        color = ~ifelse(z < 3, TRUE, FALSE),
        colors = c(I("green"), I("red")),
        text = ~paste0("z: ", z))


# Possible workaround: use ifelse with (1, 0)
plot_ly(df,
        x = ~x,
        y = ~y,
        type = "scatter",
        mode = "markers",
        marker = list(size = ~z,
                      sizeref = 0.1),
        color = ~ifelse(z < 3, 1, 0),
        colors = c(I("green"), I("red")),
        text = ~paste0("z: ", z))


# Buggy again when adding "name" field with formula
plot_ly(df,
        x = ~x,
        y = ~y,
        type = "scatter",
        mode = "markers",
        marker = list(size = ~z,
                      sizeref = 0.1),
        color = ~ifelse(z < 3, 1, 0),
        colors = c(I("green"), I("red")),
        name = ~ifelse(z < 3, "Red", "Green"),
        text = ~paste0("z: ", z))

Created on 2024-04-06 with reprex v2.1.0

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.3 (2024-02-29)
#>  os       Ubuntu 22.04.4 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Indiana/Vevay
#>  date     2024-04-06
#>  pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  callr         3.7.5   2024-02-19 [1] CRAN (R 4.3.2)
#>  cli           3.6.2   2023-12-11 [1] CRAN (R 4.3.2)
#>  colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
#>  crosstalk     1.2.1   2023-11-23 [1] CRAN (R 4.3.2)
#>  curl          5.2.0   2023-12-08 [1] CRAN (R 4.3.2)
#>  data.table    1.15.0  2024-01-30 [1] CRAN (R 4.3.2)
#>  digest        0.6.34  2024-01-11 [1] CRAN (R 4.3.2)
#>  dplyr         1.1.4   2023-11-17 [1] CRAN (R 4.3.2)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.3.0)
#>  evaluate      0.23    2023-11-01 [1] CRAN (R 4.3.2)
#>  fansi         1.0.6   2023-12-08 [1] CRAN (R 4.3.2)
#>  farver        2.1.1   2022-07-06 [1] CRAN (R 4.3.0)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.2)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
#>  ggplot2     * 3.5.0   2024-02-23 [1] CRAN (R 4.3.2)
#>  glue          1.7.0   2024-01-09 [1] CRAN (R 4.3.2)
#>  gtable        0.3.4   2023-08-21 [1] CRAN (R 4.3.2)
#>  highr         0.10    2022-12-22 [1] CRAN (R 4.3.0)
#>  htmltools     0.5.7   2023-11-03 [1] CRAN (R 4.3.2)
#>  htmlwidgets   1.6.4   2023-12-06 [1] CRAN (R 4.3.2)
#>  httr          1.4.7   2023-08-15 [1] CRAN (R 4.3.2)
#>  jsonlite      1.8.8   2023-12-04 [1] CRAN (R 4.3.2)
#>  knitr         1.45    2023-10-30 [1] CRAN (R 4.3.2)
#>  lazyeval      0.2.2   2019-03-15 [1] CRAN (R 4.3.0)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.2)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
#>  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
#>  plotly      * 4.10.4  2024-01-13 [1] CRAN (R 4.3.2)
#>  processx      3.8.3   2023-12-10 [1] CRAN (R 4.3.2)
#>  ps            1.7.6   2024-01-18 [1] CRAN (R 4.3.2)
#>  purrr         1.0.2   2023-08-10 [1] CRAN (R 4.3.2)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.3.1)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.3.1)
#>  R.oo          1.26.0  2024-01-24 [1] CRAN (R 4.3.2)
#>  R.utils       2.12.3  2023-11-18 [1] CRAN (R 4.3.2)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
#>  reprex        2.1.0   2024-01-11 [1] CRAN (R 4.3.3)
#>  rlang         1.1.3   2024-01-10 [1] CRAN (R 4.3.2)
#>  rmarkdown     2.25    2023-09-18 [1] CRAN (R 4.3.2)
#>  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.2)
#>  scales        1.3.0   2023-11-28 [1] CRAN (R 4.3.2)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
#>  styler        1.10.2  2023-08-29 [1] CRAN (R 4.3.1)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyr         1.3.1   2024-01-24 [1] CRAN (R 4.3.2)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
#>  utf8          1.2.4   2023-10-22 [1] CRAN (R 4.3.2)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.3.2)
#>  viridisLite   0.4.2   2023-05-02 [1] CRAN (R 4.3.0)
#>  webshot       0.5.5   2023-06-26 [1] CRAN (R 4.3.2)
#>  withr         3.0.0   2024-01-16 [1] CRAN (R 4.3.2)
#>  xfun          0.42    2024-02-08 [1] CRAN (R 4.3.2)
#>  xml2          1.3.6   2023-12-04 [1] CRAN (R 4.3.2)
#>  yaml          2.3.8   2023-12-11 [1] CRAN (R 4.3.2)
#>
#>  [1] /home/jonathan/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

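Bad color formulas (the ones marked "Buggy output" in the reprex above) are:

    ~z < 3
    c(TRUE, TRUE, FALSE, FALSE, FALSE)
    ~ifelse(z < 3, TRUE, FALSE)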

The results from z < 2 may also be incorrect ... they're just indiscernible given my example.
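
One possible explanation, which nothing in this thread confirms, is that a logical `color` splits the rows into one trace per group while the nested `marker.size` vector is passed to every trace unsplit. The base-R sketch below only simulates that assumption; `sizes_drawn` is a hypothetical helper, not part of plotly. Under it, `z < 2` would also be wrong but still look ordered, while `z < 3` would visibly break the ordering.

```r
df <- data.frame(x = c(1, 2, 3, 4, 5),
                 y = c(1, 2, 3, 4, 5),
                 z = c(1, 2, 3, 4, 5))

# Hypothetical helper (an assumption, not plotly code): mimic a renderer that
# splits rows into one trace per group but gives every trace the same,
# unsplit size vector, so each trace reads sizes from the start of df$z.
sizes_drawn <- function(df, group) {
  groups <- split(seq_len(nrow(df)), group)  # row indices belonging to each trace
  out <- numeric(nrow(df))
  for (rows in groups) {
    out[rows] <- df$z[seq_along(rows)]
  }
  out
}

sizes_drawn(df, df$z < 2)  # 1 1 2 3 4: wrong, but still non-decreasing
sizes_drawn(df, df$z < 3)  # 1 2 1 2 3: visibly out of order
```

If this guess is right, it would also be consistent with the numeric `ifelse(z < 3, 1, 0)` case behaving, since a numeric color would not need to split the rows into groups.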

This behavior causes significant skewing/confusion for bubble plots of any size.

Thanks for reading!

bklingen commented 6 months ago

I don't think you should map to variables inside markers. Declare the size mapping "outside", see below, and you get the correct result:

plot_ly(df,
        x = ~x,
        y = ~y,
        type = "scatter",
        mode = "markers",
        marker = list(sizeref = 0.1),
        size = ~z,
        color = ~z < 3,
        colors = c(I("green"), I("red")),
        text = ~paste0("z: ", z))

[image]

JElchison commented 6 months ago

Hi @bklingen, thanks for your help!

png mismatch

As an aside, I don't think your png matches your code, because (given your color threshold of z < 3) the success case should show 2 reds on the small side, not 3. However, that's irrelevant to your tip.
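
Rather than comparing screenshots, the per-trace values can be inspected directly. The sketch below assumes plotly is installed and uses `plotly_build()` to evaluate the formulas; `trace_summary` is just an illustrative name, and no particular output is claimed here.

```r
library(plotly)

df <- data.frame(x = c(1, 2, 3, 4, 5),
                 y = c(1, 2, 3, 4, 5),
                 z = c(1, 2, 3, 4, 5))

p <- plot_ly(df,
             x = ~x,
             y = ~y,
             type = "scatter",
             mode = "markers",
             marker = list(size = ~z,
                           sizeref = 0.1),
             color = ~z < 3,
             colors = c(I("green"), I("red")))

# plotly_build() evaluates the formulas; each element of built$x$data is one
# trace as it will be handed to plotly.js.
built <- plotly_build(p)

# Collect the name, x values and marker sizes of every trace so that sizes
# can be compared against the rows they belong to.
trace_summary <- lapply(built$x$data, function(tr) {
  list(name = tr$name, x = tr$x, size = tr$marker$size)
})
str(trace_summary)
```

Running the same summary on a plot that uses the top-level `size = ~z` makes it possible to compare the two approaches numerically.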

Your workaround, with new warnings

But beyond that, it does look like setting the size attribute instead of marker.size successfully works around my issue.

Did you notice that your workaround causes these warnings?

Warning messages:
1: `line.width` does not currently support multiple values. 
2: `line.width` does not currently support multiple values. 

I'm not sure what to make of those, but there are some related SO topics, such as: https://stackoverflow.com/questions/52692760/spurious-warning-when-mapping-marker-size-in-plotly-r
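
If the warnings turn out to be harmless, one way to silence only this message is a calling handler in base R. This is a sketch, not a fix for the underlying cause: `muffle_line_width` is a hypothetical helper, and `noisy()` stands in for building or printing the plot, on the assumption that the warning is raised at that point.

```r
# Hypothetical helper: evaluate `expr` and muffle only warnings whose message
# mentions `line.width`; every other warning still surfaces as usual.
muffle_line_width <- function(expr) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl("line.width", conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Stand-in for the plotting call, so the sketch runs without plotly.
noisy <- function() {
  warning("`line.width` does not currently support multiple values.")
  "done"
}

result <- muffle_line_width(noisy())
```

With a real plot object `p`, the equivalent call would be `muffle_line_width(print(p))`.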

Intended behavior?

More critically, though, I do question whether this outcome is the intended behavior. Do you have any supporting documentation you could link to, showing why I should use size instead of marker.size?

Here's what I could find:

  1. https://plotly.com/r/reference/scatter/, which states that:

    Bubble charts are achieved by setting marker.size and/or marker.color to numerical arrays.

According to the same documentation page, neither size nor color is a parent-level attribute. colors isn't mentioned anywhere (and it produces different behavior from colorscale).

Further, this doesn't seem to line up with the examples at...

  1. https://plotly.com/r/bubble-charts/, where 2 of the 7 examples use size instead of marker.size

Strangely, 7 of 7 examples there use color instead of marker.color. Further, all examples use the undocumented colors (top level) attribute.

Summary -- A bug in code or documentation?

I'm (very) happy to use your workaround, but because the documented code doesn't produce the documented behavior, it seems like there is a bug in either the code or the documentation.

Any additional thoughts? Thanks!

Workaround in action

Finally, for posterity, here's the functioning workaround, but with 2 warnings:

library(plotly)
#> Loading required package: ggplot2
#> 
#> Attaching package: 'plotly'
#> The following object is masked from 'package:ggplot2':
#> 
#>     last_plot
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following object is masked from 'package:graphics':
#> 
#>     layout

df <- data.frame(x = c(1, 2, 3, 4, 5),
                 y = c(1, 2, 3, 4, 5),
                 z = c(1, 2, 3, 4, 5))

plot_ly(df,
        x = ~x,
        y = ~y,
        type = "scatter",
        mode = "markers",
        marker = list(sizeref = 0.1),
        # `size` is undocumented at this level.  https://plotly.com/r/reference/scatter/ shows `marker.size`
        size = ~z,
        # `color` is undocumented at this level.  https://plotly.com/r/reference/scatter/ shows `marker.color`
        color = ~z < 3,
        # `colors` is undocumented
        colors = c(I("green"), I("red")),
        name = ~ifelse(z < 3, "Red", "Green"),
        text = ~paste0("z: ", z))
#> Warning: `line.width` does not currently support multiple values.

#> Warning: `line.width` does not currently support multiple values.

Created on 2024-04-09 with reprex v2.1.0

Session info: same as the session info above, except for date 2024-04-09 and plotly 4.10.4 listed as built under R 4.3.3 instead of R 4.3.2.