wilkelab / ggtext

Improved text rendering support for ggplot2
https://wilkelab.org/ggtext/
GNU General Public License v2.0

Greek symbols in html ggtext #1

Closed JMLuther closed 5 years ago

JMLuther commented 5 years ago

This package looks great; I'll use it quite a bit, especially for the image labels and text formatting. Thanks! I'm an end user, not a programmer, so please excuse me if this is a trivial misunderstanding of how to use it.

Greek symbols, which ggplot2 now handles when they are written as Unicode escapes in text, are giving me trouble when mixed into HTML-formatted ggtext labels.

See the relevant SO question here.

As an example, adding the Greek mu symbol as a Unicode escape to the HTML-styled text is not rendered as I expect (this may be my misunderstanding).

library(ggplot2)
library(ggtext)

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(size = 3) +
  scale_color_manual(
    name = NULL,
    values = c(setosa = "#0072B2", virginica = "#009E73", versicolor = "#D55E00"),
    labels = c(
      setosa = "<i style='color:#0072B2'>I. setosa  \u03bc </i>",
      virginica = "<i style='color:#009E73'>I. virginica  \u03bc </i>",
      versicolor = "<i style='color:#D55E00'>I. versicolor  \u03bc </i>")
  ) +
  labs(
    title = "**Fisher's *Iris* dataset  (test unicode symbol: \u03bc)**  
    <span style='font-size:11'>Sepal width vs. sepal length for three *Iris*
    species  \u03bc </span>",
    x = "Sepal length (cm)\n (test unicode symbol: \u03bc)", 
    y = "Sepal width (cm)\n (test unicode symbol: \u03bc)"
    ) +
  theme_minimal() +
  theme(
    plot.title = element_markdown(lineheight = 1.1),
    legend.text = element_markdown(size = 11)
  )

Here's the basic example of what I'm looking for, without ggtext.


ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(size = 3) +
  labs(
    title = "Fisher's *Iris* dataset (test unicode symbol: \u03bc)",
    x = "Sepal length (cm)\n (test unicode symbol: \u03bc)", 
    y = "Sepal width (cm)\n (test unicode symbol: \u03bc)",
    color = "Species \n (test unicode symbol: \u03bc)"
  ) +
  theme_minimal()

Created on 2019-08-09 by the reprex package (v0.3.0)

RCura commented 5 years ago

HTML has its own notation for Unicode characters (named entities such as &mu;). For Greek letters you can, for example, refer to http://www.alanwood.net/demos/symbol.html

So, this works as expected:

library(ggplot2)
library(ggtext)
ggplot(mtcars) +
  aes(mpg, disp) +
  geom_point() +
  labs(title = "TEST &mu;",
    x = "*abc* &alpha;",
    y = "**bold &beta;**") +
  theme(plot.title = element_markdown(),
    axis.title.x = element_markdown(),
    axis.title.y = element_markdown())


clauswilke commented 5 years ago

Yes, correct, HTML entities are supported.

Alternatively, it's also possible to place the Unicode symbols directly into the strings.

library(ggplot2)
library(ggtext)
ggplot(mtcars) +
  aes(mpg, disp) +
  geom_point() +
  labs(title = "TEST μ",
       x = "*abc* α",
       y = "**bold β**") +
  theme(plot.title = element_markdown(),
        axis.title.x = element_markdown(),
        axis.title.y = element_markdown())

Created on 2019-08-09 by the reprex package (v0.3.0)
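To make the three notations in this thread concrete, here is a minimal base-R sketch. It only inspects the strings and does not draw anything. The helper `to_entity()` is hypothetical (it is not part of ggtext or gridtext); it builds a numeric HTML character reference, which an HTML parser should decode to the same character as the named entity `&mu;`.

```r
# "\u03bc" is an R escape: the parser turns it into the character itself,
# so it is the same string as a literal Greek mu typed into the source.
mu_escape  <- "\u03bc"
mu_literal <- intToUtf8(0x03BC)   # build the same character from its code point

identical(mu_escape, mu_literal)  # TRUE
utf8ToInt(mu_escape)              # 956, i.e. U+03BC
Encoding(mu_escape)               # "UTF-8"

# Hypothetical helper: write a single character as a numeric HTML reference.
# The result is pure ASCII, so it does not depend on the session encoding.
to_entity <- function(ch) sprintf("&#%d;", utf8ToInt(ch))

to_entity(mu_escape)              # "&#956;"
```

If the raw character is mangled on a particular system, an ASCII-only entity of this kind is one way to sidestep the string encoding entirely, assuming the label is rendered through an HTML-aware element such as element_markdown().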

RCura commented 5 years ago

BTW, I just noticed that element_markdown() "loses" the default hjust/vjust.

Something like this is needed for the OP:

library(ggplot2)
library(ggtext)

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(size = 3) +
  scale_color_manual(
    name = NULL,
    values = c(setosa = "#0072B2", virginica = "#009E73", versicolor = "#D55E00"),
    labels = c(
      setosa = "<i style='color:#0072B2'>I. setosa  &mu; </i>",
      virginica = "<i style='color:#009E73'>I. virginica  &mu; </i>",
      versicolor = "<i style='color:#D55E00'>I. versicolor  &mu; </i>")
  ) +
  labs(
    title = "**Fisher's *Iris* dataset  (test unicode symbol: &mu;)**  
    <span style='font-size:11'>Sepal width vs. sepal length for three *Iris*
    species  &mu; </span>",
    x = "Sepal length (cm)<br>(test unicode symbol: &mu;)", 
    y = "Sepal width (cm)<br> (test unicode symbol: &mu;)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_markdown(lineheight = 1.1),
    legend.text = element_markdown(size = 11),
    axis.title.x = element_markdown(hjust = 0.5),
    axis.title.y = element_markdown(vjust = 0.5)
  )


JMLuther commented 5 years ago

Thanks @RCura. I thought your answer would clear it up for me, but I can't reproduce it on my system. I suspect a Windows issue. Note: I get the same results using the &mu; notation.

library(ggplot2)
library(ggtext)
ggplot(mtcars) +
  aes(mpg, disp) +
  geom_point() +
  labs(title = "TEST &mu;",
       x = "*abc* &alpha;",
       y = "**bold &beta;**") +
  theme(plot.title = element_markdown(),
        axis.title.x = element_markdown(),
        axis.title.y = element_markdown())

Created on 2019-08-09 by the reprex package (v0.3.0)

clauswilke commented 5 years ago

Yes, I suppose it's an issue with your system. It turns out the original example works just fine for me. (In fact, I couldn't think of a good reason why it shouldn't; I should have tried earlier.)

library(ggplot2)
library(ggtext)

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(size = 3) +
  scale_color_manual(
    name = NULL,
    values = c(setosa = "#0072B2", virginica = "#009E73", versicolor = "#D55E00"),
    labels = c(
      setosa = "<i style='color:#0072B2'>I. setosa  \u03bc </i>",
      virginica = "<i style='color:#009E73'>I. virginica  \u03bc </i>",
      versicolor = "<i style='color:#D55E00'>I. versicolor  \u03bc </i>")
  ) +
  labs(
    title = "**Fisher's *Iris* dataset  (test unicode symbol: \u03bc)**  
    <span style='font-size:11'>Sepal width vs. sepal length for three *Iris*
    species  \u03bc </span>",
    x = "Sepal length (cm)\n (test unicode symbol: \u03bc)", 
    y = "Sepal width (cm)\n (test unicode symbol: \u03bc)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_markdown(lineheight = 1.1),
    legend.text = element_markdown(size = 11)
  )

Created on 2019-08-09 by the reprex package (v0.3.0)

clauswilke commented 5 years ago

@RCura Would you mind opening a separate issue for the hjust/vjust bug? That's a matter of element inheritance, I believe, and it should be fixable.

JMLuther commented 5 years ago

Thanks, both. I'll try to figure it out.

tungttnguyen commented 5 years ago

I'm having the same problem as @JMLuther on Windows 10:

 version  R version 3.6.1 (2019-07-05)
 os       Windows 10 x64              
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  English_United States.1252  
 ctype    English_United States.1252  
 tz       America/Los_Angeles         
 date     2019-08-09 

ggplot2     * 3.2.0.9000 2019-08-09 [1] Github (tidyverse/ggplot2@541ae99)  
ggtext      * 0.1.0      2019-08-09 [1] Github (clauswilke/ggtext@5c7cfa9)  
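The `collate` and `ctype` entries above show a Windows-1252 locale rather than a UTF-8 one, which is worth checking when non-ASCII characters render incorrectly. A minimal sketch for inspecting this from within R, using base functions only (the variable names are just for illustration):

```r
# Is the current session running in a UTF-8 locale?
info <- l10n_info()
is_utf8_session <- info[["UTF-8"]]
is_utf8_session

# The character-type locale in use, e.g. "English_United States.1252" on Windows
ctype <- Sys.getlocale("LC_CTYPE")
ctype

# A string created from a \u escape is marked as UTF-8 regardless of locale
x <- "\u03bc"
Encoding(x)

# Converting to the native encoding shows whether the locale can represent it;
# in a non-UTF-8 locale the character may come back as an escape or substitute
enc2native(x)
```

On a system where `is_utf8_session` is FALSE, any code path that drops the UTF-8 mark from a string is likely to produce wrong characters, because the bytes are then interpreted in the native code page.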

JMLuther commented 5 years ago

I've posted an SO question here. The issue occurs when element_markdown() is added. Basic grid interprets the Unicode correctly, as in this example, where I've stripped the HTML and Markdown text.

library(ggplot2)
library(grid)

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(size = 3) +
  scale_color_manual(
    name = NULL,
    values = c(setosa = "#0072B2", virginica = "#009E73", versicolor = "#D55E00"),
    labels = c("setosa  \u03bc ", 
               "virginica  \u03bc",
               "versicolor  \u03bc")
  ) +
  labs(
    title = "Fisher's *Iris* dataset  (test unicode symbol: \u03bc)
    Sepal width vs. sepal length for three *Iris* species  \u03bc",
    x = "Sepal length (cm)\n (test unicode symbol: \u03bc)", 
    y = "Sepal width (cm)\n (test unicode symbol: \u03bc)"
  ) +
  theme_minimal() #+

  # theme(
    # plot.title = element_markdown(lineheight = 1.1),
    # legend.text = element_markdown(size = 11)
  # )

Created on 2019-08-10 by the reprex package (v0.3.0)

JMLuther commented 5 years ago

It also does not appear to be just a matter of how grid.text() interprets the string, as suggested on the SO question (grid.text annotation in red).

library(ggplot2)
library(grid)

pl1 <- 
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(size = 3) +
  scale_color_manual(
    name = NULL,
    values = c(setosa = "#0072B2", virginica = "#009E73", versicolor = "#D55E00"),
    labels = c("setosa  \u03bc ", 
               "virginica  \u03bc",
               "versicolor  \u03bc")
  ) +
  labs(
    title = "Fisher's *Iris* dataset  (test unicode symbol: \u03bc)
    Sepal width vs. sepal length for three *Iris* species  \u03bc",
    x = "Sepal length (cm)\n (test unicode symbol: \u03bc)", 
    y = "Sepal width (cm)\n (test unicode symbol: \u03bc)"
  ) +
  theme_minimal() #+
  # theme(
    # plot.title = element_markdown(lineheight = 1.1),
    # legend.text = element_markdown(size = 11)
  # )

my_grob <- grid.text(label =  "grid.text: \u03bc", 
                     gp=gpar(col="red", fontsize=14, fontface="bold"),
                     # x=7, y=4.25,
                     just = "center", rot = 45)
pl1 + 
  annotation_custom(grob = my_grob,
                    xmin = 6, xmax = 8,
                    ymin = 3.75, ymax = 4.5)

Created on 2019-08-10 by the reprex package (v0.3.0)

clauswilke commented 5 years ago

How about this reprex? Does it work?

library(ggplot2)
library(ggtext)

df <- data.frame(
  label = c("setosa  \u03bc ", "virginica  \u03bc", "versicolor  \u03bc"),
  x = c(1, 2, 3),
  y = c(3, 2, 1),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  label = c("setosa  \u03bc ", "virginica  \u03bc", "versicolor  \u03bc"),
  x = c(1, 2, 3),
  y = c(1, 3, 2),
  stringsAsFactors = TRUE
)

ggplot(NULL, aes(x, y, label = label)) +
  geom_rich_text(data = df) +
  geom_rich_text(data = df2, fill = "cornsilk") +
  xlim(0, 4)

Created on 2019-08-10 by the reprex package (v0.3.0)

JMLuther commented 5 years ago

No. Same issue. I've included output with my session info, fyi.

library(ggplot2)
library(ggtext)

df <- data.frame(
  label = c("setosa  \u03bc ", "virginica  \u03bc", "versicolor  \u03bc"),
  x = c(1, 2, 3),
  y = c(3, 2, 1),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  label = c("setosa  \u03bc ", "virginica  \u03bc", "versicolor  \u03bc"),
  x = c(1, 2, 3),
  y = c(1, 3, 2),
  stringsAsFactors = TRUE
)

ggplot(NULL, aes(x, y, label = label)) +
  geom_rich_text(data = df) +
  geom_rich_text(data = df2, fill = "cornsilk") +
  xlim(0, 4)


sessionInfo()
#> R version 3.6.1 (2019-07-05)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 17763)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggtext_0.1.0  ggplot2_3.2.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.2       xml2_1.2.2       knitr_1.24       magrittr_1.5    
#>  [5] gridtext_0.1.0   tidyselect_0.2.5 munsell_0.5.0    colorspace_1.4-1
#>  [9] R6_2.4.0         rlang_0.4.0      stringr_1.4.0    highr_0.8       
#> [13] dplyr_0.8.3      tools_3.6.1      grid_3.6.1       gtable_0.3.0    
#> [17] xfun_0.8         withr_2.1.2      htmltools_0.3.6  assertthat_0.2.1
#> [21] yaml_2.2.0       lazyeval_0.2.2   digest_0.6.20    tibble_2.1.3    
#> [25] crayon_1.3.4     purrr_0.3.2      glue_1.3.1       evaluate_0.14   
#> [29] rmarkdown_1.14   labeling_0.3     stringi_1.4.3    compiler_3.6.1  
#> [33] pillar_1.4.2     scales_1.0.0     markdown_1.1     pkgconfig_2.0.2

Created on 2019-08-10 by the reprex package (v0.3.0)

clauswilke commented 5 years ago

How about this one?

library(grid)
library(gridtext)

label <- c("setosa  \u03bc ", "virginica  \u03bc", "versicolor  \u03bc")
x <- c(.2, .4, .6)
y <- c(.6, .4, .2)

grid.newpage()
grid.draw(textGrob(label, x, y))
grid.draw(rich_text_grob(label, x + .1, y + .2))

Created on 2019-08-10 by the reprex package (v0.3.0)

clauswilke commented 5 years ago

And another one to try:

library(grid)

x <- c(.2, .4, .6)
y <- c(.6, .4, .2)

text <- c("special char:  \u03bc ")
text2 <- markdown::markdownToHTML(text = text, options = c("use_xhtml", "fragment_only"))

doctree <- xml2::read_html(text2)
text3 <- xml2::as_list(doctree)$html$body$p[[1]]

grid.newpage()
grid.draw(textGrob(c(text, text2, text3), x, y))

Created on 2019-08-10 by the reprex package (v0.3.0)

JMLuther commented 5 years ago

I still have issues with rich_text_grob() in the first one. I get the same results as you with the second example.

library(grid)
library(gridtext)

label <- c("setosa  \u03bc ", "virginica  \u03bc", "versicolor  \u03bc")
x <- c(.2, .4, .6)
y <- c(.6, .4, .2)

grid.newpage()
grid.draw(textGrob(label, x, y))
grid.draw(rich_text_grob(label, x + .1, y + .2))


text <- c("special char:  \u03bc ")
text2 <- markdown::markdownToHTML(text = text, options = c("use_xhtml", "fragment_only"))

doctree <- xml2::read_html(text2)
text3 <- xml2::as_list(doctree)$html$body$p[[1]]

grid.newpage()
grid.draw(textGrob(c(text, text2, text3), x, y))

Created on 2019-08-10 by the reprex package (v0.3.0)

clauswilke commented 5 years ago

So maybe the problem is the string processing with the stringr library. Could you try this one?

library(grid)

x <- c(.2, .4, .6, .8)
y <- c(.8, .6, .4, .2)

text <- c("special char:  \u03bc ")
text2 <- markdown::markdownToHTML(text = text, options = c("use_xhtml", "fragment_only"))

doctree <- xml2::read_html(text2)
text3 <- xml2::as_list(doctree)$html$body$p[[1]]

text4 <- stringr::str_split(stringr::str_squish(text3), "[[:space:]]+")[[1]][3]

grid.newpage()
grid.draw(textGrob(c(text, text2, text3, text4), x, y))

Created on 2019-08-10 by the reprex package (v0.3.0)

tungttnguyen commented 5 years ago

The last example works for me

clauswilke commented 5 years ago

If the last example (including the stringr calls) still doesn't mangle the special character, then one last possibility is that somehow the round-trip through the C++ code is the problem. How about this reprex?

library(grid)

text <- c("special char:  \u03bc ")
box <- gridtext:::bl_make_text_box(text, gpar())
gridtext:::bl_calc_layout(box)
g <- gridtext:::bl_render(box, 100, 100)
grid.newpage()
grid.draw(g)

Created on 2019-08-11 by the reprex package (v0.3.0)

JMLuther commented 5 years ago

That indeed appears to be the issue. (The prior example worked fine on my system, just as in @tungmilan's response.)

library(grid)

text <- c("special char:  \u03bc ")
box <- gridtext:::bl_make_text_box(text, gpar())
gridtext:::bl_calc_layout(box)
g <- gridtext:::bl_render(box, 100, 100)
grid.newpage()
grid.draw(g)

Created on 2019-08-11 by the reprex package (v0.3.0)

clauswilke commented 5 years ago

Ok, so we have narrowed it down to the round trip via C++. Not sure why this happens, as I'm not actually modifying the string, but it is what it is. Would you mind opening an issue for the gridtext package with this reprex? The problem is with gridtext, not with ggtext.

clauswilke commented 5 years ago

I think I've found the problem. When working with individual String objects in Rcpp, the encoding gets lost. I can work around that.

library(Rcpp)

cppFunction('CharacterVector test1(String s) {
  CharacterVector v(s);
  return v;
}')

cppFunction('CharacterVector test2(const String &s) {
  CharacterVector v(s);
  return v;
}')

cppFunction('CharacterVector test3(CharacterVector c) {
  CharacterVector v(c);
  return v;
}')

x <- "special char: \u03bc"
test1(x)
#> [1] "special char: μ"
test2(x)
#> [1] "special char: μ"
test3(x)
#> [1] "special char: μ"

Encoding(x)
#> [1] "UTF-8"
Encoding(test1(x))
#> [1] "unknown"
Encoding(test2(x))
#> [1] "unknown"
Encoding(test3(x))
#> [1] "UTF-8"

Created on 2019-08-11 by the reprex package (v0.3.0)
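For readers without a Windows machine, the effect of a lost encoding mark can be imitated in plain R. This is a sketch under stated assumptions, not the actual gridtext fix: it drops the mark by hand instead of going through Rcpp, and it uses iconv() with "CP1252" to simulate how a Windows-1252 locale would read the unmarked bytes.

```r
x <- "special char: \u03bc"
Encoding(x)                 # "UTF-8"

# Imitate the round trip: same bytes, but the UTF-8 mark is gone
y <- x
Encoding(y) <- "unknown"
Encoding(y)                 # "unknown"

# Simulate a CP1252 locale reading those bytes: the two UTF-8 bytes of mu
# (0xCE 0xBC) are taken as two separate single-byte characters
mangled <- iconv(y, from = "CP1252", to = "UTF-8")
mangled
tail(utf8ToInt(mangled), 2) # 206 188, i.e. U+00CE and U+00BC instead of U+03BC

# One possible repair on the R side: declare the bytes as UTF-8 again
Encoding(y) <- "UTF-8"
Encoding(y)                 # "UTF-8"
```

In a UTF-8 locale the unmarked string still prints correctly, which is consistent with the problem only showing up on the Windows systems in this thread.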

JMLuther commented 5 years ago

Yes, that's it. I can confirm on my Windows system that test1 and test2 give the same nonsense symbol as in my prior plots, and test3 interprets it appropriately as μ.

clauswilke commented 5 years ago

Ok. I consider it an Rcpp bug and I'd prefer they fix it, but if they can't or won't then I'll be able to work around it. I'll close this issue here because this is a gridtext problem, not a ggtext problem.

clauswilke commented 5 years ago

@JMLuther Could you update to the latest version of gridtext (via devtools::install_github("clauswilke/gridtext")) and see if the problem is gone for you?

JMLuther commented 5 years ago

YES! thanks