ropensci / katex

Server side math to html rendering in R
https://docs.ropensci.org/katex/reference/katex.html

non-ascii characters not encoded correctly on Windows 10 #2

Closed bwiernik closed 3 years ago

bwiernik commented 3 years ago

Follow up on https://github.com/ropensci/katex/issues/1

Non-ASCII characters are still not working correctly on Windows 10 with the current GitHub main branch (commit 56fc96f):

katex::katex_html(katex::example_math(), preview = interactive())
#> [1] "<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>f</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mfrac><mn>1</mn><mrow><mi>s</mi><msqrt><mrow><mn>2</mn><mi>p</mi></mrow></msqrt></mrow></mfrac><msup><mi>e</mi><mrow><mo>-</mo><mfrac><mn>1</mn><mn>2</mn></mfrac><mo stretchy=\"false\">(</mo><mfrac><mrow><mi>x</mi><mo>-</mo><mi>µ</mi></mrow><mi>s</mi></mfrac><msup><mo stretchy=\"false\">)</mo><mn>2</mn></msup></mrow></msup></mrow><annotation encoding=\"application/x-tex\">f(x)= {\\frac{1}{\\sigma\\sqrt{2\\pi}}}e^{- {\\frac {1}{2}} (\\frac {x-\\mu}{\\sigma})^2}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2777777777777778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2777777777777778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.25144em;vertical-align:-0.93em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.32144em;\"><span style=\"top:-2.2027799999999997em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">s</span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.90722em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" 
style=\"height:3em;\"></span><span class=\"mord\" style=\"padding-left:0.833em;\"><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">p</span></span></span><span style=\"top:-2.86722em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg width='400em' height='1.08em' viewBox='0 0 400000 1080' preserveAspectRatio='xMinYMin slice'><path d='M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z'/></svg></span></span></span><span class=\"vlist-s\"><U+200B></span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.13278em;\"><span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\"><U+200B></span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.93em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.0369199999999998em;\"><span style=\"top:-3.4130000000000003em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">-</span><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mopen nulldelimiter sizing reset-size3 size6\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8443142857142858em;\"><span style=\"top:-2.656em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span></span></span></span><span style=\"top:-3.2255000000000003em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line mtight\" style=\"border-bottom-width:0.049em;\"></span></span><span style=\"top:-3.384em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\"><U+200B></span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.344em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter sizing reset-size3 size6\"></span></span></span><span class=\"mopen mtight\">(</span><span class=\"mord mtight\"><span class=\"mopen nulldelimiter sizing reset-size3 size6\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.87905em;\"><span style=\"top:-2.656em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">s</span></span></span></span><span style=\"top:-3.2255000000000003em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line mtight\" style=\"border-bottom-width:0.049em;\"></span></span><span 
style=\"top:-3.4623857142857144em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">x</span><span class=\"mbin mtight\">-</span><span class=\"mord mathnormal mtight\">µ</span></span></span></span></span><span class=\"vlist-s\"><U+200B></span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.344em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter sizing reset-size3 size6\"></span></span><span class=\"mclose mtight\"><span class=\"mclose mtight\">)</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8913142857142857em;\"><span style=\"top:-2.931em;margin-right:0.07142857142857144em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span>"

Created on 2021-07-08 by the reprex package (v2.0.0)

image

Greek letters generally fail to render correctly on Windows.

This roxygen2 block:

#' Compute the rank-biserial correlation
#' (\Sexpr[results=rd, stage=build]{katex::math_to_rd('r_{rb}', 'r_rb', FALSE)}),
#' Cliff's *delta* (\Sexpr[results=rd, stage=build]{katex::math_to_rd("\\\\delta", 'delta', FALSE)}),
#' rank epsilon squared (\Sexpr[results=rd, stage=build]{katex::math_to_rd('\\\\varepsilon^2', 'epsilon^2', FALSE)}), and
#' Kendall's \eqn{W} effect sizes for non-parametric (rank sum) tests.

produces:

image

Originally posted by @bwiernik in https://github.com/ropensci/katex/issues/1#issuecomment-876206307

jeroen commented 3 years ago

How exactly are you viewing this html? Is this the rstudio previewer?

It looks like your html output is correct (for example, it contains the µ); however, your browser seems to interpret the text as latin1 instead of utf-8?
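To make the latin1-versus-UTF-8 hypothesis concrete, here is a minimal base-R sketch of what such a misreading does to a single character. The variable names are illustrative only and nothing here comes from the katex package; it simply reinterprets the UTF-8 bytes of µ as if they were latin1.

```r
# UTF-8 encodes the micro sign (U+00B5) as two bytes.
utf8_bytes <- charToRaw(enc2utf8("\u00b5"))

# Pretend those bytes are latin1, as a misconfigured viewer would,
# and convert that misreading to UTF-8 so it can be inspected.
misread <- iconv(rawToChar(utf8_bytes), from = "latin1", to = "UTF-8")

print(utf8_bytes)          # the two original bytes
print(charToRaw(misread))  # four bytes: each original byte became its own character
```

If a viewer shows two characters where one Greek letter or symbol was expected, this kind of misinterpretation is a likely suspect.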

If you manually generate the html document, can you try adding this in the <head> of the document:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Also for the R-documentation, maybe add Encoding: UTF-8 to your package DESCRIPTION file
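For anyone generating the HTML document by hand, the `<meta>` suggestion above could be sketched roughly as follows. `wrap_utf8_html()` is a hypothetical helper written for this illustration, not part of katex; it assumes the fragment is an ordinary R character vector and writes it out as UTF-8 bytes with the charset declared in the `<head>`.

```r
# Hypothetical helper (not part of katex): wrap an HTML fragment in a
# complete document whose <head> declares UTF-8, and write it as UTF-8 bytes.
wrap_utf8_html <- function(fragment, path = tempfile(fileext = ".html")) {
  doc <- c(
    "<!DOCTYPE html>",
    "<html>",
    "<head>",
    "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />",
    "</head>",
    "<body>",
    enc2utf8(fragment),
    "</body>",
    "</html>"
  )
  # A binary connection plus useBytes = TRUE keeps R from re-encoding
  # the text to the native locale on the way out.
  con <- file(path, open = "wb")
  on.exit(close(con))
  writeLines(doc, con, useBytes = TRUE)
  invisible(path)
}

path <- wrap_utf8_html("<span>\u00b5 and \u03c3</span>")
```

Opening `path` in a browser should then show the symbols as written, independent of the locale R was running in.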

bwiernik commented 3 years ago

Ah, yes. For the first issue, if I run katex::katex_html(katex::example_math(), preview = interactive()) in R GUI so it opens in my browser, it displays correctly. The issue with the first one seems to be limited to the RStudio Viewer.

For the package documentation, the encoding is still not correct if I open the HTML help file in my browser: image

Encoding: UTF-8 is already in the package DESCRIPTION file
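A quick way to confirm what a DESCRIPTION file declares is to read the field back with `read.dcf()`. The sketch below uses a throwaway file in `tempdir()` with made-up contents, purely as a stand-in for a real package's DESCRIPTION.

```r
# Stand-in DESCRIPTION file; the package name and version are made up.
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(
  c("Package: demo", "Version: 0.0.1", "Encoding: UTF-8"),
  desc_path
)

# read.dcf() returns a character matrix with one column per requested field;
# the entry is NA when the field is absent.
declared <- unname(read.dcf(desc_path, fields = "Encoding")[1, "Encoding"])
print(declared)
```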

bwiernik commented 3 years ago

Here is the generated HTML documentation. The header you gave was already there.

```
R: Effect size for non-parametric (rank sum) tests
rank_biserial {effectsize}    R Documentation

Effect size for non-parametric (rank sum) tests

Description

Compute the rank-biserial correlation ( r_{rb} ), Cliff's delta ( \delta ), rank epsilon squared ( \varepsilon^2 ), and Kendall's W effect sizes for non-parametric (rank sum) tests.

Usage

rank_biserial(
  x,
  y = NULL,
  data = NULL,
  mu = 0,
  ci = 0.95,
  paired = FALSE,
  verbose = TRUE,
  ...,
  iterations
)

cliffs_delta(
  x,
  y = NULL,
  data = NULL,
  mu = 0,
  ci = 0.95,
  iterations = 200,
  verbose = TRUE,
  ...
)

rank_epsilon_squared(x, groups, data = NULL, ci = 0.95, iterations = 200, ...)

kendalls_w(
  x,
  groups,
  blocks,
  data = NULL,
  ci = 0.95,
  iterations = 200,
  verbose = TRUE,
  ...
)

Arguments

x

Can be one of:

  • A numeric vector, or a character name of one in data.

  • A formula in the form of DV ~ groups (for rank_biserial() and rank_epsilon_squared()) or DV ~ groups | blocks (for kendalls_w(); See details for the blocks and groups terminology used here).

  • A list of vectors (for rank_epsilon_squared()).

  • A matrix of blocks x groups (for kendalls_w()). See details for the blocks and groups terminology used here.

y

An optional numeric vector of data values to compare to x, or a character name of one in data. Ignored if x is not a vector.

data

An optional data frame containing the variables.

mu

a number indicating the value around which (a-)symmetry (for one-sample or paired samples) or shift (for independent samples) is to be estimated. See stats::wilcox.test.

ci

Confidence Interval (CI) level

paired

If TRUE, the values of x and y are considered as paired. This produces an effect size that is equivalent to the one-sample effect size on x - y.

verbose

Toggle warnings and messages on or off.

...

Arguments passed to or from other methods.

iterations

The number of bootstrap replicates for computing confidence intervals. Only applies when ci is not NULL. (Deprecated for rank_biserial()).

groups, blocks

A factor vector giving the group / block for the corresponding elements of x, or a character name of one in data. Ignored if x is not a vector.

Details

The rank-biserial correlation is appropriate for non-parametric tests of differences - both for the one sample or paired samples case, that would normally be tested with Wilcoxon's Signed Rank Test (giving the matched-pairs rank-biserial correlation) and for two independent samples case, that would normally be tested with Mann-Whitney's U Test (giving Glass' rank-biserial correlation). See stats::wilcox.test. In both cases, the correlation represents the difference between the proportion of favorable and unfavorable pairs / signed ranks (Kerby, 2014). Values range from -1 indicating that all values of the second sample are smaller than the first sample, to +1 indicating that all values of the second sample are larger than the first sample. (Cliff's delta is an alias to the rank-biserial correlation in the two sample case.)

The rank epsilon squared is appropriate for non-parametric tests of differences between 2 or more samples (a rank based ANOVA). See stats::kruskal.test. Values range from 0 to 1, with larger values indicating larger differences between groups.

Kendall's W is appropriate for non-parametric tests of differences between 2 or more dependent samples (a rank based rmANOVA), where each group (e.g., experimental condition) was measured for each block (e.g., subject). This measure is also common as a measure of reliability of the rankings of the groups between raters (blocks). See stats::friedman.test. Values range from 0 to 1, with larger values indicating larger differences between groups / higher agreement between raters.

Ties

When tied values occur, they are each given the average of the ranks that would have been given had no ties occurred. No other corrections have been implemented yet.

Value

A data frame with the effect size (r_rank_biserial, rank_epsilon_squared or Kendalls_W) and its CI (CI_low and CI_high).

Confidence Intervals

Confidence intervals for the rank-biserial correlation (and Cliff's delta) are estimated using the normal approximation (via Fisher's transformation). Confidence intervals for rank Epsilon squared, and Kendall's W are estimated using the bootstrap method (using the {boot} package).

References

  • Cureton, E. E. (1956). Rank-biserial correlation. Psychometrika, 21(3), 287-290.

  • Glass, G. V. (1965). A ranking variable analogue of biserial correlation: Implications for short-cut item analysis. Journal of Educational Measurement, 2(1), 91-95.

  • Kendall, M.G. (1948) Rank correlation methods. London: Griffin.

  • Kerby, D. S. (2014). The simple difference formula: An approach to teaching nonparametric correlation. Comprehensive Psychology, 3, 11-IT.

  • King, B. M., & Minium, E. W. (2008). Statistical reasoning in the behavioral sciences. John Wiley & Sons Inc.

  • Cliff, N. (1993). Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological bulletin, 114(3), 494.

  • Tomczak, M., & Tomczak, E. (2014). The need to report effect size estimates revisited. An overview of some recommended measures of effect size.

See Also

Other effect size indices: cohens_d(), effectsize(), eta_squared(), phi(), standardize_parameters()

Examples


# two-sample tests -----------------------

A <- c(48, 48, 77, 86, 85, 85)
B <- c(14, 34, 34, 77)
rank_biserial(A, B)

x <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)
rank_biserial(x, y, paired = TRUE)

# one-sample tests -----------------------
x <- c(1.15, 0.88, 0.90, 0.74, 1.21)
rank_biserial(x, mu = 1)

# anova tests ----------------------------

x1 <- c(2.9, 3.0, 2.5, 2.6, 3.2) # control group
x2 <- c(3.8, 2.7, 4.0, 2.4) # obstructive airway disease group
x3 <- c(2.8, 3.4, 3.7, 2.2, 2.0) # asbestosis group
x <- c(x1, x2, x3)
g <- factor(rep(1:3, c(5, 4, 5)))
rank_epsilon_squared(x, g)

wb <- aggregate(warpbreaks$breaks,
  by = list(
    w = warpbreaks$wool,
    t = warpbreaks$tension
  ),
  FUN = mean
)
kendalls_w(x ~ w | t, data = wb)


[Package effectsize version 0.4.5.1 Index]
```

bwiernik commented 3 years ago

I isolated the cause of the HTML viewer issue, but I am still not able to get the help files to build correctly:

This is what is in my .Rd file: (\Sexpr[results=rd, stage=build]{katex::math_to_rd("\\\\delta", 'delta', FALSE)})

When I build, it becomes this in the HTML:

(
<link rel="stylesheet" type="text/css" href="https://cdn.jsdelivr.net/npm/katex@0.13.11/dist/katex.min.css">
<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>δ</mi></mrow><annotation encoding="application/x-tex">\delta</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord mathnormal" style="margin-right:0.03785em;">δ</span></span></span></span>

)

jeroen commented 3 years ago

I can see this as well now. It does not happen when you install the package using install.packages(), it only happens when you build the source package on Windows.

I think this is a bug in R, where it treats the output from the \Sexpr{} incorrectly on Windows. I've asked advice from Tomas Kalibera.

jeroen commented 3 years ago

The Windows bug should be fixed in R 4.1.1. I added a workaround to this package for lower versions. Could you test it?
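The thread does not show the workaround itself. As a rough sketch of how such a workaround might be gated, the predicate below is true only on Windows with R older than 4.1.1. `needs_windows_workaround()` is a hypothetical name and not a function from the katex package.

```r
# Hypothetical predicate (not from katex): TRUE only where the thread says the
# \Sexpr output is mishandled, i.e. Windows with R older than 4.1.1.
needs_windows_workaround <- function(r_version = getRversion(),
                                     os_type = .Platform$OS.type) {
  os_type == "windows" && as.numeric_version(r_version) < "4.1.1"
}

needs_windows_workaround()                    # answer for the current session
needs_windows_workaround("4.1.0", "windows")  # the configuration reported above
```

Taking the version and OS type as arguments keeps the check easy to exercise on any platform.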

bwiernik commented 3 years ago

Thanks!

On R 4.1.0, that works as an acceptable workaround for characters that produce an ASCII analogue character in enc2native() (such as α, β, δ, ε, τ), but for characters that produce a Unicode escape sequence like <U+03C1> (ρ) or <U+0001F600> (😀), the unescaped < > characters stop the HTML output from being rendered afterward.

That can be fixed by further postprocessing with this regex:

gsub(pattern = "<(U\\+[0-9A-Fa-f]{4,8})>", replacement = "&lt;\\1&gt;", x = rd)

e.g.:

# gsub() replaces every escape in each string; sub() would stop after the first
gsub(pattern = "<(U\\+[0-9A-Fa-f]{4,8})>", replacement = "&lt;\\1&gt;", 
     x = c("<U+03C1>", "<U+0001F600>"))
#> [1] "&lt;U+03C1&gt;"     "&lt;U+0001F600&gt;"

Created on 2021-07-15 by the reprex package (v2.0.0)
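A slightly fuller sketch of the post-processing idea, in base R only. Both function names are hypothetical. The first escapes the angle brackets as proposed above. The second is an alternative not discussed in the thread, which rewrites each marker as an HTML numeric character reference; whether that survives the Rd build is an untested assumption. The `iconv(..., sub = "Unicode")` line is only a locale-independent way to produce similar markers for experimenting, and its result is not guaranteed to match what enc2native() yields on Windows.

```r
# Hypothetical helper: make <U+XXXX> markers harmless by escaping < and >.
escape_unicode_markers <- function(x) {
  gsub("<(U\\+[0-9A-Fa-f]{4,8})>", "&lt;\\1&gt;", x)
}

# Hypothetical alternative: turn each marker into a numeric character
# reference, which a browser renders as the intended character.
markers_to_entities <- function(x) {
  gsub("<U\\+([0-9A-Fa-f]{4,8})>", "&#x\\1;", x)
}

# One string with two markers, so that replacing only the first would be visible.
rd <- "<span><U+03C1></span> and <span><U+0001F600></span>"
escaped  <- escape_unicode_markers(rd)
entities <- markers_to_entities(rd)
print(escaped)
print(entities)

# For experimenting: force a lossy conversion and let iconv() substitute
# a marker for the character that latin1 cannot represent.
simulated <- iconv("\u03c1", from = "UTF-8", to = "latin1", sub = "Unicode")
print(simulated)
```

The escaped form keeps the page rendering but shows the literal marker text, whereas the entity form aims to show the character itself.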

Example of the issue:

#' Compute the rank-biserial correlation
#' (\Sexpr[results=rd, stage=build]{katex::math_to_rd('r_{rb}', 'r_rb', FALSE)}),
#' Cliff's *delta* (\Sexpr[results=rd, stage=build]{katex::math_to_rd("\\\\delta", 'delta', FALSE)}),
#' rank epsilon squared (\Sexpr[results=rd, stage=build]{katex::math_to_rd('\\\\rho^2', 'epsilon^2', FALSE)}), and
#' Kendall's \eqn{W} effect sizes for non-parametric (rank sum) tests.

image

jeroen commented 3 years ago

Smart, can you send a PR?