Closed dieghernan closed 3 years ago
Maybe related with https://github.com/r-lib/pkgdown/commit/b9db0360b4dbbb9a80da657099d8dd25058ac556 🤔?
It is not restricted to README, see https://ropengov.github.io/giscoR/LICENSE.html:
Version 3, 29 June 2007
Copyright © 2007 Free Software Foundation, Inc. <http://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
Preamble
The GNU General Public License is a free, copyleft license for software and other kinds of works.
The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change all versions of a programâto make sure it remains free software for all its users. We, the Free Software Foundation, use the GNU General Public License for most of our software; it applies also to any other work released this way by its authors. You can apply it to your programs, too.
...
For the developersâ and authorsâ protection, the GPL clearly explains that there is no warranty for this free software. For both usersâ and authorsâ sake, the GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions.
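The garbling in the license text above looks like what happens when UTF-8 bytes are decoded with a single-byte encoding. Here is a minimal base R sketch of that mechanism, assuming Latin-1 is the wrong encoding being applied (an assumption for illustration, not something confirmed from the pkgdown code). The variable names are made up for the example.

```r
# Text with non-ASCII characters, written with \u escapes so the example
# does not depend on the encoding of this source file.
original <- "Hernang\u00f3mez, \u00a9 Eurostat"

# The UTF-8 byte sequence, as it would be stored in an HTML file on disk.
utf8_bytes <- charToRaw(enc2utf8(original))

# Decode the same bytes as Latin-1 (the assumed wrong encoding):
# every byte of a multi-byte UTF-8 sequence becomes its own character.
misread <- iconv(rawToChar(utf8_bytes), from = "latin1", to = "UTF-8")

# Decode the bytes as UTF-8 (the intended encoding).
correct <- iconv(rawToChar(utf8_bytes), from = "UTF-8", to = "UTF-8")

print(misread)
print(correct)
```

With this input, each two-byte sequence turns into two characters, so "ó" becomes "Ã³" and "©" becomes "Â©". Decoding the same bytes as UTF-8 returns the original text.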
The error is somewhere in markdown_path_html() :female_detective:
At https://github.com/r-lib/pkgdown/blob/1803229326669a2734c5f9ad564a39f0012f6ded/R/markdown.R#L63 the signs are still fine.
After https://github.com/r-lib/pkgdown/blob/1803229326669a2734c5f9ad564a39f0012f6ded/R/markdown.R#L64 they are not.
Maybe the HTML should be read like in update_html().
Noting that update_html(), via the functions it uses, assumes UTF-8. If I add the same encoding to the xml <- xml2::read_html(html_path) call above, things look fine.
And the encoding used to be set in the Markdown transforming function: https://github.com/r-lib/pkgdown/blob/ff06f4fb444ac4c7cc6219177d87986994026124/R/markdown.R#L27
@maelle do you mind sharing the reprex you presumably created so I can turn it into a test?
@hadley sorry, I simply cloned the repo mentioned in the report. :sweat_smile:
Hi @hadley, I prepared a reprex. It seems that changing xml2::read_html(html_path) to xml2::read_html(html_path, encoding = "UTF-8"), as @maelle suggested, may solve the issue. However, encodings are tricky, and I don't fully understand the implications of this change (I hate encodings, by the way):
# Create markdown
tmpmd <- file.path(tempdir(), "temp.md")
file.create(tmpmd)
#> [1] TRUE
# Write my name
writeLines("Diego Hernangómez, © Eurostat", tmpmd)
text <- readLines(tmpmd)
# In the markdown file it is OK
text
#> [1] "Diego Hernangómez, © Eurostat"
# Parse with pkgdown
# Now it's wrong
pkgdown:::markdown_to_html(text)
#> {html_document}
#> <html>
#> [1] <body><p>Diego HernangÃ³mez, Â© Eurostat</p></body>
# Step by step: pkgdown:::markdown_to_html
# https://github.com/r-lib/pkgdown/blob/2720abc02fbddbb761104d44d30ce7a3d0c26812/R/markdown.R#L84-L96
# markdown_to_html <- function(text, dedent = 4) {
# if (dedent) {
# text <- gsub(paste0("($|\n)", strrep(" ", dedent)), "\\1", text, perl = TRUE)
# }
#
# md_path <- withr::local_tempfile()
# html_path <- withr::local_tempfile()
#
# write_lines(text, md_path)
# convert_markdown_to_html(md_path, html_path)
#
# xml2::read_html(html_path)
# }
dedent <- 4
# markdown_to_html <- function(text, dedent = 4) {
# Note: dedent is numeric, not logical; if (dedent) relies on coercion
# if (dedent) {
text <- gsub(paste0("($|\n)", strrep(" ", dedent)), "\\1", text, perl = TRUE)
# }
text
#> [1] "Diego Hernangómez, © Eurostat"
md_path <- withr::local_tempfile()
#> Setting deferred event(s) on global environment.
#> * Execute (and clear) with `withr::deferred_run()`.
#> * Clear (without executing) with `withr::deferred_clear()`.
html_path <- withr::local_tempfile()
pkgdown:::write_lines(text, md_path)
pkgdown:::convert_markdown_to_html(md_path, html_path)
readLines(html_path)
#> [1] "<p>Diego Hernangómez, © Eurostat</p>"
xml2::read_html(html_path)
#> {html_document}
#> <html>
#> [1] <body><p>Diego HernangÃ³mez, Â© Eurostat</p></body>
# And moving to this as Maelle suggested is ok
xml2::read_html(html_path, encoding = "UTF-8")
#> {html_document}
#> <html>
#> [1] <body><p>Diego Hernangómez, © Eurostat</p></body>
# }
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19042)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
#> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
#> [5] LC_TIME=Spanish_Spain.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] xml2_1.3.2 knitr_1.36 magrittr_2.0.1
#> [4] R.cache_0.15.0 rlang_0.4.11 fastmap_1.1.0
#> [7] fansi_0.5.0 stringr_1.4.0 styler_1.6.2
#> [10] highr_0.9 tools_4.1.0 xfun_0.26
#> [13] R.oo_1.24.0 utf8_1.2.2 withr_2.4.2
#> [16] htmltools_0.5.2 ellipsis_0.3.2 yaml_2.2.1
#> [19] digest_0.6.28 tibble_3.1.4 lifecycle_1.0.1
#> [22] pkgdown_1.9000.9000.9000 crayon_1.4.1 purrr_0.3.4
#> [25] R.utils_2.11.0 vctrs_0.3.8 fs_1.5.0
#> [28] cachem_1.0.6 memoise_2.0.0 glue_1.4.2
#> [31] evaluate_0.14 rmarkdown_2.11 reprex_2.0.1
#> [34] stringi_1.7.4 compiler_4.1.0 pillar_1.6.3
#> [37] backports_1.2.1 R.methodsS3_1.8.1 pkgconfig_2.0.3
Created on 2021-10-04 by the reprex package (v2.0.1)
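In case it helps with the test, here is a self-contained sketch of a round-trip check that avoids pkgdown internals. It assumes the fix is to pass encoding = "UTF-8" to xml2::read_html(). The helper read_html_utf8() is hypothetical and only there for illustration; it is not a pkgdown function.

```r
library(xml2)

# Hypothetical helper (not part of pkgdown): always read the file as UTF-8
# instead of relying on the parser's default for HTML without a charset.
read_html_utf8 <- function(path) {
  xml2::read_html(path, encoding = "UTF-8")
}

# Expected text, written with \u escapes so it is UTF-8 in any locale.
expected <- "Diego Hernang\u00f3mez, \u00a9 Eurostat"

# Write an HTML fragment as raw UTF-8 bytes, like the converted markdown.
html_path <- tempfile(fileext = ".html")
writeLines(
  enc2utf8(paste0("<p>", expected, "</p>")),
  html_path,
  useBytes = TRUE
)

# Read it back and extract the paragraph text.
doc <- read_html_utf8(html_path)
result <- xml2::xml_text(xml2::xml_find_first(doc, "//p"))

# The non-ASCII characters should survive the round trip unchanged.
stopifnot(identical(result, expected))

unlink(html_path)
```

The fragment deliberately has no charset declaration, since that is the situation in which the explicit encoding argument matters.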
Hi,
Since the upgrade to v 1.9000.9000.9000 I see an issue on the homepage (parsed from README): non-ASCII characters are garbled.
On previous versions my name was rendered as "Hernangómez", without escaping. Other characters, such as "©", are now parsed as "Â©".
https://ropengov.github.io/giscoR/
On v 1.6.1.9001 this was parsed correctly, see https://dieghernan.github.io/nominatimlite/