r-lib / styler

Non-invasive pretty printing of R code
https://styler.r-lib.org

Regression of re-styling of unicode characters #1199

Closed cicdguy closed 5 months ago

cicdguy commented 5 months ago

Hello,

I believe we are seeing a regression of https://github.com/r-lib/styler/issues/847 in R 4.4.0 on Linux. macOS does not have this issue, and I haven't tested on Windows.

Steps to reproduce

Start a shell using the rocker/verse:4.4.0 image

docker run -it --rm --platform=linux/amd64 rocker/verse:4.4.0 sh

Observe the OS version

cat /etc/os-release

Install styler from CRAN

R -s -e 'install.packages("styler", repos = "https://cloud.r-project.org", quiet = TRUE, Ncpus = 8)'

Create a simple file containing unicode characters

echo 'a <- "R² μ ≥"' > ex.R

Style the file

R -s -e 'styler::style_file("ex.R")'

Observe the re-styled file

cat ex.R

Supplemental Information

Running utils::getParseData(parse(text = 'suit <- "♠"')) in the container gives me:

  line1 col1 line2 col2 id parent       token terminal           text
7     1    1     1   13  7      0        expr    FALSE
1     1    1     1    4  1      3      SYMBOL     TRUE           suit
3     1    1     1    4  3      7        expr    FALSE
2     1    6     1    7  2      7 LEFT_ASSIGN     TRUE             <-
4     1    9     1   13  4      6   STR_CONST     TRUE "\342\231\240"
6     1    9     1   13  6      7        expr    FALSE

But running it on my macOS laptop gives me:

  line1 col1 line2 col2 id parent       token terminal text
7     1    1     1   11  7      0        expr    FALSE
1     1    1     1    4  1      3      SYMBOL     TRUE suit
3     1    1     1    4  3      7        expr    FALSE
2     1    6     1    7  2      7 LEFT_ASSIGN     TRUE   <-
4     1    9     1   11  4      6   STR_CONST     TRUE  "♠"
6     1    9     1   11  6      7        expr    FALSE
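
Besides the `text` column, the two tables also differ in `col2` for the string constant: 13 in the container versus 11 on macOS. One plausible reading (an assumption, not something confirmed in this thread) is that the container counts bytes while macOS counts characters, since `\342\231\240` is the octal spelling of the three UTF-8 bytes of ♠ (U+2660). A minimal shell sketch of that arithmetic, with variable names chosen only for illustration:

```shell
# The octal escapes from the container output are the UTF-8 bytes of U+2660.
spade=$(printf '\342\231\240')
printf '%s\n' "$spade"

# Byte length of the quoted string constant: 2 quote characters + 3 bytes.
nbytes=$(printf '"%s"' "$spade" | wc -c | tr -d ' ')
printf 'bytes: %s\n' "$nbytes"

# Columns 9 through 13 span 5 positions; columns 9 through 11 span 3.
printf 'container span: %s, macOS span: %s\n' "$((13 - 9 + 1))" "$((11 - 9 + 1))"
```

The byte count of 5 matches the container's column span, while the span of 3 on macOS matches a count in characters (two quotes plus one symbol).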

lorenzwalthert commented 5 months ago

Thanks for the good repro. If I am not mistaken, utils::getParseData() is the problem here and styler has nothing to do with it? @IndrajeetPatil maybe you can jump in.

cicdguy commented 5 months ago

Yes, indeed. This is likely an R-related issue, as was the case previously.

Seeking advice here: is there some way styler can ignore Unicode characters?

IndrajeetPatil commented 5 months ago

I can check this tomorrow on my Ubuntu machine, but it is a bit strange that, if there has indeed been this regression in R >= 4.4, the encoding test doesn't fail on either the release or the devel version: https://github.com/r-lib/styler/pull/1200.

IndrajeetPatil commented 5 months ago

I can't reproduce this locally on Ubuntu either.

Here is a reprex with session info:

utils::getParseData(parse(text = 'suit <- "♠"'))
#>   line1 col1 line2 col2 id parent       token terminal text
#> 7     1    1     1   11  7      0        expr    FALSE    
#> 1     1    1     1    4  1      3      SYMBOL     TRUE suit
#> 3     1    1     1    4  3      7        expr    FALSE    
#> 2     1    6     1    7  2      7 LEFT_ASSIGN     TRUE   <-
#> 4     1    9     1   11  4      6   STR_CONST     TRUE  "♠"
#> 6     1    9     1   11  6      7        expr    FALSE

Created on 2024-05-06 with reprex v2.1.0

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.0 (2024-04-24)
#>  os       Ubuntu 22.04.4 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_US.UTF-8
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Berlin
#>  date     2024-05-06
#>  pandoc   3.1.11 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/x86_64/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package     * version     date (UTC) lib source
#>  P cli           3.6.2       2023-12-11 [?] RSPM (R 4.4.0)
#>  P digest        0.6.35      2024-03-11 [?] RSPM (R 4.4.0)
#>  P evaluate      0.23        2023-11-01 [?] RSPM (R 4.4.0)
#>  P fastmap       1.1.1       2023-02-24 [?] RSPM (R 4.4.0)
#>  P fs            1.6.4       2024-04-25 [?] RSPM
#>  P glue          1.7.0       2024-01-09 [?] RSPM (R 4.4.0)
#>    htmltools     0.5.8.1     2024-04-04 [1] RSPM (R 4.4.0)
#>    knitr         1.46        2024-04-06 [1] RSPM (R 4.4.0)
#>  P lifecycle     1.0.4       2023-11-07 [?] RSPM (R 4.4.0)
#>  P magrittr      2.0.3       2022-03-30 [?] RSPM (R 4.4.0)
#>  P purrr         1.0.2       2023-08-10 [?] RSPM (R 4.4.0)
#>  P R.cache       0.16.0      2022-07-21 [?] RSPM (R 4.4.0)
#>  P R.methodsS3   1.8.2       2022-06-13 [?] RSPM (R 4.4.0)
#>  P R.oo          1.26.0      2024-01-24 [?] RSPM (R 4.4.0)
#>  P R.utils       2.12.3      2023-11-18 [?] RSPM (R 4.4.0)
#>  P reprex        2.1.0       2024-01-11 [?] RSPM
#>  P rlang         1.1.3       2024-01-10 [?] RSPM (R 4.4.0)
#>  P rmarkdown     2.26        2024-03-05 [?] RSPM (R 4.4.0)
#>  P rstudioapi    0.16.0      2024-03-24 [?] RSPM
#>  P sessioninfo   1.2.2       2021-12-06 [?] RSPM (R 4.4.0)
#>    styler        1.10.3.9000 2024-05-06 [1] Github (r-lib/styler@4b24ff6)
#>  P vctrs         0.6.5       2023-12-01 [?] RSPM (R 4.4.0)
#>  P withr         3.0.0       2024-01-16 [?] RSPM (R 4.4.0)
#>  P xfun          0.43        2024-03-25 [?] RSPM (R 4.4.0)
#>  P yaml          2.3.8       2023-12-11 [?] RSPM (R 4.4.0)
#>
#>  [1] /home/indra/.cache/R/renv/library/enetpipeline-ebbe6db5/linux-ubuntu-jammy/R-4.4/x86_64-pc-linux-gnu
#>  [2] /home/indra/.cache/R/renv/sandbox/linux-ubuntu-jammy/R-4.4/x86_64-pc-linux-gnu/9a444a72
#>  [3] /usr/lib/R/library
#>
#>  P ── Loaded and on-disk path mismatch.
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

@cicdguy Can you please post a reprex with session info so we can check what's different between our/GitHub and your machines?

IndrajeetPatil commented 5 months ago

Hmm, I can reproduce the output you are seeing in the container:

# echo 'a <- "R² μ ≥"' > ex.R
# R -s -e 'styler::style_file("ex.R")'
Styling  1  files:
 ex.R i 
----------------------------------------
Status  Count   Legend 
v   0   File unchanged.
i   1   File changed.
x   0   Styling threw an error.
----------------------------------------
Please review the changes carefully!
# cat ex.R
a <- "R<U+00B2> <U+03BC> <U+2265>"
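
The `<U+00B2>`, `<U+03BC>` and `<U+2265>` sequences in the output above are the code points of ², μ and ≥ written out as plain ASCII. As a rough sketch, a check like the following could flag such escapes after styling, for example in a CI step. `has_unicode_escapes` is a hypothetical helper name, not part of styler, and the sample file is created only for the demonstration:

```shell
# Hypothetical guard: succeeds (exit status 0) if the file contains <U+XXXX> escapes.
has_unicode_escapes() {
  # Matches e.g. <U+00B2> or <U+1F600>: 4 to 6 uppercase hex digits.
  grep -q '<U+[0-9A-F]\{4,6\}>' "$1"
}

# Demonstration on a temporary file holding the restyled line from above.
tmp=$(mktemp)
printf '%s\n' 'a <- "R<U+00B2> <U+03BC> <U+2265>"' > "$tmp"
if has_unicode_escapes "$tmp"; then
  echo "escaped code points found; check the locale"
fi
rm -f "$tmp"
```

A source file that legitimately contains the literal text `<U+....>` would also match, so this is only a heuristic.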

Can this be an issue in Rocker's image? Can anyone reproduce this without using this image?

IndrajeetPatil commented 5 months ago

@eitsupi Maybe you have some idea as to what might be going on here?

eitsupi commented 5 months ago

Perhaps it is a locale issue? See rocker-org/rocker-versioned2#802. Try setting the environment variable LANG=en_US.UTF-8.
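
A hedged sketch of how this suggestion might be applied and checked from a shell. `effective_ctype` and `is_utf8_locale` are hypothetical helper names; the precedence they follow (`LC_ALL` over `LC_CTYPE` over `LANG`) is the usual POSIX rule. The `docker run` line is left as a comment because it needs the image, and `-e` is Docker's standard flag for setting an environment variable:

```shell
# To pass the variable into the container from the original repro (not run here):
#   docker run -it --rm --platform=linux/amd64 -e LANG=en_US.UTF-8 rocker/verse:4.4.0 sh

# Hypothetical helper: print the locale name that governs character encoding.
effective_ctype() {
  if [ -n "${LC_ALL:-}" ]; then
    printf '%s\n' "$LC_ALL"
  elif [ -n "${LC_CTYPE:-}" ]; then
    printf '%s\n' "$LC_CTYPE"
  elif [ -n "${LANG:-}" ]; then
    printf '%s\n' "$LANG"
  else
    printf 'C\n'   # POSIX default when none of the variables is set
  fi
}

# Hypothetical helper: succeed if that locale name declares a UTF-8 codeset.
is_utf8_locale() {
  case "$(effective_ctype)" in
    *.UTF-8* | *.utf8* | *.UTF8* | *.utf-8*) return 0 ;;
    *) return 1 ;;
  esac
}

if is_utf8_locale; then
  echo "UTF-8 locale: $(effective_ctype)"
else
  echo "non-UTF-8 locale: $(effective_ctype); consider LANG=en_US.UTF-8"
fi
```

This only inspects the variable names; it does not verify that the named locale is actually installed in the image.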

cicdguy commented 5 months ago

It is indeed a locale issue. Setting LANG=en_US.UTF-8 works like a charm. I guess I'll just set this on the containers going forward. Thank you all! 🙏🏽

eitsupi commented 5 months ago

Sorry for the trouble. I have triggered a new build, which should fix this.

IndrajeetPatil commented 5 months ago

Thanks for the quick reply and fix, @eitsupi. You Rock(er)! 🤘

lorenzwalthert commented 5 months ago

Ok, but it’s still a problem in base R, and setting the locale is more of a workaround, no?

IndrajeetPatil commented 5 months ago

> Ok, but it’s still a problem in base R and setting the locale is more of a workaround, no?

No, it was an issue with the Rocker image of base R, not with base R itself. This is why the issue was reproducible neither locally nor on GitHub, but only in Docker containers using that image. The image has already been fixed, so this is no longer an issue.

lorenzwalthert commented 5 months ago

So parsing a <- "R² μ ≥" is expected to give something meaningful only if LANG is set?