r-lib / lintr

Static Code Analysis for R
https://lintr.r-lib.org

Non-ASCII Variable Names Cause Errors in VSCode with languageserver Using Tidyverse NSE #2670

Closed GohUnTsuan closed 1 day ago

GohUnTsuan commented 1 week ago

I am encountering an issue when using non-ASCII characters as variable names in VSCode, combined with the languageserver environment. Specifically, the problem arises under the non-standard evaluation (NSE) format of the tidyverse, causing lintr to mistakenly flag syntactically correct code as erroneous.

Issue Description

When using non-ASCII characters in variable names and applying tidyverse functions such as mutate, the lintr package reports an unexpected error. For example, the following code snippet should run without any issues but lintr flags an error:

dat_lm_city <- dat_lm_city |> mutate( 供给型_bi = ifelse(供给型 > 0, 1, 0) )

The error reported is unexpected '<' lintr(error), which incorrectly suggests a syntax error. This false positive prevents the languageserver from properly parsing subsequent RMarkdown content.

Steps to Reproduce

1.  Use a non-ASCII variable name in a tidyverse NSE expression within an R script in VSCode with the languageserver enabled.
2.  Observe the linting error despite the R code executing correctly.

Expected Behavior

lintr should not flag syntactically correct code as erroneous, particularly when non-ASCII characters are valid within variable names in R and supported by the tidyverse NSE format.

Actual Behavior

lintr incorrectly reports a syntax error, leading to issues in RMarkdown parsing within the languageserver environment in VSCode.

AshesITR commented 1 day ago

Hi @GohUnTsuan.

This seems to be an issue with some other part of the toolchain. Your example code works fine and produces no lints when stripping the spaces around mutate:

lintr::lint(text=r"(dat_lm_city <- dat_lm_city |> mutate(供给型_bi = ifelse(供给型 > 0, 1, 0)))")

Created on 2024-10-24 with reprex v2.1.0

Your problem may be that the UTF-8 string is not representable in your system's native encoding. If you're running Windows, the following may or may not help you debug the issue: https://developer.r-project.org/Blog/public/2020/05/02/utf-8-support-on-windows/. Note that due to technical limitations of R, all names (e.g. the argument name 供给型_bi in your call to mutate()) must be encoded in the system's native encoding. If a character is not representable in the native encoding, it is replaced by <U+XXXX> during the conversion, which renders your code invalid and produces the error message you see.
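To check whether this explanation applies on a given machine, something like the following sketch may help. It uses only base R. The helper `name_survives_native()` is a hypothetical name introduced here for illustration and is not part of lintr or languageserver. The `<U+XXXX>` string is written out by hand to imitate what a failed conversion would produce, and the call `mutate(dat, ...)` is a simplified stand-in for the original pipeline.

```r
# Is the session's native encoding UTF-8? If FALSE, non-ASCII names
# may not be representable.
l10n_info()[["UTF-8"]]

# Hypothetical helper (not part of lintr): TRUE if a string round-trips
# through the native encoding unchanged.
name_survives_native <- function(x) {
  x <- enc2utf8(x)
  native <- enc2native(x)
  identical(enc2utf8(native), x)
}

# ASCII names always survive the conversion.
ascii_ok <- name_survives_native("dat_lm_city")

# "\u4f9b\u7ed9\u578b_bi" is the name from the report, written with
# escapes so the snippet itself does not depend on the file encoding.
# The result depends on the locale, so no expected value is given.
cjk_ok <- name_survives_native("\u4f9b\u7ed9\u578b_bi")

# Hand-written imitation of code after a failed conversion: the
# substituted name is no longer a valid R symbol, so parsing fails.
broken <- "mutate(dat, <U+4F9B><U+7ED9><U+578B>_bi = 1)"
msg <- tryCatch(
  {
    parse(text = broken)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
msg
```

If `cjk_ok` is `FALSE`, the name cannot be represented in the session's native encoding, which would be consistent with the `unexpected '<'` error. If it is `TRUE`, the cause is more likely elsewhere in the toolchain.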