r-lib / styler

Non-invasive pretty printing of R code
https://styler.r-lib.org

En dashes #1179

Closed: swo closed this issue 4 months ago

swo commented 4 months ago

We're cleaning a file that uses en dashes where it should use hyphens, but styler raises an error when the code it is styling contains an en dash:

> styler::style_text("foo–bar")
Error in `parse_safely()`:
! <text>:1:4: unexpected input
1: foo–
       ^
Run `rlang::last_trace()` to see where the error occurred.
> styler::style_text("foo\u2013bar")
Error in `parse_safely()`:
! <text>:1:4: unexpected input
1: foo–
       ^
Run `rlang::last_trace()` to see where the error occurred.
MichaelChirico commented 4 months ago

This doesn't parse in the first place: parse(text="foo–bar"). I don't think there's anything styler could reasonably do for this rare case. How did the en dashes get introduced to begin with? You're better off running a find-and-replace before trying to style.
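A minimal sketch of that find-and-replace step in base R, assuming the file is UTF-8 and that every en dash in it really should be an ASCII hyphen. The helper names replace_en_dashes() and fix_file_dashes() are hypothetical, not part of styler; styler is only called if it happens to be installed.

```r
# Hypothetical helper: replace every en dash (U+2013) with an ASCII hyphen,
# so that code like foo–bar becomes parseable R (foo-bar, a subtraction).
replace_en_dashes <- function(lines) {
  gsub("\u2013", "-", lines, fixed = TRUE)
}

# Hypothetical helper: apply the replacement to a UTF-8 file in place.
fix_file_dashes <- function(path) {
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  writeLines(replace_en_dashes(lines), path, useBytes = TRUE)
  invisible(path)
}

src <- "foo\u2013bar"       # the input from the original report
fixed <- replace_en_dashes(src)
parse(text = fixed)          # now parses instead of "unexpected input"

# Only once the text parses is there anything for a styler to work on.
if (requireNamespace("styler", quietly = TRUE)) {
  print(styler::style_text(fixed))
}
```

Running the replacement before styling keeps styler's job unchanged: it still only ever sees valid R.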

swo commented 4 months ago

Understood. To test styler in a one-liner, I'd need to do:

styler::style_text('x <- "foo\u2013bar"')
styler::style_text('x<-"foo–bar"')

both of which give warnings.

We were trying to read in fixed data files that use a mix of hyphens and en dashes in dates ("2023-2024" vs. "2023–2024") and to normalize them before downstream processing. To do that text replacement, we need to refer to an en dash in the code itself.
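A sketch of that normalization, assuming the dates are plain "YYYY-YYYY" year ranges; the example vector and the helper name normalise_year_range() are hypothetical. The en dash is written as the ASCII escape \u2013 inside a regex bracket expression, so one pattern accepts either separator.

```r
# Hypothetical input: year ranges written with a mix of hyphens and en dashes.
years <- c("2023-2024", "2023\u20132024", "2022\u20132023")

# Rewrite "YYYY<dash>YYYY" as "YYYY-YYYY". The bracket expression [-\u2013]
# matches a literal hyphen (listed first) or an en dash.
normalise_year_range <- function(x) {
  sub("^([0-9]{4})[-\u2013]([0-9]{4})$", "\\1-\\2", x)
}

std <- normalise_year_range(years)

# With a single separator, splitting into start and end years is simple.
parts <- strsplit(std, "-", fixed = TRUE)
ranges <- data.frame(
  start = as.integer(vapply(parts, `[`, character(1), 1)),
  end   = as.integer(vapply(parts, `[`, character(1), 2))
)
ranges
```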

We also had trouble with styler replacing the en dash with <U+2013>; let me see whether the \u2013 escape resolves that.

MichaelChirico commented 4 months ago

Hmm, in your original issue the en dash is parsed as R code, but in your follow-up it isn't (it's part of a string).

Note the difference:

style_text("foo–bar")
style_text('"foo–bar"')

Can you clarify which one is affecting you?
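The distinction can be checked directly with base R's parse(), without styler involved. parses() below is a hypothetical helper that reports whether a piece of text is valid R.

```r
# Hypothetical helper: TRUE if `code` parses as R, FALSE otherwise.
parses <- function(code) {
  result <- tryCatch(parse(text = code), error = function(e) e)
  !inherits(result, "error")
}

# En dash used as R syntax: not valid R, so there is nothing to style.
parses("foo\u2013bar")

# En dash inside a string literal: valid R; the dash is just string data.
parses('"foo\u2013bar"')
```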

swo commented 4 months ago

The problem is that code like fixed_string <- str_replace(original_string, "–", "-") was getting styled to something like str_replace(original_string, "<U+2013>", "-").

We found a way around this by writing str_replace(original_string, "\u2013", "-"), which is probably better for readability anyway.
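A sketch of why the escape is a reasonable workaround: the source text stays pure ASCII, yet R produces the same string value once it parses it. Our assumption is that with no non-ASCII character in the file, no encoding conversion (the usual origin of the "<U+2013>" form) can alter it; we haven't confirmed what caused the rewrite in styler. Separately, stringr's str_replace() replaces only the first match in each string, while str_replace_all() replaces all of them. The names below are hypothetical.

```r
# Two versions of the same R source text. In `literal_src` the en dash is a
# real U+2013 character; in `escaped_src` it is the six ASCII characters \u2013.
literal_src <- '"2023\u20132024"'
escaped_src <- '"2023\\u20132024"'

# Hypothetical helper: TRUE if every character is ASCII.
is_ascii <- function(s) all(utf8ToInt(s) < 128L)
is_ascii(literal_src)
is_ascii(escaped_src)

# After parsing, both spellings yield the same string value.
value_literal <- eval(parse(text = literal_src))
value_escaped <- eval(parse(text = escaped_src))
identical(value_literal, value_escaped)
```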

If I can come up with a portable example, I'll reopen.

Thanks for your help!