renkun-ken closed this issue 3 years ago.
Unfortunately this seems like a base R bug:
> code <- 'x <- r"(hello, "world")"'
> getParseData(parse(text = code))
  line1 col1 line2 col2 id parent       token terminal             text
7     1    1     1   24  7      0        expr    FALSE
1     1    1     1    1  1      3      SYMBOL     TRUE                x
3     1    1     1    1  3      7        expr    FALSE
2     1    3     1    4  2      7 LEFT_ASSIGN     TRUE               <-
4     1    6     1   24  4      6   STR_CONST     TRUE "hello, "world")
6     1    6     1   24  6      7        expr    FALSE
It's unlikely that they would fix this for R 4.0, but maybe we can work around it. I'll post on R-devel nevertheless.
OK, posted to R-devel. As for workarounds, if they don't fix it, we can work around it using the positions: col1 and col2 seem to include the whole raw string expression.
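Here is a minimal base-R sketch of that position-based workaround. str_const_source() is a hypothetical helper and is not part of xmlparsedata or utils. It slices each STR_CONST token out of the source lines using line1/col1/line2/col2 instead of trusting the text column. It assumes ASCII source without tabs, because with tabs or multibyte characters the parser's columns may not map one-to-one onto characters.

```r
# Hypothetical helper: recover the full source text of every STR_CONST
# token from its line1/col1/line2/col2 positions rather than from the
# `text` column (which R 4.0.0 truncates for raw strings).
# Assumes ASCII input without tabs, so columns equal character offsets.
str_const_source <- function(code) {
  lines <- strsplit(code, "\n", fixed = TRUE)[[1]]
  # keep.source = TRUE is needed for parse data outside interactive sessions
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
  strs <- pd[pd$token == "STR_CONST", ]
  vapply(seq_len(nrow(strs)), function(i) {
    r <- strs[i, ]
    span <- lines[r$line1:r$line2]
    n <- length(span)
    # Trim the end first: cutting the tail never shifts col1 on line 1.
    span[n] <- substr(span[n], 1L, r$col2)
    span[1] <- substr(span[1], r$col1, nchar(span[1]))
    paste(span, collapse = "\n")
  }, character(1))
}
```

On the example from the top of this thread, str_const_source('x <- r"(hello, "world")"') should return the whole token r"(hello, "world")", delimiters included, on any R version that can parse raw strings.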
Just saw https://stat.ethz.ch/pipermail/r-devel/2020-April/079369.html. Thanks!
For downstream usage, we could still trim the first and last characters to extract the string literal.
I am not sure if that's good, actually:
> getParseData(parse(text = '"xx\\"xx"'))
  line1 col1 line2 col2 id parent     token terminal      text
1     1    1     1    8  1      3 STR_CONST     TRUE "xx\\"xx"
3     1    1     1    8  3      0      expr    FALSE
> getParseData(parse(text = 'r"(xx"xx)"'))
  line1 col1 line2 col2 id parent     token terminal    text
1     1    1     1   10  1      3 STR_CONST     TRUE "xx"xx)
3     1    1     1   10  3      0      expr    FALSE
In the first case the escaping is kept, so the text is a proper string literal. In the second case, the text is not a valid string literal once the delimiters are removed.
So maybe we should do the opposite and always keep the delimiters? Then, it seems, we would always have a proper string literal.
We can add some info to the XML saying that this is a raw string, although you can also detect that by looking at the first character: if it is an r (or R), then it is a raw string.
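If the delimiters are kept, a consumer could classify the text by its first character and, when it needs a regular literal, normalize it by parsing and deparsing. The sketch below assumes R >= 4.0.0 (so raw strings parse) and ASCII content, since deparse() may escape non-ASCII characters depending on the locale. is_raw_string() and as_regular_literal() are hypothetical names, not xmlparsedata functions.

```r
# Hypothetical helpers for STR_CONST text that keeps its delimiters.
# Requires R >= 4.0.0 so that raw string constants parse.

# A raw string constant starts with r or R; a regular one with " or '.
is_raw_string <- function(src) grepl("^[rR]", src)

# str2lang() on a string constant returns the string value itself;
# deparse() then re-emits it as an escaped double-quoted literal,
# whatever syntax (raw, single- or double-quoted) the source used.
as_regular_literal <- function(src) {
  vapply(src, function(s) deparse(str2lang(s)), character(1),
         USE.NAMES = FALSE)
}

is_raw_string(c('r"(xx"xx)"', '"xx\\"xx"'))
as_regular_literal('r"(xx"xx)"')
```

With this approach, r"(xx"xx)" and "xx\"xx" both normalize to the same literal text, "xx\"xx".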
I am afraid that language tools such as lintr will need explicit support for raw strings.
Yes, I already raised an issue (https://github.com/jimhester/lintr/issues/484) at lintr about raw string support.
Fixed now in R-devel, FYI: https://github.com/wch/r-source/commit/992d9d33e8a87927f00369b628a54a3bc19cab29
We should still have a workaround for R 4.0.0, though.
Thanks!
So, this was fixed in R 4.0.1; only R 4.0.0 was broken. I don't think we necessarily need a workaround for R 4.0.0. We could give a warning, though. @renkun-ken Would you like a warning for this?
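If such a warning were added, one minimal sketch would gate it on the running R version, since per this thread only 4.0.0 reports truncated raw string text. The function name and message are hypothetical, and the rversion argument exists only so the check can be exercised on other R versions.

```r
# Hypothetical check: warn only on R 4.0.0, the one release whose parse
# data reports incomplete text for raw string STR_CONST tokens.
warn_if_raw_string_text_broken <- function(rversion = getRversion()) {
  # numeric_version objects compare directly against version strings
  if (rversion == "4.0.0") {
    warning("R 4.0.0 reports incomplete text for raw string constants; ",
            "use the line/col positions instead.", call. = FALSE)
    return(invisible(TRUE))
  }
  invisible(FALSE)
}
```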
In my use case in languageserver (detecting file paths in user code), we only use the line and col attributes of those STR_CONST nodes rather than their text(), so I don't really need a warning for this.
R 4.0 introduces raw strings in the form of r"(hello, "world")" (see https://stat.ethz.ch/R-manual/R-devel/library/base/html/Quotes.html). For the following code:
xmlparsedata::xml_parse_data() produces the following XML output (formatted):
It looks like the value of the STR_CONST node is inconsistent and hard to work with: it omits the r prefix and the opening ( as well as the closing ", but keeps the closing ).