r-lib / xmlparsedata

R code parse data as an XML tree
https://r-lib.github.io/xmlparsedata/

R 4.0 raw string support #10

Closed renkun-ken closed 3 years ago

renkun-ken commented 4 years ago

R 4.0 introduces raw strings of the form r"(hello, "world")" (see https://stat.ethz.ch/R-manual/R-devel/library/base/html/Quotes.html).

For the following code:

x <- r"(hello, "world")"

xmlparsedata::xml_parse_data() produces the following XML output (formatted):

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<exprlist>
  <expr line1="1" col1="1" line2="1" col2="24" start="26" end="49">
    <expr line1="1" col1="1" line2="1" col2="1" start="26" end="26">
      <SYMBOL line1="1" col1="1" line2="1" col2="1" start="26" end="26">x</SYMBOL>
    </expr>
    <LEFT_ASSIGN line1="1" col1="3" line2="1" col2="4" start="28" end="29">&lt;-</LEFT_ASSIGN>
    <expr line1="1" col1="6" line2="1" col2="24" start="31" end="49">
      <STR_CONST line1="1" col1="6" line2="1" col2="24" start="31" end="49">"hello, "world")</STR_CONST>
    </expr>
  </expr>
</exprlist>

It looks like the text of the STR_CONST node is inconsistent and hard to work with: it drops the leading r, the opening ( and the closing ", but keeps the closing ).

gaborcsardi commented 4 years ago

Unfortunately this seems like a base R bug:

> code <- 'x <- r"(hello, "world")"'
> getParseData(parse(text = code))
  line1 col1 line2 col2 id parent       token terminal             text
7     1    1     1   24  7      0        expr    FALSE
1     1    1     1    1  1      3      SYMBOL     TRUE                x
3     1    1     1    1  3      7        expr    FALSE
2     1    3     1    4  2      7 LEFT_ASSIGN     TRUE               <-
4     1    6     1   24  4      6   STR_CONST     TRUE "hello, "world")
6     1    6     1   24  6      7        expr    FALSE

It's unlikely that they would fix this for R 4.0, but maybe we can work around it. I'll post on R-devel nevertheless.

gaborcsardi commented 4 years ago

OK, posted to R-devel. As for workarounds: if they don't fix it, we can work around it using the positions. col1 and col2 seem to span the whole raw string expression.
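
A minimal sketch of that position-based workaround, not a tested patch. It assumes single-line, ASCII-only string constants (so byte and character columns coincide); the names src_lines and literals are made up for illustration.

```r
# Recover the full raw string literal from the source positions, which
# are correct even when the `text` column of the parse data is truncated.
# Needs R >= 4.0 to parse the raw string at all.
code <- 'x <- r"(hello, "world")"'
pd <- getParseData(parse(text = code, keep.source = TRUE))
strs <- pd[pd$token == "STR_CONST", ]

# Cut each literal out of its source line using line1, col1 and col2.
src_lines <- strsplit(code, "\n", fixed = TRUE)[[1]]
literals <- substring(src_lines[strs$line1], strs$col1, strs$col2)
print(literals)
```

For this input, columns 6 to 24 cover the whole expression, so the result includes the r prefix and both delimiters regardless of what the text column says.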

renkun-ken commented 4 years ago

Just saw https://stat.ethz.ch/pipermail/r-devel/2020-April/079369.html. Thanks!

For downstream usage, we could still trim the first and last characters to extract the string literal.

gaborcsardi commented 4 years ago

I am not sure if that's good, actually:

> getParseData(parse(text = '"xx\\"xx"'))
  line1 col1 line2 col2 id parent     token terminal      text
1     1    1     1    8  1      3 STR_CONST     TRUE "xx\\"xx"
3     1    1     1    8  3      0      expr    FALSE
> getParseData(parse(text = 'r"(xx"xx)"'))
  line1 col1 line2 col2 id parent     token terminal    text
1     1    1     1   10  1      3 STR_CONST     TRUE "xx"xx)
3     1    1     1   10  3      0      expr    FALSE

In the first case the escaping is kept, so that's a proper string literal. In the second case, it is no longer a valid string literal once the delimiters are removed.

So I think we should maybe do the opposite and always keep the delimiters? Then it seems we would always have a proper string literal.
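
To illustrate why keeping the delimiters helps, here is a small sketch: when the text is a complete literal, parsing it again recovers the string value, whether it was written with escapes or as a raw string. value_of is a hypothetical helper, and it should only ever be given text that is known to be a single string literal.

```r
# Two spellings of the same string value (raw strings need R >= 4.0).
escaped <- '"xx\\"xx"'   # source text: "xx\"xx"
raw     <- 'r"(xx"xx)"'  # source text: r"(xx"xx)"

# Hypothetical helper: parse a complete string literal and return its value.
value_of <- function(literal) eval(parse(text = literal)[[1]])

# Both literals denote the same five-character string.
stopifnot(identical(value_of(escaped), value_of(raw)))
```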

gaborcsardi commented 4 years ago

We can add some info to the XML that this is a raw string, although you can also detect that by looking at the first character. If that's an r, then it is a raw string.
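
A rough sketch of that first-character check, assuming the text holds the complete literal including its prefix (as it does once the base R bug is fixed). is_raw_string is a made-up name; the pattern also accepts an uppercase R and single quotes, which the raw string syntax allows.

```r
# Hypothetical helper: TRUE when a STR_CONST text looks like a raw string,
# i.e. it starts with r or R followed by a quote character.
is_raw_string <- function(text) grepl("^[rR][\"']", text)

is_raw_string(c('r"(xx"xx)"', '"xx\\"xx"', "R'[a]'"))
```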

I am afraid that the language tools like lintr etc. will need explicit support for raw strings.

renkun-ken commented 4 years ago

Yes, I already raised an issue (https://github.com/jimhester/lintr/issues/484) at lintr about raw string support.

gaborcsardi commented 4 years ago

Fixed now in R-devel, FYI: https://github.com/wch/r-source/commit/992d9d33e8a87927f00369b628a54a3bc19cab29

gaborcsardi commented 4 years ago

We should still have a workaround for R 4.0.0, though.

renkun-ken commented 4 years ago

Thanks!

gaborcsardi commented 3 years ago

So, this was fixed in R 4.0.1; only R 4.0.0 was broken. I don't think we necessarily need a workaround for R 4.0.0. We could give a warning, though. @renkun-ken Would you like a warning for this?

renkun-ken commented 3 years ago

In my use case in languageserver to detect file paths in user code, we only use the line and col attributes rather than text() of those STR_CONST nodes. Therefore, I don't really need a warning for this.
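
For reference, a sketch of reading only the position attributes of STR_CONST nodes from the XML, in the spirit of the use case above. This is not the languageserver code; it assumes the xml2 and xmlparsedata packages are installed, and the variable names are illustrative.

```r
library(xml2)

code <- 'x <- r"(hello, "world")"'
xml_src <- xmlparsedata::xml_parse_data(parse(text = code, keep.source = TRUE))
doc <- read_xml(xml_src)

# Select every string constant and read its position attributes,
# ignoring the node text entirely.
nodes <- xml_find_all(doc, "//STR_CONST")
pos <- data.frame(
  line1 = as.integer(xml_attr(nodes, "line1")),
  col1  = as.integer(xml_attr(nodes, "col1")),
  line2 = as.integer(xml_attr(nodes, "line2")),
  col2  = as.integer(xml_attr(nodes, "col2"))
)
print(pos)
```

Because only the attributes are read, the result does not depend on whether the node text is truncated.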