r-lib / xmlparsedata

R code parse data as an XML tree
https://r-lib.github.io/xmlparsedata/

Support lambda syntax #17

Closed renkun-ken closed 3 years ago

renkun-ken commented 3 years ago

R-devel introduces the pipe |> and the lambda \(x) x + 1 syntax. I took a look at their parse data to see whether anything breaks here.

The parse data looks good, but xml_parse_data produces invalid XML: '\\\\' is a new token that has no entry in the token map, and a backslash is not allowed in an XML element name.

> e <- quote(\(x) x + 1)
> e
function(x) x + 1
> e <- parse(text = "\\(x) x + 1")
> pd <- getParseData(e)
> pd
   line1 col1 line2 col2 id parent          token terminal text
14     1    1     1   10 14      0           expr    FALSE     
1      1    1     1    1  1     14         '\\\\'     TRUE   \\
2      1    2     1    2  2     14            '('     TRUE    (
3      1    3     1    3  3     14 SYMBOL_FORMALS     TRUE    x
4      1    4     1    4  4     14            ')'     TRUE    )
12     1    6     1   10 12     14           expr    FALSE     
6      1    6     1    6  6      8         SYMBOL     TRUE    x
8      1    6     1    6  8     12           expr    FALSE     
7      1    8     1    8  7     12            '+'     TRUE    +
9      1   10     1   10  9     10      NUM_CONST     TRUE    1
10     1   10     1   10 10     12           expr    FALSE     

> xml <- xml_parse_data(e)
> xml2::read_xml(xml)
Error in read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html,  : 
  StartTag: invalid element name [68]
Backtrace:
1: read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html, 
2: read_xml.character(xml)
3: xml2::read_xml(xml)
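
To make the cause of the error concrete, here is a minimal sketch that checks which token strings could be used directly as XML element names. The helper is_xml_name is hypothetical (it is not part of xmlparsedata or xml2), and its regular expression is a simplified, ASCII-only approximation of the XML Name rule.

```r
# Hypothetical helper: simplified, ASCII-only approximation of the XML
# "Name" rule (a letter, underscore or colon first, then letters,
# digits, underscores, colons, dots or hyphens).
is_xml_name <- function(x) {
  grepl("^[A-Za-z_:][A-Za-z0-9_:.-]*$", x)
}

# A few token strings as they appear in the parse data. In R source,
# "'\\\\'" is the four-character string: quote, backslash, backslash, quote.
tokens <- c("expr", "SYMBOL_FORMALS", "NUM_CONST", "'('", "'\\\\'")

# Tokens that need to be mapped to a safe name before being written as XML
tokens[!is_xml_name(tokens)]
```

Quoted operator tokens such as '(' fail the check too, which is presumably why a token map exists in the first place; the new lambda token is simply one that the map does not yet cover.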

It looks like we need to extend the current xml_parse_token_map to include '\\\\'.
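
A rough sketch of what such an extension could look like, assuming the map is a named character vector from parse-data token to XML-safe name. The object token_map, the helper map_tokens, and the replacement names (including OP-LAMBDA) are illustrative assumptions here, not necessarily what the package uses.

```r
# Hypothetical token map: parse-data token -> XML-safe element name.
# The names on the right-hand side are assumptions for illustration.
token_map <- c(
  "'('"    = "OP-LEFT-PAREN",
  "')'"    = "OP-RIGHT-PAREN",
  "'+'"    = "OP-PLUS",
  "'\\\\'" = "OP-LAMBDA"   # new entry for the lambda token
)

# Replace tokens that have an entry in the map; leave the rest unchanged.
map_tokens <- function(tokens, map = token_map) {
  hit <- tokens %in% names(map)
  tokens[hit] <- unname(map[tokens[hit]])
  tokens
}

# Tokens from the first rows of the parse data above
tokens <- c("expr", "'\\\\'", "'('", "SYMBOL_FORMALS", "')'")
map_tokens(tokens)
```

With an entry like this in place, the lambda token would be written under a valid element name instead of a raw backslash.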

Since the new syntax is still experimental in R-devel and may be subject to change, I'm not sure when we should consider supporting it.

gaborcsardi commented 3 years ago

Yeah, do you want to submit a PR? :)

renkun-ken commented 3 years ago

Sure. I'll submit a PR soon.