seattlerb / ruby_parser

ruby_parser is a ruby parser written in pure ruby. It outputs s-expressions which can be manipulated and converted back to ruby via the ruby2ruby gem.
http://www.zenspider.com/projects/ruby_parser.html
476 stars 100 forks source link

Various complicated syntax deviations #344

Open kddnewton opened 9 months ago

kddnewton commented 9 months ago

πŸ‘‹ Hey there! In writing the prism translator, I've come across a couple of things that look like they might be bugs in ruby_parser. I wanted to flag them in case you wanted to fix them. I don't imagine they're common at all. They're listed below.

These raise errors:

alias %s[abc] %s[xyz]

foo class Bar baz do end end
foo module Bar baz do end end

def f x:-a; end
def f x:+a; end
def foo x:%(xx); end

while class Foo; tap do end; end; break; end
while class Foo a = tap do end; end; break; end
while class << self; tap do end; end; break; end
while class << self; a = tap do end; end; break; end

class if true; Object end::Kernel; end
class while true; break Object end::Kernel; end

not()

These should be flip-flops:

not foo .. bar
not (foo .. bar)
!(foo .. bar)

This should be a match node:

!/wat/

These should be arrays not array patterns:

1 in %w[]
1 in %i[]
1 in %W[]
1 in %I[]

These should introduce local variables:

/(?<match>bar)/ =~ 'bar'; match

[1, 2] => a, b; a
[1, 2] in a, b; a

{a: 1} => a:; a
{a: 1} in a:; a

{key: :value} => key: value; value
{key: :value} in key: value; value

1 => [a]; a
1 in [a]; a

This one I'm a little unsure of, but I think that the following should be s(:and, s(:and, s(:true), s(:call, s(:false), :!)), s(:true)) and not s(:and, s(:true), s(:and, s(:call, s(:false), :!), s(:true))):

true and
not false and
true

This one is dropping a value from the rescued expression:

foo, bar = 1, 2 rescue nil

This heredoc is counting an escaped newline as if it were a real newline:

p <<~"E"
  x\n   y
E

This heredoc should have not dedent:

<<~EOF
  a
#{1}
  a
EOF

This heredoc should only dedent by 2:

p <<~"E"
    x
  #{"  y"}
E

This should be a constant declaration, not a call:

foo::A += m foo

These should have UTF-8 as the encoding of the string, not ASCII-8BIT:

s = <<eos
a\xE9b
eos

s = <<-EOS
a\247b
cΓΆd
EOS

This heredoc should be beforeafter\r\n not before\nafter\n (there's a \r\n after the ):

str = <<-XXX
before\
after
XXX

These are all horrible heredoc behavior that I'm sorry to even bring up:

<<A+%
A

<<A+%r
A

<<A+%q
A

<<A+%Q
A

<<A+%s
A

<<A+%x
A

pp <<-A.gsub(/b\
a
A
b/, "")

pp <<-A, "d\
c
A
d"

pp <<-A, %q[f\
e
A
f]

pp <<-A, %Q[h\
g
A
h]

pp <<-A, %w[j\
i
A
j]

pp <<-A, %W[l\
k
A
l]

pp <<-A, %i[n\
m
A
n]

pp <<-A, %I[p\
o
A
p]

<<A; /\
A
(?<a>)/ =~ ''

<<A; :'a
A
b'

<<A; :"a
A
b"

This has a line number that is very odd (the xstring is saying it's starting on line 2, maybe the \\ isn't incrementing a line count?):

:"a\\
b"
`  x
#{foo}
#`

And then just to mention, I basically turned off support for numbered parameters, e.g., marking the following as a call instead of a local variable read:

-> { _1 }
presidentbeef commented 4 months ago

Very curious what the two of you think of the ordering of nodes for nested ors and ands (e.g. 1 or 2 or 3).

As noted above, RP puts the shallower nesting on the left, while Prism puts the deeper nesting on the left. I don't think this makes a difference in interpretation (a depth-first, left-to-right visitor will process nodes in the same order). But the s-exps are different, which means for Brakeman the "fingerprints" for warnings will be different.

So should Prism match what RP does? Or should RP change?

kddnewton commented 4 months ago

For that, I'm okay changing Prism to match the RP AST when it does the translation, but I will keep the AST to have the same structure for other use cases (it matches the ASTs for parse.y and whitequark/parser).

presidentbeef commented 4 months ago

Cool - I took a cut at it last night but didn't quite get it there πŸ˜„