k0kubun / hamlit

High Performance Haml Implementation
https://rubygems.org/gems/hamlit
Other
980 stars 60 forks source link

`\0` as safe delimiter causing issues when passed through ruby_parser #148

Closed spohlenz closed 4 years ago

spohlenz commented 4 years ago

I've been having some issues with hamlit 2.10.0 in conjunction with ruby_parser, which I believe can be traced back to this commit: f706c0bfb28d677614a83d4c4c8ea07daa7b97c5 (although the relevant code now lives here).

The error I receive within ruby_parser when parsing my compiled Haml code is:

#<RubyParser::SyntaxError: unterminated string meets end of file. near line 5: "">

which corresponds to something like this excerpt of compiled Hamlit code:

_hamlit_compiler1 = ( render \"partial\";
); @output_buffer.safe_concat(((%Q\u0000\#{::Hamlit::Utils.escape_html_safe(((_hamlit_compiler1).to_s))}
\u0000).to_s));

However I'm not sure that the use of \0 as a safe interpolation delimiter (introduced in f706c0bfb28d677614a83d4c4c8ea07daa7b97c5) is a syntactically valid choice in Ruby. Consider the following simplified snippet within irb:

2.6.5 :001 > %Q[#{123}] # Old delimiter
 => "123" 

2.6.5 :002 > %Q\0#{123}\0 # New delimiter
SyntaxError ((irb):2: syntax error, unexpected tINTEGER, expecting end-of-input)
%Q\0#{123}\0

which leads me to believe that the usage is invalid, although it puzzles me why it works within the regular usage of Hamlit.

I do understand why this change was made, but I wonder if there are characters other than \0 that could be used as the delimiter?

k0kubun commented 4 years ago

JFYI: Originally I just stole the idea from https://github.com/amatsuda/string_template/blob/v0.2.1/lib/string_template/handler.rb#L6.

I've been having some issues with hamlit 2.10.0 in conjunction with ruby_parser

Because the generated code works inside Ruby, Ruby's native parser seems to be parsing it correctly. Therefore it rather seems to be an incompatibility in the third-party Ruby parser, ruby_parser.gem.

I'd fix the behavior of Hamlit if Ruby's native parser doesn't work with it, but in my opinion the actual problem lives in ruby_parser.gem in this case. Could you file an issue to https://github.com/seattlerb/ruby_parser too?

I do understand why this change was made, but I wonder if there are characters other than \0 that could be used as the delimiter?

Theoretically, any other valid character may conflict with original valid Ruby programs because it's valid. We could randomly find a character unused in the template, but it may slow down the compilation a little and complicate the implementation. Another approach would be performing #merge_dynamic only when all embedded scripts are complete expressions. That also pressures compilation time.

A simple workaround would be providing an option to disable this DynamicMerger filter (believe me, still it should be as fast as v2.9). Do you want this one to fix your immediate problem? If so, I can work on that.

spohlenz commented 4 years ago

Thanks for the quick response @k0kubun.

I'd initially also thought the issue was in ruby_parser (and I will likely still open an issue there in case it is). However my experiments with the delimiter in plain irb had me considering otherwise.

A simple workaround would be providing an option to disable this DynamicMerger filter (believe me, still it should be as fast as v2.9). Do you want this one to fix your immediate problem? If so, I can work on that.

I've pinned to ~> 2.9.5 in my Gemfile for now so I'm not in a huge rush. I'd rather see it fixed in the right place, wherever that may be.

k0kubun commented 4 years ago

However my experiments with the delimiter in plain irb had me considering otherwise.

Ah, I overlooked that how you checked it on irb.

Let's look at this:

2.6.5 :002 > %Q\0#{123}\0 # New delimiter

How did you enter \0? It should be not "backslash + 0" but a single NUL character, which you usually can't type (I heard some console supports it, but with some special combination of keys).

Therefore you need to test this like (shown as %Q#{123} by irb is correct; because it's invisible special character):

$ ruby -e 'puts "%Q\0\#{123}\0"' | RUBYOPT="-v" irb
ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-darwin18]
Switch to inspect mode.
%Q#{123}
"123"

and what you actually did seems to be:

$ ruby -e 'puts "%Q\\0\#{123}\\0"' | RUBYOPT="-v" irb
ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-darwin18]
Switch to inspect mode.
%Q\0#{123}\0
Traceback (most recent call last):
        3: from /Users/k0kubun/.rbenv/versions/2.6.3/bin/irb:23:in `<main>'
        2: from /Users/k0kubun/.rbenv/versions/2.6.3/bin/irb:23:in `load'
        1: from /Users/k0kubun/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/irb-1.0.0/exe/irb:11:in `<top (required)>'
SyntaxError ((irb):1: syntax error, unexpected tINTEGER, expecting end-of-input)
%Q\0#{123}\0
           ^

Besides, with Ruby 2.6.3 and ruby_parser.gem 3.14.0,

$ irb -rruby_parser
irb(main):001:0> RubyVM::AbstractSyntaxTree.parse("%Q\0\#{123}\0")
=> #<RubyVM::AbstractSyntaxTree::Node:SCOPE@1:0-1:10>
irb(main):002:0> RubyParser.new.parse("%Q\0\#{123}\0")
=> s(:dstr, "", s(:evstr, s(:lit, 123)))
irb(main):003:0>

\0 delimiter can be parsed successfully. Thus I'm not sure what part is broken right now.

Could you prepare a small script which uses only hamlit and ruby_parser and generates the error #<RubyParser::SyntaxError: unterminated string meets end of file. near line 5: "">?

spohlenz commented 4 years ago

Here's a script that should reproduce the issue.

require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'

  gem 'hamlit', '2.10.0'
  # gem 'hamlit', '2.9.5'

  gem 'ruby_parser', '3.14.0'
end

TEMPLATE = """
= @foo

= @bar
"""

src = Hamlit::Template.new { TEMPLATE }.precompiled_template

puts "Compiled template:\n\n"
puts src
puts "\n"

begin
  parser = RubyParser.new
  parser.process(src)
rescue => e
  puts "ruby_parser: FAILURE!"
  puts e.inspect
else
  puts "ruby_parser: SUCCESS!"
end

On my machine, this gives the following output:

$ ruby hamlit-ruby_parser.rb 
Compiled template:

extend Hamlit::Helpers; _buf = []; 
; _hamlit_compiler1 = ( @foo; 
; ); _buf << (%Q#{::Hamlit::Utils.escape_html(((_hamlit_compiler1).to_s))}
); _hamlit_compiler2 = ( @bar; 
; ); _buf << (::Hamlit::Utils.escape_html(((_hamlit_compiler2).to_s))); _buf << ("\n".freeze); _buf = _buf.join("".freeze)

ruby_parser: FAILURE!
#<RubyParser::SyntaxError: unterminated string meets end of file. near line 3: "">

The empty newline in between the two lines of the template appear to be critical in producing the error.

Thank you also for the info on \0. TIL. Hopefully the reproduction above will be enough for you to at least tell which project the bug lies in.

spohlenz commented 4 years ago

One further note: since calling eval(src) works correctly, I tend to agree with you that the issue is with seattlerb/ruby_parser so I will go ahead and create an issue there.

k0kubun commented 4 years ago

Your simplified report on ruby_parser.gem looks great, and matched my understanding from your reproduction code. Thanks for your help!

zenspider commented 4 years ago

I hate both of you. 😛