Different behavior code parser to identical AST tree (array map)

adam429 commented 2 years ago

If you run code1 and code2 in ruby, it get different result.

But when parser to AST tree, the code1 AST tree is identical to code 2 AST tree.

require 'parser/current'

code1 = "[[1,2],[3,4]].map {|x| x}"
puts eval(code1) == [[1,2],[3,4]]   #==> true

code2 = "[[1,2],[3,4]].map {|x,| x}"
puts eval(code2) == [1,3]   # ==> true

ast1 = Parser::CurrentRuby.parse(code1)
ast2 = Parser::CurrentRuby.parse(code2)

puts ast1 == ast2  #=> true

puts ast1
# s(:block,
#     s(:send,
#       s(:array,
#         s(:array,
#           s(:int, 1),
#           s(:int, 2)),
#         s(:array,
#           s(:int, 3),
#           s(:int, 4))), :map),
#     s(:args,
#       s(:arg, :x)),
#     s(:lvar, :x))

puts ast2
# s(:block,
#     s(:send,
#       s(:array,
#         s(:array,
#           s(:int, 1),
#           s(:int, 2)),
#         s(:array,
#           s(:int, 3),
#           s(:int, 4))), :map),
#     s(:args,
#       s(:arg, :x)),
#     s(:lvar, :x))

iliabylich commented 2 years ago

Yes, you probably want to "modernize" an AST builder:

Parser::Builders::Default.modernize

Or (more specifically) enable a single flag:

Parser::Builders::Default.emit_procarg0 = true

I'd strongly recommend enabling all of them unless you have a huge codebase that relies on the "old" format of the AST.

mbj commented 2 years ago

@iliabylich I see a lot of people doing this mistake. I wounder if we should do a breaking change to "always" emit the newest ast and making the legacy ones an opt-in.

Yes this should have some transition period / warnings / deprecations. But this is the number 1 mistake I find in all AST tooling I review.

iliabylich commented 2 years ago

I agree, but that would be a breaking change that current versioning system doesn't support.

Do you see a way to automatically enable "modern" mode and keep it compatible? Adding another class (ModernRubyXY?) or an extra option to a Parser class doesn't change the whole picture, existing users don't pass any extra options or flags anyway.

Also, compatibility flags is the only way to keep AST format compatible and be capable of emitting a new AST at the same time if needed. Sometimes we have to introduce breaking changes because of existing bugs or misunderstanding of Ruby (and this procarg0 node is a good example). We will have to do it again in the future and so we have to either change versioning to semver (which is a HUGE breaking change) or keep it always-backward-compatible and add new flags (which returns us back to the original problem).

I agree that there's a problem but honestly I don't know how to fix it without breaking everything first.

mbj commented 2 years ago

I see the constraints and happy I have resonance.

I like the idea of a new class, but: On each new ruby version we are adding new classes already? So what if we where to decide: Once we add a new ruby version class, that class will opt into what ever is the modern flags that time.

People are used to the fact that new ruby versions break their AST tooling, so this may not be a too hard stretch?

If that one does not work, I think we should go the route of a separate class namespace.

mbj commented 2 years ago

If we go with an entirely new class naming convention, this one should also be the one we mention first in the documentation, so new users never get steered to the legacy ones.

iliabylich commented 2 years ago

Yes, but that means that we are allowed to change existing ModernRubyXY classes in a minor/patch gem version update (by auto-enabling new compatibility flags). Without switching to a proper semver it will be a mess, and that's exactly what we can't afford:

many libs/apps depend on gem 'parser' without specifying any specific version (because we promise no breaking changes)
different gems will start having conflicts fighting for a version of parser they depend on. IIRC Ruby doesn't support require-ing multiple versions of the same gem.

So I see a dead end on both sides:

switching to semver breaks everything
new classes with "modern" (and thus, version-dependent) defaults don't make sense without semver

@bbatsov @marcandre @palkan Do you guys have any thoughts?

mbj commented 2 years ago

What for the middle ground: Keep ModernRubyXY backwards compatible, but: Default to modern as it was "at the time they where released"?

marcandre commented 2 years ago

FWIW, RuboCop made specific choices on the switches; we have two turned on (emit_forward_arg and emit_match_pattern), everything else is off.

The rubocop-ast gem, which is a layer on top of parser and adds things like the node pattern is compatible with all combinations of switches.

rubocop can simply adapt to a change of API by explicitly setting the flags to false or calling a classic method if one is added.

marcandre commented 2 years ago

My personal opinion is that some of those switches are really "in retrospect, we made a mistake and we're fixing it with this switch", others are "there's a theoretical difference between these, but not a practical one" (I'm thinking of lambda, which is counter productive to turn on for linting tools like rubocop), and some are in between (like emit_proc0) that really depends on what the tool using parser wants to do.

marcandre commented 2 years ago

I should have mentioned, but rubocop would break if a user somehow where to upgrade parser but not rubocop, as gem requirement is >= some_version.

palkan commented 2 years ago

I'm not sure preventing such mistakes worth a breaking change. Instead, we can make it clearer in the documentation and, maybe, add a simpler API to modernize. We do not expect developers to start using Parser without looking at the Readme first, right?

So, right now we have:

Load Parser (see the backwards compatibility section below for explanation of emit_* calls):

require 'parser/current'
# opt-in to most recent AST format:
Parser::Builders::Default.emit_lambda              = true
Parser::Builders::Default.emit_procarg0            = true
Parser::Builders::Default.emit_encoding            = true
Parser::Builders::Default.emit_index               = true
Parser::Builders::Default.emit_arg_inside_procarg0 = true
Parser::Builders::Default.emit_forward_arg         = true
Parser::Builders::Default.emit_kwargs              = true
Parser::Builders::Default.emit_match_pattern       = true

What if we change it to something like:

Load Parser:

require 'parser/current'
require 'parser/modernize' # use the most recent AST format

Instead of using the modern AST format and requiring parser/modernize, you can configure it yourself (see the backwards compatibility section below).

dgollahon commented 2 years ago

Another option might be to emit a warning that tells you that you are not using the specific version of parser unless you tell parser ignore_modernization_warning = true or you explicitly configure emit_[some_modernization_flag_here] = false or something like that.

You'd get some warning noise, but it would at least provide visibility without completely breaking things.

whitequark / parser

Different behavior code parser to identical AST tree (array map) #858