- ruby31.y: handle local variables as hash labels with omitted values

iliabylich commented 2 years ago

There's a new rule in 3.1 that allows hash value to be omitted if it's equal to key:

$ ./miniruby -ve 'p({ rand: })'
ruby 3.1.0dev (2021-09-24T13:56:38Z master 40a65030e5) [x86_64-darwin20]
{:rand=>0.6152290626750991}

We backported it in https://github.com/whitequark/parser/pull/818, but there's one case that we missed - local variables can also be used:

./miniruby -ve 'foo = 1; p({ foo: })'
ruby 3.1.0dev (2021-09-24T13:56:38Z master 40a65030e5) [x86_64-darwin20]
{:foo=>1}

This PR amends this logic.

And of course, there are implications like circular arguments validation that was missing:

$ /bin/cat ../ruby/test.rb
def m(a = { a: })
end

$ bin/ruby-parse --31 ../ruby/test.rb
../ruby/test.rb:1:13: error: circular argument reference a
../ruby/test.rb:1: def m(a = { a: })
../ruby/test.rb:1:             ^

$ ../ruby/miniruby -cv ../ruby/test.rb
ruby 3.1.0dev (2021-09-24T13:56:38Z master 40a65030e5) [x86_64-darwin20]
../ruby/test.rb:1: circular argument reference - a

@marcandre @koic @mbj @palkan Can I get a review of this API please? Are there any issues with multiple nodes having the same source location?

palkan commented 2 years ago

Are there any issues with multiple nodes having the same source location?

As soon as it's safe to use the fact that source locations are equal for both key and value nodes to detect implicit hashes, that would work for me.

(In my implementation, I used a separate node type; but I guess that's not an option for Parser 🙂).

marcandre commented 2 years ago

Tricky. Was it considered to have the location be nil instead? The reservation I have with reusing the location (instead of nil) makes it so close to the normal form that it makes it very difficult to distinguish them. Writing a method to know if a particular lvar is of such a form, one has to use the parent (not actually available), check that it is a pair and that the lvar in question is the second child, and finally compare the location with that of the first child. This non-locality of the information doesn't seem ideal. I don't feel strongly about this, but I would intuitively think that the sym should have a location but the lvar/send/... shouldn't.

marcandre commented 2 years ago

To comment further, in RuboCop we do have access to the parent, so whichever way this ends up, we can implement a method to check this (albeit ugly). Only risk factor is that a cop would rewrite the source without realizing this new case, and instead of crashing, and the resulting generated code could be incompatible. Maybe @koic or @dvandersluis can think of other impacts for RuboCop?

iliabylich commented 2 years ago

@marcandre There's a rule that all nodes have some location (at least expression_l), I'm pretty sure that changing it would have bigger impact than sharing location between two nodes.

There's always an alternative - creating a new node type 😄

marcandre commented 2 years ago

Ah ah, new mode type sounds really bad indeed. 😬

Empty expression (just after the colon) would be an alternative.

iliabylich commented 2 years ago

Empty expression (just after the colon) would be an alternative.

That also would be a breaking change (node without expr_l). Also I don't know how I feel about "dummy" nodes in general, mostly because it splits nodes into two groups and it feels like this separation sits on top of node-type groups (i.e. this way we get two groups of types, dummy(typeA, typeB, ...) and real(typeC, typeD, ...)). I'd like to keep it flat.

I did quick research in other languages that support shorthand object field syntax.

JavaScript (Babel/parser) - the doc says that Object node has a list of members where each member is a sum type of ObjectProperty | ObjectMethod | SpreadElement where ObjectProperty has

key: Expression
shorthand: boolean;
value: Expression

The doc has T | null in many places, so I guess key and value are non-nullable, which means they share location. The flag is just a nice addition.

Rust (syn crate) - AST has a type called FieldValue that represents field in Foo { field } struct "instantiation", and it has the following fields:

pub member: Member,
pub colon_token: Option<Colon>,
pub expr: Expr,

Here's how it's used internally - if branch handles explicit (i.e. Foo { field: value }) key-value pairs, else branch handles Foo { field } syntax, and syn also copies location of the key to value.

At least we are not inventing anything new in this PR.

Another alternative that I see (and dislike a lot but still would like it be mentioned) is making pair.value nullable. I suppose that'd be a huge change for parser users, right?

I'm going to keep this PR open for several days, maybe someone can suggest a better approach.

pocke commented 2 years ago

FYI, I found two differences between MRI and parser gem other than lvars. I'll create a new issue if you need it.

First, MRI accepts a constant as key/value but parser gem doesn't.

$ ruby -ve 'p({String:})' 
ruby 3.1.0dev (2021-09-24T23:41:02Z master 39a6bf5513) [x86_64-linux]
{:String=>String}

$ bin/ruby-parse --31 -e 'p({String:})' 
(send nil :p
  (hash
    (pair
      (sym :String)
      (send nil :String))))

Second, MRI rejects a label with ! or ?, but parser gem doesn't.

$ ruby -e p({foo!:})
-e:1: identifier foo! is not valid to get

$ ruby -e p({foo?:})
-e:1: identifier foo? is not valid to get

$ bin/ruby-parse --31 -e 'p({foo!:})'
(send nil :p
  (hash
    (pair
      (sym :foo!)
      (send nil :foo!))))

$ bin/ruby-parse --31 -e 'p({foo?:})'
(send nil :p
  (hash
    (pair
      (sym :foo?)
      (send nil :foo?))))

mbj commented 2 years ago

Can I get a review of this API please? Are there any issues with multiple nodes having the same source location?

Unparser by definition does not use source locations and instead tries its best to generate concrete syntax "just" from the structure of the AST nodes. So the source location aspect does not concern me - at this point.

iliabylich commented 2 years ago

ok, time to finish it 😄

I've spent some time investigating what gettable function does in MRI and it's quite tricky because of how obfuscated it is:

MRI calls TOK_INTERN macro (or a wrapper/helper function tokenize_ident) to attach id to a token. ID is basically a Ruby Symbol.
this macro calls rb_intern3 to construct an ID that is internally represented as a tagged pointer (i.e. a pointer or a value depending on leading bytes that is &-checked every time it's dereferenced, constant known-at-compile-time symbols can be compared by value). This whole process is performed only for tokens that have variadic values like tCONSTANT, tIDENTIFIER, tLABEL and so on. Tokens like kCLASS don't need it because their value are constant and can be inferred based on a token type that is simply a flat number.
rb_intern3 is defined in symbol.c and it calls intern_str -> rb_str_symname_type -> rb_enc_symname_type (and also -> enc_synmane_type_leading_chars) that attaches additional bitflags:
- ID_GLOBAL for $-prefixed values
- or ID_INSTANCE for @-prefixed values
- or ID_CLASS for @@-prefixed values
- or ID_CONST for values starting with capital letter
- or ID_LOCAL otherwise
Later MRI uses these flags to check what's the type of the token.val

We do the same thing in many places in Builder, I think it makes sense to mirror these checks as separate methods to simplify backporting in the future.

This gettable methods is called Builder::assignable in parser, so I'm going to add missing checks there.

Also I've noticed that this method checks for assignment to numparams in numblocks and errors if there are any, but we check it directly in .y file. I've found a bug:

proc { 1 in _1 }

MRI rejects this code but we accept it. It will be fixed by moving numparam check to the builder (wrapped with @version check, assignment to numparam is valid for Ruby < 27)

iliabylich commented 2 years ago

@marcandre @koic @mbj @palkan @pocke Can I get a review please? Do you see any missing test cases?

iliabylich commented 2 years ago

Merging to unblock myself, feel free to post thoughts/requests here.

whitequark / parser

- ruby31.y: handle local variables as hash labels with omitted values #820