Closed iliabylich closed 2 years ago
Are there any issues with multiple nodes having the same source location?
As soon as it's safe to use the fact that source locations are equal for both key and value nodes to detect implicit hashes, that would work for me.
(In my implementation, I used a separate node type; but I guess that's not an option for Parser 🙂).
Tricky.
Was it considered to have the location be nil
instead?
The reservation I have with reusing the location (instead of nil
) makes it so close to the normal form that it makes it very difficult to distinguish them.
Writing a method to know if a particular lvar
is of such a form, one has to use the parent (not actually available), check that it is a pair
and that the lvar
in question is the second child, and finally compare the location with that of the first child.
This non-locality of the information doesn't seem ideal. I don't feel strongly about this, but I would intuitively think that the sym
should have a location but the lvar
/send
/... shouldn't.
To comment further, in RuboCop we do have access to the parent
, so whichever way this ends up, we can implement a method to check this (albeit ugly). Only risk factor is that a cop would rewrite the source without realizing this new case, and instead of crashing, and the resulting generated code could be incompatible. Maybe @koic or @dvandersluis can think of other impacts for RuboCop?
@marcandre There's a rule that all nodes have some location (at least expression_l
), I'm pretty sure that changing it would have bigger impact than sharing location between two nodes.
There's always an alternative - creating a new node type 😄
Ah ah, new mode type sounds really bad indeed. 😬
Empty expression (just after the colon) would be an alternative.
Empty expression (just after the colon) would be an alternative.
That also would be a breaking change (node without expr_l
). Also I don't know how I feel about "dummy" nodes in general, mostly because it splits nodes into two groups and it feels like this separation sits on top of node-type groups (i.e. this way we get two groups of types, dummy(typeA, typeB, ...)
and real(typeC, typeD, ...)
). I'd like to keep it flat.
I did quick research in other languages that support shorthand object field syntax.
JavaScript (Babel/parser) - the doc says that Object
node has a list of members where each member is a sum type of ObjectProperty | ObjectMethod | SpreadElement
where ObjectProperty
has
key: Expression
shorthand: boolean;
value: Expression
The doc has T | null
in many places, so I guess key
and value
are non-nullable, which means they share location. The flag is just a nice addition.
Rust (syn crate) - AST has a type called FieldValue that represents field
in Foo { field }
struct "instantiation", and it has the following fields:
pub member: Member,
pub colon_token: Option<Colon>,
pub expr: Expr,
Here's how it's used internally - if
branch handles explicit (i.e. Foo { field: value }
) key-value pairs, else
branch handles Foo { field }
syntax, and syn
also copies location of the key
to value
.
At least we are not inventing anything new in this PR.
Another alternative that I see (and dislike a lot but still would like it be mentioned) is making pair.value
nullable. I suppose that'd be a huge change for parser
users, right?
I'm going to keep this PR open for several days, maybe someone can suggest a better approach.
FYI, I found two differences between MRI and parser gem other than lvars. I'll create a new issue if you need it.
First, MRI accepts a constant as key/value but parser gem doesn't.
$ ruby -ve 'p({String:})'
ruby 3.1.0dev (2021-09-24T23:41:02Z master 39a6bf5513) [x86_64-linux]
{:String=>String}
$ bin/ruby-parse --31 -e 'p({String:})'
(send nil :p
(hash
(pair
(sym :String)
(send nil :String))))
Second, MRI rejects a label with !
or ?
, but parser gem doesn't.
$ ruby -e p({foo!:})
-e:1: identifier foo! is not valid to get
$ ruby -e p({foo?:})
-e:1: identifier foo? is not valid to get
$ bin/ruby-parse --31 -e 'p({foo!:})'
(send nil :p
(hash
(pair
(sym :foo!)
(send nil :foo!))))
$ bin/ruby-parse --31 -e 'p({foo?:})'
(send nil :p
(hash
(pair
(sym :foo?)
(send nil :foo?))))
Can I get a review of this API please? Are there any issues with multiple nodes having the same source location?
Unparser by definition does not use source locations and instead tries its best to generate concrete syntax "just" from the structure of the AST nodes. So the source location aspect does not concern me - at this point.
ok, time to finish it 😄
I've spent some time investigating what gettable
function does in MRI and it's quite tricky because of how obfuscated it is:
TOK_INTERN
macro (or a wrapper/helper function tokenize_ident
) to attach id
to a token. ID
is basically a Ruby Symbol
.rb_intern3
to construct an ID
that is internally represented as a tagged pointer (i.e. a pointer or a value depending on leading bytes that is &
-checked every time it's dereferenced, constant known-at-compile-time symbols can be compared by value). This whole process is performed only for tokens that have variadic values like tCONSTANT
, tIDENTIFIER
, tLABEL
and so on. Tokens like kCLASS
don't need it because their value are constant and can be inferred based on a token type that is simply a flat number.rb_intern3
is defined in symbol.c
and it calls intern_str
-> rb_str_symname_type
-> rb_enc_symname_type
(and also -> enc_synmane_type_leading_chars
) that attaches additional bitflags:
ID_GLOBAL
for $
-prefixed valuesID_INSTANCE
for @
-prefixed valuesID_CLASS
for @@
-prefixed valuesID_CONST
for values starting with capital letterID_LOCAL
otherwisetoken.val
We do the same thing in many places in Builder
, I think it makes sense to mirror these checks as separate methods to simplify backporting in the future.
This gettable
methods is called Builder::assignable
in parser
, so I'm going to add missing checks there.
Also I've noticed that this method checks for assignment to numparams in numblocks and errors if there are any, but we check it directly in .y
file. I've found a bug:
proc { 1 in _1 }
MRI rejects this code but we accept it. It will be fixed by moving numparam check to the builder (wrapped with @version
check, assignment to numparam is valid for Ruby < 27)
@marcandre @koic @mbj @palkan @pocke Can I get a review please? Do you see any missing test cases?
Merging to unblock myself, feel free to post thoughts/requests here.
There's a new rule in 3.1 that allows hash value to be omitted if it's equal to key:
We backported it in https://github.com/whitequark/parser/pull/818, but there's one case that we missed - local variables can also be used:
This PR amends this logic.
And of course, there are implications like circular arguments validation that was missing:
@marcandre @koic @mbj @palkan Can I get a review of this API please? Are there any issues with multiple nodes having the same source location?