rubocop / rubocop-ast

RuboCop's AST extensions and NodePattern functionality
https://docs.rubocop.org/rubocop-ast
MIT License
112 stars 53 forks source link

`BlockNode` with `SendNode` and heredoc argument returns wrong body source #293

Closed Earlopain closed 5 months ago

Earlopain commented 5 months ago

I ran into https://github.com/rubocop/rubocop-rspec/issues/1913 and tried to work on a fix. But there is something happening that makes this rather hard:

require 'rubocop-ast'

source = RuboCop::AST::ProcessedSource.new(<<~RUBY, 3.3)
  before do
    bar(<<~'TEXT')
      Hello World!
    TEXT
  end
RUBY
pp source.ast
# s(:block,
#   s(:send, nil, :before),
#   s(:args),
#   s(:send, nil, :bar,
#     s(:str, "Hello World!\n")))
pp source.ast.body
# s(:send, nil, :bar,
#  s(:str, "Hello World!\n"))
pp source.ast.body.multiline?
# false
pp source.ast.body.source_length
# 14
pp source.ast.body.source
# "bar(<<~'TEXT')

Does this look like expected behaviour? To me the last 4 printouts seem wrong but it's possible I'm missing something.

marcandre commented 5 months ago

Right. So the location of any child node is contained is the location of its parent node... except for HEREDOCs. That's the only exception and it makes thinking about code locations a bit strange.

You might be thinking that the following lines should also be part of the source / locations, but you must note that HEREDOCs are very odd, they are not necessarily contiguous, even in terms of lines, e.g.:

foo(<<~'FOO', bar(<<~'TEXT'))
  This is FOO
FOO
  Hello World!
TEXT

As far as bar(...) is concerned, the content of the HEREDOC starts 2 lines later!

The implementation of multiline? simply looks that the location of parent node, and it does start and end on the same line. It's just that its argument spills over to the next lines.

I wouldn't be opposed to adding an argument to multiline? so it checks the children, maybe like multiline?(including_children: true)?

As for the rest, what other results did you have in mind?

Earlopain commented 5 months ago

Yeah, heredocs are always a nice bag of surprises. I don't even want to think about the case you wrote and how to handle that (:

For my initial case, how would I fetch the full block content? I tried a bunch of things but only managed to get to it through its parent, which then also includes other content I don't want.

I would expect to get the 3 lines back without much fuss since all the wonderful things you can do with heredocs don't apply to that case (I think so at least). Having to special-case this every time you want to get a blocks source doesn't sound fun. block.source is my main use-case, I only added the other printouts to show a bit more.


A bit of context, in rubocop-rspec the goal of the cop is simply to merge the contents of two blocks into one

marcandre commented 5 months ago

One solution would be to not do the auto-correction if any of the descendents involved is a HEREDOC... 😅

If you want to handle it nevertheless, I think (I am not 100% sure though) that a node is always it's location + potentially a single range encompassing all its HEREDOCs (potentially disconnected as shown above). Even with nested HEREDOCs, I think they would always have to be contiguous.

If that's so, we could add a function Node#heredoc_location that would return the location for all the HEREDOCs of a given Node's descendents.

This would still make things a big challenging for you, as you'd have to check if any of the heredocs have the same delimiter...

Earlopain commented 5 months ago

Yeah, not offering autocorrect may be the simplest solution.

Though I don't really understand, why would the heredoc not be considered part of the block itself? I'm not looking at individual nodes, just the over encompassing block, which to me logically also includes the heredocs.

I would have to check later when I got some time but I feel like there are a bunch of other cops for which this would also be a problem.

marcandre commented 5 months ago

Let me repeat that in the general case a node's source code is not continuous if there are heredocs involved.

I'm assuming that by "block", you mean a block Node like in your example above. There's nothing special about a block node, so it's easy to construct an example where the whole source code for it is disconnected (i.e. the main "expression part" and the "heredoc part"):

foo {<<~'TEXT'} +  'not part of the block'
  This is the TEXT
TEXT

Or even disconnected by lines:

 <<~'FOO' + foo {<<~'TEXT'}
  This is FOO
FOO
    Hello World!
TEXT

What would you do in those cases?

So, as far as rubocop-ast is concerned, we could consider an API that returns the "full location" range, which could return an array; either the "expression part" + "heredoc part" , or a "full expression part" in case these are contuguous

marcandre commented 5 months ago

Now that I look more closely at the example you provided... I'm not sure why you don't use the block's location:

> pp source.ast.location
#<Parser::Source::Map::Collection:0x000000011de2c278
 @begin=#<Parser::Source::Range (string) 7...9>,
 @end=#<Parser::Source::Range (string) 51...54>,
 @expression=#<Parser::Source::Range (string) 0...54>,

If you take the expressions' location, and remove the begin and end's location, you'll get the full range of what you want.

Earlopain commented 5 months ago

What an awful example, I see now what you are talking about. It took me a moment to even discern where the block is. I'm not sure if the api you are talking about would makes sense since you would need to check if this happens anyways. It probably (hopefully) isn't every common code but you'd just push the need for a check to some other place.

I myself would qualify this as cursed syntax. Always learning something new.


If you take the expressions' location, and remove the begin and end's location, you'll get the full range of what you want.

Oops. I'm not really sure, this just didn't cross my mind. It does sound pretty much like what I was trying to do.

Thank you for talking with me about this, I would not have gotten this otherwise. I see now that there is not much that rubocop-ast can do here.

marcandre commented 5 months ago

Sure thing. It's really the concept of "a node corresponds to a continuous range of characters" that is tricky. Corrolary is that a blocks' location minus begin and end correspond more or less to its body's location. These are true "all the time",... except when they aren't. Good luck!