gristlabs / asttokens

Annotate Python AST trees with source text and token information
Apache License 2.0

find_token() skips comments #10

Closed abulka closed 6 years ago

abulka commented 6 years ago

I'm trying to find the comment attached to an AST node of the Python source code I am analysing:

x = 1  # my comment

I tried

import tokenize
atok.find_token(node.last_token, tokenize.COMMENT)

where node is an AST node, e.g. the Name node for 'x'.

The find never succeeds. Looking at the source code of asttokens, I think it's because when find_token() calls next_token() to iterate through the tokens, it never passes True through to the include_extra parameter of next_token().

Any chance of adding an include_extra parameter to find_token() and passing it through to next_token()? You seem to have that parameter everywhere else!

dsagal commented 6 years ago

Is the linked pull request what you need?

abulka commented 6 years ago

Looks promising!

dsagal commented 6 years ago

I changed it actually, and committed to master, with some tests. Turns out there is no need for include_extra parameter, it should just always use True. So if you are looking for tokenize.COMMENT, it will now find it, and if you were looking for a regular token, it works the same as before. So the interface is the same, but your use case is fixed.
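
Assuming "extra" here means the non-coding tokens the stdlib tokenizer emits (COMMENT and NL — asttokens' exact definition may differ slightly), the effect of skipping them can be sketched without asttokens at all: filter those token types out of a plain `tokenize` stream and the comment simply never appears.

```python
import io
import tokenize

# Tokenize one line and compare the stream with and without the
# non-coding ("extra") token types. Skipping them, as the old
# find_token effectively did, makes the COMMENT token invisible.
src = "x = 1  # my comment\n"
toks = list(tokenize.generate_tokens(io.StringIO(src).readline))

EXTRA = (tokenize.COMMENT, tokenize.NL)
without_extra = [tokenize.tok_name[t.type] for t in toks if t.type not in EXTRA]
with_extra = [tokenize.tok_name[t.type] for t in toks]

print(without_extra)  # COMMENT is absent
print(with_extra)     # COMMENT is present
```

This is why always including the extra tokens fixes the comment lookup without changing behaviour for regular token searches: a search for, say, a NAME token just steps past the COMMENT entries it now sees.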

I'm closing, but let me know if you still have any issues with this.

abulka commented 6 years ago

Thanks - any idea when the new version will be available via pip?

dsagal commented 6 years ago

Just published.

abulka commented 6 years ago

Might have found a problem - or maybe it's the way I'm using the library. When I scan for comments on a node, find_token returns the next comment in the entire source, regardless of how far away it is. I need to find comments only on the line the node is part of.

Here is the repro of the weird behaviour:

import ast
import asttokens
import tokenize
from textwrap import dedent

src = dedent("""
    def hello():
        x = 5
        there()

    def there():
        return 999  # my silly comment

    hello()  # call it
    there()        
""")

class RecursiveVisitor(ast.NodeVisitor):
    """ example recursive visitor """

    def recursive(func):
        """ decorator to make the visitor work recursively """
        def wrapper(self,node):
            self.dump_line_and_comment(node)
            func(self,node)
            for child in ast.iter_child_nodes(node):
                self.visit(child)
        return wrapper

    def dump_line_and_comment(self, node):
        comment = atok.find_token(node.first_token, tokenize.COMMENT)
        print(f'On line "{node.first_token.line.strip():20s}" find_token found "{comment}"')

    @recursive
    def visit_Assign(self,node):
        """ visit an Assign node and its children recursively"""

    @recursive
    def visit_BinOp(self, node):
        """ visit a BinOp node and visits it recursively"""

    @recursive
    def visit_Call(self,node):
        """ visit a Call node and visits it recursively"""

    @recursive
    def visit_Lambda(self,node):
        """ visit a Lambda node and its children recursively"""

    @recursive
    def visit_FunctionDef(self,node):
        """ visit a Function node and visits it recursively"""

atok = asttokens.ASTTokens(src, parse=True)
tree = atok.tree
visitor = RecursiveVisitor()
visitor.visit(tree)

Gives me:

On line "def hello():        " find_token found "COMMENT:'# my silly comment'"
On line "x = 5               " find_token found "COMMENT:'# my silly comment'"
On line "there()             " find_token found "COMMENT:'# my silly comment'"
On line "def there():        " find_token found "COMMENT:'# my silly comment'"
On line "hello()  # call it  " find_token found "COMMENT:'# call it'"
On line "there()             " find_token found "ENDMARKER:''"
dsagal commented 6 years ago

That's not a problem with this module; it's just not a feature of it: find_token finds the next matching token regardless of the line. But line breaks themselves introduce tokens, so you can write a helper that finds the next comment on the same line as a given token, like so:

def find_line_comment(atok, start_token):
    t = start_token
    while t.type not in (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE, tokenize.ENDMARKER):
        t = atok.next_token(t, include_extra=True)
    return t if t.type == tokenize.COMMENT else None

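The stopping condition can be exercised with the stdlib tokenizer alone (a sketch, not using asttokens; the helper name is mine): scan forward until either a COMMENT or a line-ending token, so a comment on a later line is never picked up.

```python
import io
import tokenize

def find_line_comment_stdlib(tokens, start):
    """Scan forward from tokens[start]; return the comment string on
    that logical line, or None if a line break arrives first."""
    stop = (tokenize.NL, tokenize.NEWLINE, tokenize.ENDMARKER)
    for t in tokens[start:]:
        if t.type == tokenize.COMMENT:
            return t.string
        if t.type in stop:
            return None
    return None

src = "x = 1\ny = 2  # only on y's line\n"
tokens = list(tokenize.generate_tokens(io.StringIO(src).readline))

# Token 0 is the NAME 'x'; its line has no comment, so the scan
# hits NEWLINE first and returns None.
print(find_line_comment_stdlib(tokens, 0))  # None

# Scanning from the NAME token 'y' finds that line's comment.
y_index = next(i for i, t in enumerate(tokens) if t.string == "y")
print(find_line_comment_stdlib(tokens, y_index))  # "# only on y's line"
```

The key point is the same as in the asttokens helper above: NEWLINE (and NL for non-logical lines) tokens act as natural fences, so no line-number bookkeeping is needed.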
abulka commented 6 years ago

Thanks - that helper routine works great. A slight tweak I made is to either return the comment string or an empty string:

def find_line_comment(start_token):
    t = start_token
    while t.type not in (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE, tokenize.ENDMARKER):
        t = atok.next_token(t, include_extra=True)
    return t.string if t.type == tokenize.COMMENT else ''

comment = find_line_comment(node.first_token)

P.S. My old hack approach wasn't token-based at all; for the curious, it was simply:

line = node.first_token.line
comment_i = line.find('#')
comment = line[comment_i:].strip() if comment_i != -1 else ''
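
One failure mode of that string hack is worth noting: a '#' inside a string literal is found first. A stdlib-only sketch of the difference (the example line is hypothetical, not from the code above):

```python
import io
import tokenize

# A '#' inside a string literal fools the string-based hack, but not a
# token-based scan: tokenize knows the first '#' is part of a STRING.
line = 'x = "#not a comment"  # real comment\n'

# String hack: finds the first '#', which is inside the string literal.
i = line.find('#')
print(line[i:].strip())  # '#not a comment"  # real comment'

# Token-based scan: finds only the actual COMMENT token.
toks = tokenize.generate_tokens(io.StringIO(line).readline)
comment = next(t.string for t in toks if t.type == tokenize.COMMENT)
print(comment)  # '# real comment'
```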