t3rn0 / ast-comments

Extension to the built-in ast module. Finds comments in source code and adds them to the parsed tree.
MIT License
31 stars 10 forks source link

ast_comments parse output cannot be `compile()`d #23

Closed wahuneke closed 1 year ago

wahuneke commented 1 year ago

Thanks for the project! Love it and really impressed with the clean code. I hope they pull this into main line functionality soon!

Unsurprisingly, you can't run compile() on the output of your extended parser. Specifically, compile will fail with a message like this:

>>> t = ast_comments.parse("def hi():\n\t# yup\n\t#    more comment\n\tprint('hi')\n")
>>> compile(t, "", "exec")
Traceback (most recent call last):
  File "/home/bill/.local/share/JetBrains/Toolbox/apps/pycharm-professional/plugins/python/helpers/pydev/pydevconsole.py", line 364, in runcode
    coro = func()
           ^^^^^^
  File "<input>", line 1, in <module>
TypeError: expected some sort of stmt, but got <ast_comments.Comment object at 0x7fabe6efb940>

My app would like to be able to preserve comments, walk and modify the tree, and then compile my brand new code. The cleanest way to do this would be to have a modified version of compile() or some helper function in ast-comments which provides the necessary fixes.

I'm thinking for a fix: use the node update visitor to walk the tree and drop every node of type Comment.

Would you be open to having this fix in the ast-comments package? Do you have a solution already in mind? Would you like me to add one with a PR? if you'd like me to do this, please specify impl details if you are 'picky' :) and I'll try to get something up in the next day or two

wahuneke commented 1 year ago

Here's another thought, a more significant change and/or maybe more of a question to you:

I notice that the stock ast parse will convert 'multiline comments' that are done in the form of a string statement, into nodes having type: Expr -> value of type Constant

If you were to change the base class of ast_comments.Comment to be ast.Constant, maybe you'd get the compiler to ignore your Comment nodes automatically.

You're the experienced one here, I'm brand new to this. This is just something I happened to notice.

Edit: adding example

This is what I mean by multieline string statement comments... :

def my_func():
   stuff = "doing some work here"
   ...
   """
   Here's a 'multi line string' which can be interpreted as a comment. ie is one of Python's recommended styles for multiline
   comment
   """
   complicated_line = needs_complicated + explanation_comment
wahuneke commented 1 year ago

Copying all the scenarios in your test_parse.py file, I did some testing for compile() of trees that come out of your parse() function. I found that this simple fixer func does the trick:

def compile_fixer(tree: ast.Module) -> ast.Module:
    """Rewrite the given tree, which might have Comment nodes, in a way so that it can run through `compile()`"""

    class RewriteComments(NodeTransformer):
        def visit_Comment(self, node: ast.Module) -> _t.Optional[ast.Module]:
            # Just eliminate nodes of this type
            return None

    return RewriteComments().visit(tree)

I also tried the test with this version, which demonstrates that the compiler would also be happy if your comment nodes were implemented as Expressions containing a subclass of Constant in them:

def compile_fixer(tree: ast.Module) -> ast.Module:
    """Rewrite the given tree, which might have Comment nodes, in a way so that it can run through `compile()`"""

    class RewriteComments(NodeTransformer):
        def visit_Comment(self, node: ast.Module) -> _t.Optional[ast.Module]:
            expr = ast.Expr(
                lineno=node.lineno,
                col_offset=node.col_offset,
                value=ast.Constant(
                    value=node.value,
                    lineno=node.lineno,
                    col_offset=node.col_offset,
                ),
            )
            ast.copy_location(expr, node)
            return expr

    return RewriteComments().visit(tree)

I'm happy to make a PR with either of these solutions, or to make an attempt at rewriting your work so that Comment nodes are always represented in the compile-friendly form (as a constant expression) so that your parse output will be immediately ready to compile, without requiring a walk to fix the Comment nodes.

t3rn0 commented 1 year ago

Great work! Since compile is irreversible, I think there's no point in preserving comments here. I'm for your first solution. Please, open a PR with this one.

About the base class for Comment nodes. That's an interesting idea, but at first glance it doesn't solve the problem. The compiler still expects some sort of stmt and gets a Comment. Anyway I'll look into this in more details later, thanks.

One question to discuss: how are we going to use this fixer? Should we overwrite the builtin compile so that it checks if its source argument is an instance of ast.AST and then applies your fixer?

# ast_comments.py
_compile = compile
def compile(source, *args, **kwargs):
    if isinstance(source, ast.AST):
        source = compile_fixer(source)
    _compile(source, *args, **kwargs)

Or should we put the fixer in another module (say helpers.py) and use it in client code when needed:

# client code
source = helpers.compile_fixer(source)
compile(source)

I think the first one is better. But there's so many overloads in builtins that we'll probably lose some signature-related info.

wahuneke commented 1 year ago

Lots of experimenting with Expr.Constant and its friends.... Found solution that was much, much simpler :)

This works. It compiles and it unparses cleanly:

@@ -7,7 +7,7 @@ from ast import *  # noqa: F401,F403
 from collections.abc import Iterable

-class Comment(ast.AST):
+class Comment(ast.Pass):
     value: str
     inline: bool
     _fields = (

I can put that in a PR along with a change to testing. Do you prefer:

1) brand new test called test_compile which copies all the test code from test_parse but calls compile for each scenario OR 2) I update test_parse and add a random compile in each test that just ensures that the parse result compiles? OR something else?

t3rn0 commented 1 year ago

Yeah, that's better. Simpler and cleaner.

I think one test will be enough: It should show that the tree with the comment node can be compiled. Plus, the compiled code is identical to that which was compiled from same source, but without comments.

source_1 = "a = 1  # coment"
source_2 = "a = 1"
assert compile(parse(source_1)) == complie(parse(source_2))

We can put this test in test_parse.py. If at some point it becomes necessary to test other cases, we'll move the test to test_compile.py

wahuneke commented 1 year ago

I think the Pass solution is not going to be acceptable. Compiling a tree with those nodes yields a NOOP for every occurrence of Pass. Semantically equivalent but potentially detrimental to performance. And, just a bit too "cheesy" probably.

In contrast, representing comments in the tree as Expr -> Constant (string) works better and compiles to code identical to a source without comments. The downside to that implementation is that it changes the attributes of the Comment object. Where your object formerly held comment as a string at attribute 'value', this would no longer be the case if Comment were reimplemented as an Expression with a Constant (string) in it.

I've learned many cool things about compiling, etc here. But, I'm afraid I'm wasting your time with something trivial. sorry about that :)

Given the above, how would you like to proceed?

1) continue with an implementation which will result in NOOPs unless you pre-treat and run our Comment-stripping visitor?

2) Go with the expression implementation, which has all the kinks worked out except that it means you can no longer get a string at Comment.value? (thus altering your interface for any existing package users... )

3) jump back to the original, original solution - provide a function which strips comments out, leave comment implementations alone, and stick with parse output that does not compile unless you run the stripper to remove the comment nodes?

Sorry! I didn't know this was going to get so protracted...

t3rn0 commented 1 year ago

One more problem found: Let's say it's possible to inherit Comment from something concrete such as Pass, Constant, etc. Hence we can create a "valid" code like this:

source = """
if True:
    # comment
    print(1)
"""
if_node = tree.body[0]
if_node.body.pop()
print(if_node.body)  # [<ast_comments.Comment object at 0x104e85400>]
compile(tree, "", "exec")  # compiles

But since the Comment node is not a real one, the code above should fail with an error raised: "ValueError: empty body on If"

So I think it's safer to stick with the original solution by removing comments out of the tree.

t3rn0 commented 1 year ago

I said before it was okay to override the compilation builtin, but now I'm not so sure. One could say that this is not the purpose of the library. I mean, a library with "ast" and "comments" in the name is not expected to behave this way. Sorry, I changed my mind, but I think we should either

wahuneke commented 1 year ago

I completely agree. I'll update my PR