Closed wahuneke closed 1 year ago
Here's another thought, a more significant change and/or maybe more of a question to you:
I notice that the stock ast parse will convert 'multiline comments' that are done in the form of a string statement, into nodes having type: Expr -> value of type Constant
If you were to change the base class of ast_comments.Comment to be ast.Constant, maybe you'd get the compiler to ignore your Comment nodes automatically.
You're the experienced one here, I'm brand new to this. This is just something I happened to notice.
Edit: adding example
This is what I mean by multieline string statement comments... :
def my_func():
stuff = "doing some work here"
...
"""
Here's a 'multi line string' which can be interpreted as a comment. ie is one of Python's recommended styles for multiline
comment
"""
complicated_line = needs_complicated + explanation_comment
Copying all the scenarios in your test_parse.py
file, I did some testing for compile()
of trees that come out of your parse()
function. I found that this simple fixer func does the trick:
def compile_fixer(tree: ast.Module) -> ast.Module:
"""Rewrite the given tree, which might have Comment nodes, in a way so that it can run through `compile()`"""
class RewriteComments(NodeTransformer):
def visit_Comment(self, node: ast.Module) -> _t.Optional[ast.Module]:
# Just eliminate nodes of this type
return None
return RewriteComments().visit(tree)
I also tried the test with this version, which demonstrates that the compiler would also be happy if your comment nodes were implemented as Expressions containing a subclass of Constant in them:
def compile_fixer(tree: ast.Module) -> ast.Module:
"""Rewrite the given tree, which might have Comment nodes, in a way so that it can run through `compile()`"""
class RewriteComments(NodeTransformer):
def visit_Comment(self, node: ast.Module) -> _t.Optional[ast.Module]:
expr = ast.Expr(
lineno=node.lineno,
col_offset=node.col_offset,
value=ast.Constant(
value=node.value,
lineno=node.lineno,
col_offset=node.col_offset,
),
)
ast.copy_location(expr, node)
return expr
return RewriteComments().visit(tree)
I'm happy to make a PR with either of these solutions, or to make an attempt at rewriting your work so that Comment nodes are always represented in the compile-friendly form (as a constant expression) so that your parse output will be immediately ready to compile, without requiring a walk to fix the Comment nodes.
Great work!
Since compile
is irreversible, I think there's no point in preserving comments here. I'm for your first solution. Please, open a PR with this one.
About the base class for Comment
nodes. That's an interesting idea, but at first glance it doesn't solve the problem. The compiler still expects some sort of stmt
and gets a Comment
. Anyway I'll look into this in more details later, thanks.
One question to discuss: how are we going to use this fixer?
Should we overwrite the builtin compile
so that it checks if its source
argument is an instance of ast.AST
and then applies your fixer?
# ast_comments.py
_compile = compile
def compile(source, *args, **kwargs):
if isinstance(source, ast.AST):
source = compile_fixer(source)
_compile(source, *args, **kwargs)
Or should we put the fixer in another module (say helpers.py) and use it in client code when needed:
# client code
source = helpers.compile_fixer(source)
compile(source)
I think the first one is better. But there's so many overloads in builtins that we'll probably lose some signature-related info.
Lots of experimenting with Expr.Constant and its friends.... Found solution that was much, much simpler :)
This works. It compiles and it unparses cleanly:
@@ -7,7 +7,7 @@ from ast import * # noqa: F401,F403
from collections.abc import Iterable
-class Comment(ast.AST):
+class Comment(ast.Pass):
value: str
inline: bool
_fields = (
I can put that in a PR along with a change to testing. Do you prefer:
1) brand new test called test_compile
which copies all the test code from test_parse but calls compile
for each scenario
OR
2) I update test_parse
and add a random compile in each test that just ensures that the parse result compiles?
OR
something else?
Yeah, that's better. Simpler and cleaner.
I think one test will be enough: It should show that the tree with the comment node can be compiled. Plus, the compiled code is identical to that which was compiled from same source, but without comments.
source_1 = "a = 1 # coment"
source_2 = "a = 1"
assert compile(parse(source_1)) == complie(parse(source_2))
We can put this test in test_parse.py
. If at some point it becomes necessary to test other cases, we'll move the test to test_compile.py
I think the Pass
solution is not going to be acceptable. Compiling a tree with those nodes yields a NOOP for every occurrence of Pass. Semantically equivalent but potentially detrimental to performance. And, just a bit too "cheesy" probably.
In contrast, representing comments in the tree as Expr -> Constant (string) works better and compiles to code identical to a source without comments. The downside to that implementation is that it changes the attributes of the Comment object. Where your object formerly held comment as a string at attribute 'value', this would no longer be the case if Comment were reimplemented as an Expression with a Constant (string) in it.
I've learned many cool things about compiling, etc here. But, I'm afraid I'm wasting your time with something trivial. sorry about that :)
Given the above, how would you like to proceed?
1) continue with an implementation which will result in NOOPs unless you pre-treat and run our Comment-stripping visitor?
2) Go with the expression implementation, which has all the kinks worked out except that it means you can no longer get a string at Comment.value? (thus altering your interface for any existing package users... )
3) jump back to the original, original solution - provide a function which strips comments out, leave comment implementations alone, and stick with parse output that does not compile unless you run the stripper to remove the comment nodes?
Sorry! I didn't know this was going to get so protracted...
One more problem found:
Let's say it's possible to inherit Comment
from something concrete such as Pass
, Constant
, etc.
Hence we can create a "valid" code like this:
source = """
if True:
# comment
print(1)
"""
if_node = tree.body[0]
if_node.body.pop()
print(if_node.body) # [<ast_comments.Comment object at 0x104e85400>]
compile(tree, "", "exec") # compiles
But since the Comment
node is not a real one, the code above should fail with an error raised: "ValueError: empty body on If"
So I think it's safer to stick with the original solution by removing comments out of the tree.
I said before it was okay to override the compilation builtin, but now I'm not so sure. One could say that this is not the purpose of the library. I mean, a library with "ast" and "comments" in the name is not expected to behave this way. Sorry, I changed my mind, but I think we should either
compile_fixer
and new overrided compile
into some helper file that is not imported by defaultcompile
and call it something elseI completely agree. I'll update my PR
Thanks for the project! Love it and really impressed with the clean code. I hope they pull this into main line functionality soon!
Unsurprisingly, you can't run
compile()
on the output of your extended parser. Specifically, compile will fail with a message like this:My app would like to be able to preserve comments, walk and modify the tree, and then compile my brand new code. The cleanest way to do this would be to have a modified version of compile() or some helper function in ast-comments which provides the necessary fixes.
I'm thinking for a fix: use the node update visitor to walk the tree and drop every node of type Comment.
Would you be open to having this fix in the ast-comments package? Do you have a solution already in mind? Would you like me to add one with a PR? if you'd like me to do this, please specify impl details if you are 'picky' :) and I'll try to get something up in the next day or two