portante / pycscope

Cscope database generator for Python source code
GNU General Public License v2.0
141 stars 29 forks source link

Add debugging level where walkCst emits CST tree printy-printed #10

Open portante opened 12 years ago

portante commented 12 years ago

From Free Speech:

diff --git a/pycscope.py b/pycscope.py
index 4e9d07f..39c5b2b 100755
--- a/pycscope.py
+++ b/pycscope.py
@@ -607,7 +607,7 @@ def isTrailerFuncCall(cst):
     """
     if len(cst) < 4:
         return False
-
+    printCst(cst)
     return (cst[-2][0] == symbol.trailer) \
             and (cst[-2][1][0] == token.DOT) \
             and (cst[-2][2][0] == token.NAME) \
@@ -769,6 +769,7 @@ def processNonTerminal(ctx, cst):
             ctx.setMark(cst[-2][2], Mark.FUNC_CALL)
             # Suspend COMMA processing in trailers
             ctx.in_trailer = ctx.spb_lvl[token.LPAR];
+        print("in_trailer: %s" % ctx.in_trailer)

 def processTerminal(ctx, cst):
     """ Process a given CST tuple representing a terminal symbol
@@ -914,7 +915,7 @@ def processTerminal(ctx, cst):

     return lineno

-def walkCst(ctx, cst):
+def walkCst(ctx, cst, processing=True):
     """ Scan the CST (tuple) for tokens, appending index lines to the buffer.
     """
     indent = 0
@@ -924,12 +925,13 @@ def walkCst(ctx, cst):
         while stack:
             cst, indent = stack.pop()

-            #print("%5d%s%s" % (lineno, " " * indent, nodeNames[cst[0]]))
+            print("%5d%s%s" % (lineno, " " * indent, nodeNames[cst[0]]))

-            if token.ISNONTERMINAL(cst[0]):
-                processNonTerminal(ctx, cst) 
-            else:
-                lineno = processTerminal(ctx, cst)
+            if processing:
+                if token.ISNONTERMINAL(cst[0]):
+                    processNonTerminal(ctx, cst) 
+                else:
+                    lineno = processTerminal(ctx, cst)

             indented = False
             for i in range(len(cst)-1, 0, -1):
@@ -944,6 +946,11 @@ def walkCst(ctx, cst):
         e.lineno = lineno
         raise e

+def printCst(cst):
+    print("CST subtree starts:")
+    walkCst(None, cst, False) # just print the tree
+    print("CST subtree ends")
+
 def parseSource(sourcecode, indexbuff, indexbuff_len, dump=False):
     """Parses python source code and puts the resulting index information into the buffer.
     """
fspeech commented 12 years ago

dumpCst as enhanced now makes this obsolete? Now they both do the same thing but I like dumpCst better as one doesn't have to switch on the print stmt inside walkCst.

I think I meant to say in my comment that the print stmt inside walkCst is still useful as one can turn it on to see exactly where the processing breaks. But printCst as a function is no longer needed.

portante commented 12 years ago

I am not sure it makes it obsolete, as you wrote it does let you see exactly where in the processing it breaks. I'll get to this one a bit later.

fspeech commented 12 years ago

printCst function turns off processing so it will print the whole cst regardless (which is also what dumpCst does). What we need is a debug flag that lets you print while processing so you can see when processing stops. What would be really cool is to save some of the processing history in a (circular) buffer that can then be auto dumped whenever the assertion exception is raised! But that would be a feature that would hopefully be rarely needed unless you plan a lot more work on pycscope. And if you do plan on a lot more features to come I really think you should take a look at the AST grammar.

portante commented 12 years ago

What features are you thinking about that would require the Abstract Syntax Tree to get correctly?

fspeech commented 12 years ago

Obviously anything that can be done in AST can also be done in CST so it is not required but it works at a level that fits what you are working to achieve better than CST. For example if we were using AST, firstly we never had to have a problem with assignment detection:

stmt = FunctionDef(identifier name, arguments args, stmt* body, expr* decorator_list) | ClassDef(identifier name, expr* bases, stmt* body, expr decorator_list) ... | Assign(expr\ targets, expr value) | AugAssign(expr target, operator op, expr value) ...

So you immediately get Assign/AugAssign. Then you look at expr:

expr = BoolOp(boolop op, expr* values) | BinOp(expr left, operator op, expr right) ... | Call(expr func, expr* args, keyword* keywords, expr? starargs, expr? kwargs) ... -- the following expression can appear in assignment context | Attribute(expr value, identifier attr, expr_context ctx) | Subscript(expr value, slice slice, expr_context ctx) | Name(identifier id, expr_context ctx) | List(expr* elts, expr_context ctx) | Tuple(expr* elts, expr_context ctx)

You have Call/Attribute/Subscript/Name/Tuple ... all given to you with the right context was well. The cst decoding right now seems to be trying to do what AST would have given you for free, but it is unlikely as complete as and as correct as the AST.

I agree if the current cst decoding is working well there is no need for the extra work to move to ast. That is why I qualified my suggestion with your development intentions.

On Wed, Sep 19, 2012 at 9:31 PM, Peter Portante notifications@github.comwrote:

What features are you thinking about that would require the Abstract Syntax Tree to get correctly?

— Reply to this email directly or view it on GitHubhttps://github.com/portante/pycscope/issues/10#issuecomment-8716560.

portante commented 12 years ago

The requirement for cscope is that every line of a source file that has a symbol on it appears in the cscope database.

If the AST can provide that, than it can be used in place of the CST.

That said, retooling to use AST seems like a lot of work, so I would only suggest we engage in that effort if there is a feature that would benefit from it.

Do you have any in mind?

fspeech commented 12 years ago

Feature wise I am happy with the way it is. I am also no expert on cscope so I am not aware of the requirement you just mentioned. Good to know. If no more difficult bugs pop up I would agree that it is best to keep things the way they are.

On Thu, Sep 20, 2012 at 5:03 PM, Peter Portante notifications@github.comwrote:

The requirement for cscope is that every line of a source file that has a symbol on it appears in the cscope database.

...