nimble-code / Cobra

An interactive (fast) static source code analyzer

about python comment lines #43

Open yilmazdurmaz opened 2 years ago

yilmazdurmaz commented 2 years ago

Although the -Python flag gives Cobra the ability to recognize Python keywords, it does not tokenize comments.

I could write a simple (naive) fix for single-line comments by adding this to cobra_lex.c:981:

        case '#':
            if(python)
            {
                p_comment("#",cid);
                continue;
            }

But multi-line comments, delimited by triple quotes """, require looking two characters ahead of the quote ", the string marker. I could not work out how to implement that; I guess we need a new complementary next2char(int cid) for this purpose.

Can you check the above code, and comment for both!?
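
To make the two-character lookahead idea concrete, here is a minimal sketch in Python rather than in Cobra's C. The names peek, find_close and scan_quotes are hypothetical and do not exist in cobra_lex.c; the sketch only shows how peeking two characters past a quote separates "..." from """...""". String prefixes and unterminated strings are deliberately ignored.

```python
def peek(text, pos, n=1):
    """Return up to n characters after position pos without consuming them."""
    return text[pos + 1 : pos + 1 + n]


def find_close(text, start, delim):
    """Return the index just past the closing delim, or len(text) if none."""
    i = start
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the escaped character
            continue
        if text.startswith(delim, i):
            return i + len(delim)
        i += 1
    return len(text)


def scan_quotes(text):
    """Yield (kind, lexeme) for each quoted span; kind is 'single' or 'triple'."""
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch in "\"'":
            # Two-character lookahead: two more copies of the same quote
            # mean a triple-quoted span, anything else an ordinary string.
            if peek(text, pos, 2) == ch * 2:
                delim, kind = ch * 3, "triple"
            else:
                delim, kind = ch, "single"
            end = find_close(text, pos + len(delim), delim)
            yield kind, text[pos:end]
            pos = end
        elif ch == "#":
            # Skip to end of line so quotes inside comments are ignored.
            newline = text.find("\n", pos)
            pos = len(text) if newline == -1 else newline
        else:
            pos += 1
```

In a C lexer the same decision could presumably be made by reading two characters and pushing them back when they are not quotes, but that depends on how Cobra buffers its input, which I have not checked.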

yilmazdurmaz commented 2 years ago

With the following test file, Cobra at least sees single-line comments, but I am not sure whether the code I provided is the right way to do it.

#test.py
def test():
    name = input( "name" )
    # name = input( "name" )
test()

cobra -comments -Python -c '%{ print .type ": " .txt "\n"; %}' test.py
...
cmnt: # name = input( "name" )
...

By the way, some rule sets, such as basic.cobra, currently freeze Cobra even with the -Python flag, and they still fail with my fix because triple quotes are not marked as comments.

The following, for example, causes basic.cobra to freeze: from the rule's perspective there is no closing ), because it was commented out by the preceding // :

 """
 (//)
 """

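The effect described above can be reproduced outside Cobra with a small sketch. It assumes, as the comment suggests, that the text is treated with C-style rules, so everything after // on a line is dropped. The functions strip_c_line_comments and paren_balance are hypothetical helpers, not Cobra code, and the sketch does not show why Cobra freezes, only that the parenthesis is left unmatched.

```python
def strip_c_line_comments(line):
    """Drop everything from the first // onward, as a C-style lexer would."""
    idx = line.find("//")
    return line if idx == -1 else line[:idx]


def paren_balance(source):
    """Return open-minus-close parenthesis count after stripping // comments."""
    depth = 0
    for line in source.splitlines():
        for ch in strip_c_line_comments(line):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
    return depth
```

For the three-line example, the balance is 1: the ( survives, while the ) is swallowed by the // comment.
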
Also, without my single-line comment fix, the following line freezes it too, but I could not figure out why:

# name = input( "name" )

yilmazdurmaz commented 2 years ago

I have just noticed I was making a big mistake by saying "let's make comment tokens from triple quotes".

Why? Because triple quotes are also used to make multi-line strings. The difference is the placement: if it is preceded by an assignment (=), it is a string; otherwise it is a comment (possibly a docstring).

I might be missing something else here, so it is better not to jump to any conclusion other than this "next2char(int cid)", as it may also help in other situations.
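
The placement rule above can be checked against Python's own tokenizer, which reports every triple-quoted literal as a STRING token regardless of where it appears. The sketch below uses the standard tokenize module; classify_triple_quoted and is_triple are hypothetical names, and "standalone" versus "value" is only a heuristic: a string that begins a statement is labeled standalone even if an operator follows it. On Python 3.12 and later, f-strings are tokenized differently and are not covered here.

```python
import io
import tokenize


def is_triple(token_text):
    """True if the literal, after any prefix letters, opens with triple quotes."""
    body = token_text.lstrip("rRbBuUfF")
    return body.startswith(('"""', "'''"))


def classify_triple_quoted(source):
    """Return (line, label) pairs for each triple-quoted literal in source.

    'standalone' means the literal starts a statement (docstring-like);
    'value' means something precedes it on the logical line, e.g. 'x ='.
    """
    results = []
    at_statement_start = True
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT):
            at_statement_start = True
            continue
        if tok.type in (tokenize.NL, tokenize.COMMENT, tokenize.ENDMARKER):
            continue  # these do not start or continue a statement
        if tok.type == tokenize.STRING and is_triple(tok.string):
            label = "standalone" if at_statement_start else "value"
            results.append((tok.start[0], label))
        at_statement_start = False
    return results
```

So the "preceded by =" test is a special case of "is anything before it in the statement": a triple-quoted string passed as a function argument is also a value, not a comment.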

sthagen commented 2 years ago

Also, these strings come in different flavors, eg raw multiline strings like r‘‘‘‘‘‘ and r““““““ (sorry for any typographic glyphs - that is the ios tablet github web api …)
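
A rough sketch of what prefix handling could look like, again in Python and not taken from Cobra. The function triple_quote_at and the PREFIXES set are hypothetical; the set lists the case-insensitive prefixes I know from the Python lexical analysis documentation (r, b, u, f and the two-letter combinations rb, br, rf, fr) and may be incomplete for newer Python versions. The caller is assumed to pass a position that is already a token start, since the function does not check what precedes it.

```python
PREFIXES = {"r", "b", "u", "f", "rb", "br", "rf", "fr"}


def triple_quote_at(text, pos):
    """If a (possibly prefixed) triple-quoted string opens at pos,
    return (prefix, delimiter); otherwise return None."""
    end = pos
    # A string prefix is at most two letters long.
    while end < len(text) and end - pos < 2 and text[end].isalpha():
        end += 1
    prefix = text[pos:end]
    if prefix and prefix.lower() not in PREFIXES:
        return None
    delim = text[end : end + 3]
    if delim in ('"""', "'''"):
        return prefix, delim
    return None
```

For example, triple_quote_at('r"""raw"""', 0) gives ('r', '"""'), while an ordinary r"x" gives None.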

yilmazdurmaz commented 2 years ago

@nimble-code I think you were pretty tired from other improvements and did not notice that this one is still a problem.

sthagen commented 2 years ago

@yilmazdurmaz tired? I suggest we do not communicate this way here as guests. Maybe come back with a complete proposal instead? Even then, please accept that the maintainer may consider but not integrate. There are projects on Github that do not offer any discussion or issue capabilities. One of the reasons is frustration and wear out on the side of the maintainers.

yilmazdurmaz commented 2 years ago

@sthagen do you mean we should not trust this project just because you see yourself as a "guest" rather than a contributor?

I may not have a full proposal, but if I had one, I would write the solution myself and make a PR. Or I would state exactly what I mean, such as "I have this working for me; is it good to apply to the project?", as I did in my previous posts.

I see myself as a tester here until I learn more about how to develop further. I can dig into the code only up to a point for now. That is why I have "Can you check the above code, and comment for both!?" in it.

And my message above was not meant to be offensive. Rather, I thought this issue was closed by mistake and tried to point out the obvious reason for that. If my choice of words was inappropriate, please accept my apologies, @nimble-code.

nimble-code commented 2 years ago

no worries! I had closed it because I thought you had reconsidered the request -- having more unexpected angles. back on the todo list!

yilmazdurmaz commented 2 years ago

@nimble-code thanks,

This is actually a limitation of the current implementation, but it does not necessarily need to be solved this week or month. It may take a year for you, me, or someone else from the community to come up with a solution.

It is just that keeping the issue open helps keep awareness of it until we find a fix, if any.

nimble-code commented 6 months ago

just an update: the next release of cobra will have better support for python, including recognizing nested blocks, triple quoted strings, comments, etc. -- still in testing now, but it looks good so far