joernio / joern

Open-source code analysis platform for C/C++/Java/Binary/Javascript/Python/Kotlin based on code property graphs. Discord https://discord.gg/vv4MH284Hc
https://joern.io/
Apache License 2.0
2.09k stars 288 forks

[Bug][pysrc2cpg] `Invalid Code` parsing issues #4577

Open AndreiDreyer opened 5 months ago

AndreiDreyer commented 5 months ago

Describe the bug

When generating a CPG for the Pygoat repo, the Python parser reports two warnings for lines of code it considers invalid.

These warnings relate to the two functions reproduced in the steps below.

To Reproduce

Steps to reproduce the behavior:

1) Create test.py with the following:

def register(request):
    if request.method == "POST":
        form = NewUserForm(request.POST)
        if form.is_valid():
            user = form.save()
            login(request, user)
            messages.success(request, "Registration successful." )
            return redirect('/')
        messages.error(request, "Unsuccessful registration. Invalid information.")
    form = NewUserForm()
    return render (request=request, template_name="registration/register.html", context={"register_form":form})

2) ./joern-parse test.py

The second warning can be reproduced the same way:

1) Create test.py with the following:

class NewUserForm(UserCreationForm):
    def save(self, commit=True):
        user = super(NewUserForm, self).save(commit=False)
        user.email = self.cleaned_data['email']
        if commit:
            user.save()
        return user

2) ./joern-parse test.py

Expected behavior

No parsing warnings are shown for the above code.

DavidBakerEffendi commented 5 months ago

Very strange: if I re-indent the whitespace before user.save(), the file parses successfully. Perhaps a fix for this issue should make the parser robust to this kind of whitespace.

Specifically, the whitespace before user.save() is the problem. There are three tabs before it, and the third tab, the one closest to the statement, is the issue.
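To locate lines like this in a larger file, a small diagnostic along these lines may help. It is only a sketch and not part of Joern: the helper name indent_kinds and the sample string are hypothetical, and the exact whitespace bytes in the Pygoat sources are not shown in this thread. It classifies the leading whitespace of each indented line as tabs, spaces, or a mix.

```python
def indent_kinds(source):
    """Map each indented, non-blank line number to 'tabs', 'spaces' or 'mixed'."""
    kinds = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        if not stripped:
            continue  # blank or whitespace-only line
        indent = line[: len(line) - len(stripped)]
        if not indent:
            continue  # top-level line, nothing to classify
        if set(indent) == {"\t"}:
            kinds[lineno] = "tabs"
        elif set(indent) == {" "}:
            kinds[lineno] = "spaces"
        else:
            kinds[lineno] = "mixed"
    return kinds


# Hypothetical sample: space-indented lines with one tab-indented statement.
sample = (
    "def save(self, commit=True):\n"
    "    if commit:\n"
    "\t\t\tuser.save()\n"
    "    return user\n"
)
print(indent_kinds(sample))
```

A file in which this reports more than one kind is a likely candidate for the warning described above.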

DavidBakerEffendi commented 5 months ago

The detailed error from the parser is:

Encountered " <NAME> "user "" at line 7, column 4.
Was expecting:
    <INDENT> ...

Going by the message above, if I add another level of indentation, it also parses successfully.
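For comparison, it can be useful to see where CPython's own tokenizer emits INDENT and DEDENT tokens for a snippet. The sketch below uses the standard library tokenize module. It shows CPython's view only, which may differ from the pysrc2cpg parser, and the helper name indent_events and the snippet are hypothetical.

```python
import io
import tokenize


def indent_events(source):
    """List (line, token_name) for INDENT/DEDENT tokens as CPython's tokenizer sees them."""
    events = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.INDENT, tokenize.DEDENT):
            events.append((tok.start[0], tokenize.tok_name[tok.type]))
    return events


# Hypothetical snippet with a single tab-indented body line.
snippet = "if commit:\n\tuser.save()\nresult = 1\n"
for line, name in indent_events(snippet):
    print(line, name)
```

If CPython reports an INDENT on a line where the Joern parser complains that it was expecting one, the two tokenizers are probably measuring the leading whitespace differently.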

DavidBakerEffendi commented 5 months ago

Using a formatter seems to help here:

cd pygoat
pip3 install black
black .

Then I no longer get parser issues.

Not ideal, since this is only a workaround, but it should unblock this issue.
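If running a full formatter over the repository is not an option, a narrower variant of the same workaround is to rewrite only the leading tabs as spaces. The following is a minimal sketch under stated assumptions, not something Joern provides. The helper name expand_leading_tabs is hypothetical. The tab size of 8 follows the rule in the Python language reference for how leading tabs are measured. The function also rewrites leading whitespace on lines inside multi-line strings, so review the result before relying on it.

```python
def expand_leading_tabs(source, tabsize=8):
    """Replace tabs in each line's leading whitespace with spaces; leave the rest untouched."""
    out = []
    for line in source.splitlines(keepends=True):
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        out.append(indent.expandtabs(tabsize) + body)
    return "".join(out)


# Hypothetical input: one tab-indented line with a tab later in the line.
before = "if commit:\n\tuser.save()\t# keep this tab\n"
after = expand_leading_tabs(before)
print(repr(after))
```

Tabs that appear after the first non-whitespace character are preserved, so only indentation changes.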