pixee / codemodder-python

Python implementation of the Codemodder framework
GNU Affero General Public License v3.0
35 stars 10 forks source link

Assignment deleted by SQL parameterization codemod #441

Open drdavella opened 5 months ago

drdavella commented 5 months ago

I was experimenting with adding support for the .raw method to the SQL parameterization codemod. This would be useful for supporting various ORMs (e.g. SQLAlchemy, Django) that enable pass-thru execution of raw SQL queries. The change I made looked like this:

diff --git a/src/core_codemods/sql_parameterization.py b/src/core_codemods/sql_parameterization.py
index 8c5042c..0636023 100644
--- a/src/core_codemods/sql_parameterization.py
+++ b/src/core_codemods/sql_parameterization.py
@@ -558,7 +558,7 @@ class FindQueryCalls(ContextAwareVisitor, LinearizeStringMixin):

     def leave_Call(self, original_node: cst.Call) -> None:
         maybe_call_name = get_function_name_node(original_node)
-        if maybe_call_name and maybe_call_name.value == "execute":
+        if maybe_call_name and maybe_call_name.value in ["execute", "raw"]:
             # TODO don't parameterize if there are parameters already
             # may be temporary until I figure out if named parameter will work on most drivers
             if len(original_node.args) > 0 and len(original_node.args) < 2:

I then ran codemodder against PyGoat. When I did, I got a change that looked like the following:

changed:
  - introduction/views.py
    diff:
      ---
      +++
      @@ -152,11 +152,11 @@

                   if login.objects.filter(user=name):

      -                sql_query = "SELECT * FROM introduction_login WHERE user='"+name+"'AND password='"+password+"'"
      +                sql_query = "SELECT * FROM introduction_login WHERE user=?"+"AND password=?"
                       print(sql_query)
                       try:
                           print("\nin try\n")
      -                    val=login.objects.raw(sql_query)
      +                    val=login.objects.raw(sql_query, (name, password, ))
                       except:
                           print("\nin except\n")
                           return render(
      @@ -257,7 +257,6 @@
           startInd = text.find('>')
           endInd = text.find('<', startInd)
           text = text[startInd + 1:endInd:]
      -    p=comments.objects.filter(id=1).update(comment=text)

           return render(request, 'Lab/XXE/xxe_lab.html')

The fix itself looks good; it parameterizes the raw query and removes the potential vulnerability.

However, I'm concerned about the deletion of this line:

           p=comments.objects.filter(id=1).update(comment=text)

@andrecsilva informs me that this is occurring due to the fact that p is unused subsequently in the code. However, the problem is that the RHS of the assignment may have a side effect: in this case it appears to be updating the database, but it could have a perfectly arbitrary side effect in general.

My understanding is that the unused variable removal is part of the cleanup phase of the SQL parameterization codemod. However, we need to try to restrict this logic somehow to apply only to those variables that were involved in the actual fix.

clavedeluna commented 1 month ago

I started looking at this issue but decided we should have a discussion before deciding on the fix here as there isn't a clear fix. Right now the RemoveUnusedVariables transformer runs on the entire file, which leads to this issue of removing unused variables. One option is to decide to do nothing. This codemod is MERGE_AFTER_CURSORY_REVIEW, so pixee users would be responsible for the cleanup.

Another option we have is to decide not to run that transformer and leave any unused variables be, to prevent removal of unused variables with side effects. Likewise, users would be responsible for removing unused variables.

A middle ground option, which I expect to be quite a challenge, would be to not blanket run RemoveUnusedVariables on the entire module, but to somehow be able to only remove variables related to sql query creation. I expect it's possible we will also end up accidentally removing side effect variables.