Reduce false positives for invalid-usage-of-modified-variable

risto-liftoff commented 1 year ago

Hi! When I ran this rule against my code, it generated a finding that I could tell was a false positive. Essentially, it was flagging this as an issue:

eng7, err := getEngineerAtIndex(engineers, 8)
if err != nil {
    eng7 = &Engineer{0, "N/A", "N/A", 0, nil}
}

which shouldn't be problematic since the only use of eng7 is assigning a new value.

CLAassistant commented 1 year ago

All committers have signed the CLA.

GrosQuildu commented 1 year ago

Thanks for the submission! Interesting false positive you have found. Looking again at the rule, it seems that we not only have the false positive, but also the original patterns are not comprehensive enough.

Wrote the following test that should catch more cases, including yours. Please let me know if the new test cases are valid. If so, we would greatly appreciate fixes that will make the test pass. Otherwise, we can merge the PR as it is and save outstanding not-passing tests in our TODO list.


func (e *Engineer) String() string {
    return e.FName + " " + e.LName
}

func main() {
    engineers := getEngineers()

    // BASIC TESTS
    // ruleid: invalid-usage-of-modified-variable
    eng1, err := getEngineerAtIndex(engineers, 5)
    if err != nil {
        fmt.Printf("Unable to obtain engineer with FName %s\n", eng1.FName)
    }

    // ruleid: invalid-usage-of-modified-variable
    eng11, err := getEngineerAtIndex(engineers, 5)
    if err != nil {
        fmt.Printf("Something")
        fmt.Printf("Unable to obtain engineer with FName %s\n", eng11.FName)
        fmt.Printf("Something")
    }

    // ruleid: invalid-usage-of-modified-variable
    var eng12 *Engineer
    eng12, err = getEngineerAtIndex(engineers, 5)
    if err != nil {
        fmt.Printf("Unable to obtain engineer with FName %s\n", eng12.FName)
    }

    // ruleid: invalid-usage-of-modified-variable
    var eng13 *Engineer
    eng13, err = getEngineerAtIndex(engineers, 5)
    if err != nil {
        fmt.Printf("Something")
        fmt.Printf("Unable to obtain engineer with FName %s\n", eng13.FName)
        fmt.Printf("Something")
    }

    // BASIC TESTS - inline if
    // ruleid: invalid-usage-of-modified-variable
    if eng2, err := getEngineerAtIndex(engineers, 6); err != nil {
        fmt.Printf("Unable to obtain engineer with FName %s\n", eng2.FName)
    }

    // ruleid: invalid-usage-of-modified-variable
    if eng21, err := getEngineerAtIndex(engineers, 6); err != nil {
        fmt.Printf("Something")
        fmt.Printf("Unable to obtain engineer with FName %s\n", eng21.FName)
        fmt.Printf("Something")
    }

    // ruleid: invalid-usage-of-modified-variable
    var eng22 *Engineer
    if eng22, err = getEngineerAtIndex(engineers, 6); err != nil {
        fmt.Printf("Unable to obtain engineer with FName %s\n", eng22.FName)
    }

    // ruleid: invalid-usage-of-modified-variable
    var eng23 *Engineer
    if eng23, err = getEngineerAtIndex(engineers, 6); err != nil {
        fmt.Printf("Something")
        fmt.Printf("Unable to obtain engineer with FName %s\n", eng23.FName)
        fmt.Printf("Something")
    }

    // WHOLE VAR AND METHODS TESTS
    // ruleid: invalid-usage-of-modified-variable
    eng4, err := getEngineerAtIndex(engineers, 5)
    if err != nil {
        fmt.Printf("Something")
        fmt.Printf("%#v\n", eng4)
        fmt.Printf("Something")
    }

    // ruleid: invalid-usage-of-modified-variable
    eng5, err := getEngineerAtIndex(engineers, 5)
    if err != nil {
        fmt.Printf("Something")
        fmt.Printf("%s\n", eng5.String())
        fmt.Printf("Something")
    }

    // FIELD ASSIGN TESTS
    eng3 := Engineer{4, "Lee", "Renaldo", 50, nil}
    // ruleid: invalid-usage-of-modified-variable
    eng3.Address, err = getEngineerAddressByIndex(engineers, 1)
    if err != nil {
        fmt.Printf("Unable to obtain address %#v!\n", eng3.Address)
    }

    // ruleid: invalid-usage-of-modified-variable
    eng3.Address, err = getEngineerAddressByIndex(engineers, 1)
    if err != nil {
        fmt.Printf("Something")
        fmt.Printf("Unable to obtain address %#v!\n", eng3.Address)
        fmt.Printf("Something")
    }

    // FALSE POSITIVES
    // ok: invalid-usage-of-modified-variable
    eng6, err := getEngineerAtIndex(engineers, 7)
    if err != nil {
        eng6 = &Engineer{0, "N/A", "N/A", 0, nil}
        fmt.Printf("Unable to obtain engineer %d!\n", eng6.Id)
    }

    // ok: invalid-usage-of-modified-variable
    eng61, err := getEngineerAtIndex(engineers, 7)
    if err != nil {
        fmt.Printf("Something")
        eng61 = &Engineer{0, "N/A", "N/A", 0, nil}
        fmt.Printf("Unable to obtain engineer %d!\n", eng61.Id)
        fmt.Printf("Something")
    }

    // ok: invalid-usage-of-modified-variable
    eng7, err := getEngineerAtIndex(engineers, 8)
    if err != nil {
        eng7 = &Engineer{0, "N/A", "N/A", 0, nil}
    }

    var eng8 *Engineer
    // ok: invalid-usage-of-modified-variable
    eng8, err = getEngineerAtIndex(engineers, 7)
    if err != nil {
        eng8 = &Engineer{0, "N/A", "N/A", 0, nil}
        fmt.Printf("Unable to obtain engineer %d!\n", eng8.Id)
    }

    var eng81 *Engineer
    // ok: invalid-usage-of-modified-variable
    eng81, err = getEngineerAtIndex(engineers, 7)
    if err != nil {
        fmt.Printf("Something")
        eng81 = &Engineer{0, "N/A", "N/A", 0, nil}
        fmt.Printf("Unable to obtain engineer %d!\n", eng81.Id)
        fmt.Printf("Something")
    }

    // ok: invalid-usage-of-modified-variable
    eng9, err := getEngineerAtIndex(engineers, 8)
    if err != nil {
        eng9 = &Engineer{0, "N/A", "N/A", 0, nil}
    }

    var eng91 *Engineer
    // ok: invalid-usage-of-modified-variable
    eng91, err = getEngineerAtIndex(engineers, 8)
    if err != nil {
        eng91 = &Engineer{0, "N/A", "N/A", 0, nil}
    }

    // TRUE POSITIVES
    // ruleid: invalid-usage-of-modified-variable
    eng10, err := getEngineerAtIndex(engineers, 10)
    if err != nil {
        fmt.Printf("Something")
        fmt.Printf("Unable to obtain engineer %d!\n", eng10.Id)
        eng10 = &Engineer{0, "N/A", "N/A", 0, nil}
        fmt.Printf("Unable to obtain engineer %d!\n", eng10.Id)
    }

    var eng101 *Engineer
    // ruleid: invalid-usage-of-modified-variable
    eng101, err = getEngineerAtIndex(engineers, 10)
    if err != nil {
        fmt.Printf("Something")
        fmt.Printf("Unable to obtain engineer %d!\n", eng101.Id)
        eng101 = &Engineer{0, "N/A", "N/A", 0, nil}
        fmt.Printf("Unable to obtain engineer %d!\n", eng101.Id)
    }

    // ruleid: invalid-usage-of-modified-variable
    eng11, err = getEngineerAtIndex(engineers, 8)
    if err != nil {
        eng11.Address = nil
    }

    fmt.Printf("Engineer 1: %s", fmt.Sprintf("%s %s", eng1.FName, eng1.LName))
    fmt.Printf("Engineer 7: %s", fmt.Sprintf("%s %s", eng7.FName, eng7.LName))
    fmt.Printf("Engineer 7: %s", fmt.Sprintf("%s %s", eng9.FName, eng9.LName))
    fmt.Printf("Engineer 7: %s", fmt.Sprintf("%s %s", eng91.FName, eng91.LName))
}

risto-liftoff commented 1 year ago

Those test cases all look valid to me. Would you like me to update the PR with a pattern that will pass all these test cases?

GrosQuildu commented 1 year ago

Sure, would be great if you could do that. Thanks.

risto-liftoff commented 1 year ago

I wasn't able to come up with a pattern that perfectly matched the test cases due to what seems to be a limitation in the semgrep rule syntax. I included a thorough comment in the test cases file explaining why I wasn't able to write a pattern to catch 3 of the test cases.

risto-liftoff commented 1 year ago

I found another instance in our code that I would consider a false positive:

eng, err := getEngineerAtIndex(slice, idx)
if err != nil {
    return eng, err
}

I've updated the rule to exclude this pattern.

GrosQuildu commented 1 year ago

Hm, I would consider that example a valid finding - if a method returned error and variable, then the variable is probably not meant to be used. It should be reviewed manually to confirm that returning variable instead of nil is fine.

risto-liftoff commented 1 year ago

All right, I've removed my last commit.

GrosQuildu commented 1 year ago

Thanks, we are almost good to merge. Only signing the CLA is left, please do it when you have time. https://cla-assistant.io/trailofbits/semgrep-rules?pullRequest=33

risto-liftoff commented 1 year ago

Yep, I'll sign the CLA once my legal department gives me the thumbs up to do so.

0xDC0DE commented 1 year ago

Hi!

Could you change the hardcoded variable name for err in the rule to a metavariable? e.g. $ERR . While it is convention to call this variable err, it can still have a different name, and the current version of the rule would not flag those cases.

GrosQuildu commented 1 year ago

Good catch. Actually we could try to support more than 2 return values, if that is possible with Semgrep.

risto-liftoff commented 1 year ago

All right, I've made the requested changes:

Use $ERR instead of err
Support more than 2 return values

hex0punk commented 1 year ago

IMO the real issue that this rule is looking for should be potential nil derefences. Otherwise, most of the results from this rule will be FPs. Unless you could use metavariable-type and specify whitelisted types that cannot have nil such as Int and String (not possible at the moment), then I'd argue that is best to only look for cases where a likely nil object is deferenced. So I suggest to ToB also updating this:

    - pattern: |
        ..., $X, ..., $ERR = ...
        if $ERR != nil {
          ...
          <... $X ...>
        }

to

    - pattern: |
        ..., $X, ..., $ERR = ...
        if $ERR != nil {
          ...
          <... $X.$Y ...>
        }

This would greatly reduce FPs and give you likely cases where a nil dereference could occur.

risto-liftoff commented 1 year ago

    - pattern: |
        ..., $X, ..., $ERR = ...
        if $ERR != nil {
          ...
          <... $X ...>
        }

to

    - pattern: |
        ..., $X, ..., $ERR = ...
        if $ERR != nil {
          ...
          <... $X.$Y ...>
        }

I would favor making this change. When I run this rule as this PR currently has it against our codebase, it returns 97 matches, most of which are false positives (they wouldn't cause a panic). With the change suggested by hex0punk, it returns 3 matches, all of which are true positives.

If ToB agrees this would be a good change, I'll update the PR.

GrosQuildu commented 1 year ago

Lets make the change. While I think that the more noisy rule could help finding business logic bugs, reducing false positives rate is important for automated tooling.

After change, the rule should find cases where assignment to variable (current pattern-not cases) follows the nil dereference call.

hex0punk commented 1 year ago

just checking on the status of this. Is the only thing pending for @risto-liftoff to sign a CLA?

risto-liftoff commented 1 year ago

just checking on the status of this. Is the only thing pending for @risto-liftoff to sign a CLA?

Yep, still waiting on my legal department to approve the CLA. Once that's approved, I believe I'll also need an approval for the PR by a repo maintainer.

GrosQuildu commented 1 year ago

Hey, any updated on the CLA?

risto-liftoff commented 1 year ago

Yes, I finally got word back that I am allowed to sign the CLA.

GrosQuildu commented 1 year ago

Thanks a lot for the contribution! Really appreciated.

For a note: there is one true positive type we are currently missing - will try to work on it in the future.

trailofbits / semgrep-rules

Reduce false positives for invalid-usage-of-modified-variable #33