semgrep / semgrep

Lightweight static analysis for many languages. Find bug variants with patterns that look like source code.
https://semgrep.dev
GNU Lesser General Public License v2.1
10.65k stars 624 forks source link

String literal metavariables don't work with f-strings #7040

Open rc-mattschwager opened 1 year ago

rc-mattschwager commented 1 year ago

Describe the bug A clear and concise description of what the bug is.

String literal metavariables work with regular strings and byte strings in Python:

$ echo '"{foo} and {bar} and {baz}"' | semgrep -l python -e '"$STR"' -
Scanning 1 file.

Findings:

  /var/folders/5_/pjcwsmbn4ll0_yy6h8b7z7p40000gq/T/tmp9bwjbpyy/stdin 
          1┆ "{foo} and {bar} and {baz}"

Ran 1 rule on 1 file: 1 finding.
$ echo 'b"{foo} and {bar} and {baz}"' | semgrep -l python -e 'b"$STR"' -
Scanning 1 file.

Findings:

  /var/folders/5_/pjcwsmbn4ll0_yy6h8b7z7p40000gq/T/tmpwg8p34tz/stdin 
          1┆ b"{foo} and {bar} and {baz}"

Ran 1 rule on 1 file: 1 finding.

But they do not appear to work with f-strings:

$ echo 'f"{foo} and {bar} and {baz}"' | semgrep -l python -e 'f"$STR"' -
Scanning 1 file.

Ran 1 rule on 1 file: 0 findings.

To Reproduce Steps to reproduce the behavior, ideally a link to https://semgrep.dev:

See above.

Expected behavior A clear and concise description of what you expected to happen.

f-strings work just as well as regular strings and byte strings.

Screenshots If applicable, add screenshots to help explain your problem.

What is the priority of the bug to you?

Environment If not using semgrep.dev: are you running off docker, an official binary, a local build?

$ semgrep --version
1.8.0

Use case What will fixing this bug enable for you?

IagoAbal commented 1 year ago

Need to think about this, "$X" is only meant to match string literals (i.e., constants), and f"{foo} and {bar} and {baz}" does not need to be a constant. cc @aryx

IagoAbal commented 1 year ago

You could achieve what you intended like this: https://semgrep.dev/s/lXqD.

rc-mattschwager commented 1 year ago

Need to think about this, "$X" is only meant to match string literals (i.e., constants), and f"{foo} and {bar} and {baz}" does not need to be a constant.

Hmm, I thought "..." was for string literals. Maybe I'm getting things backwards:

$ echo 'f"{foo} and {bar} and {baz}"' | semgrep -l python -e 'f"..."' -
Scanning 1 file.

Findings:

  /var/folders/5_/pjcwsmbn4ll0_yy6h8b7z7p40000gq/T/tmpdgbq6xe1/stdin 
          1┆ f"{foo} and {bar} and {baz}"
          ⋮┆----------------------------------------
          1┆ f"{foo} and {bar} and {baz}"
          ⋮┆----------------------------------------
          1┆ f"{foo} and {bar} and {baz}"
          ⋮┆----------------------------------------
          1┆ f"{foo} and {bar} and {baz}"

Ran 1 rule on 1 file: 4 findings.

I'm having a hard time remembering my original use case here. I think it was to find all f-strings (and only f-strings) and search for some content within them. pattern-regex: f\"([^\"]*)" could work, but regex's are slow and error-prone (what about ', what about escaped " in the string, etc etc).

f-strings are a weird case. I guess most other languages just do string interpolation without any special identifier. f-strings are like string literals, but also not string literals 😅

IagoAbal commented 1 year ago

Initially we only had "..." for string literals (or compile-time constants in general). Then we added "$X" that is meant to match the same as "..." (so also compile-time constants) but binding the content of the string (what is inside the quotes) to $X as "raw text" (rather than an AST element). The "$X" pattern is useful in combination with metavariable-regex and metavariable-pattern for example.

But f-strings are special also in Semgrep.. You can match a constant f-string with "...". But you can match any f-string with f"...". An f-string gets interpreted as a concatenation expression internally, and f"..." is like a ... inside the f-string. So maybe what could make sense is that f"$X" acts like f"$...X" currently does.

I didn't realize at first that f"$...X" worked, I guess it's better than the workaround I initially proposed.