semgrep / pfff

pfff is mainly an OCaml API to write static analysis, dynamic analysis, code visualizations, code navigations, or style-preserving source-to-source transformations such as refactorings on source code.
https://semgrep.dev
Other
186 stars 29 forks source link

fix scala parsing for imports #540

Closed brandonspark closed 2 years ago

brandonspark commented 2 years ago

What: Scala import parsing is currently broken, and both does not conform to the specification, and does not handle metavariables properly. This manifests in Semgrep patterns such as import $X being rejected as invalid patterns at parse time.

Why: We should fix this because the above import pattern is useful to be able to do, particularly when $X is a metavariable regex or something.

How: Via scala-lang, we obtain:

Screen Shot 2022-07-19 at 10 38 23 PM

Unrolling this once, we see that we get something like:

StableId ::= id
           | StableId '.' `id`
           | [id '.'] `this` `.` id
           | [id '.'] `super` [ClassQualifier] '.' id

Basically, a StableId starts as either a single id, or a id.this.id2, or a id.super [class qualifier], and then sees some number of identifiers. Notably, because we have:

Screen Shot 2022-07-19 at 10 43 17 PM

in the case of an import, it must be at least one.

This means that something like import X is not actually syntactically permissible Scala. This is why, currently, import $X fails. But this is still something that would be nice to be able to do, so I went ahead and changed things so that we can support it.

Essentially, I've made it such that import_exprs are either what they normally are understood to be, or they can be a metavariable. This makes it so that we don't parse things like import x, but we can parse import $X.

The previous code also permitted non-dots following an identifier, and I don't think that should actually be possible.

Fixes #5219.

Test plan: So, ideally I'd like to write a ton of tests showing that things which do not conform to the Scala grammar do not parse. Unfortunately, the parsing tests seem to be more about what things do parse, so I don't actually know how to do this. Feedback appreciated.

In the meantime, the test plan can be to try testing dumping patterns on the following .sgrep patterns. I've verified the following using sc -dump_pattern test.sgrep -lang scala: example status
import this.a FAILS
import this.a._ SUCCEEDS
import $X SUCCEEDS
import $X._ SUCCEEDS
import a.$X._ SUCCEEDS
import a FAILS
import a.this.b FAILS
import a.this.b._ SUCCEEDS

Not sure how to carve this in stone, so to speak, however. Let me know what you think.

Security

brandonspark commented 2 years ago

Messed this up: see other PR