semgrep / pfff

pfff is mainly an OCaml API to write static analysis, dynamic analysis, code visualizations, code navigations, or style-preserving source-to-source transformations such as refactorings on source code.
https://semgrep.dev
Other
186 stars 29 forks source link

Scala fix import parse2 #541

Closed brandonspark closed 2 years ago

brandonspark commented 2 years ago

What: Scala import parsing is currently broken, and both does not conform to the specification, and does not handle metavariables properly. This manifests in Semgrep patterns such as import $X being rejected as invalid patterns at parse time.

Why: We should fix this because the above import pattern is useful to be able to do, particularly when $X is a metavariable regex or something.

How: Via scala-lang, we obtain:

Screen Shot 2022-07-19 at 10 38 23 PM

Unrolling this once, we see that we get something like:

StableId ::= id
           | StableId '.' `id`
           | [id '.'] `this` `.` id
           | [id '.'] `super` [ClassQualifier] '.' id

Basically, a StableId starts as either a single id, or a id.this.id2, or a id.super [class qualifier], and then sees some number of identifiers. Notably, because we have:

Screen Shot 2022-07-19 at 10 43 17 PM

in the case of an import, it must be at least one.

This means that something like import X is not actually syntactically permissible Scala. This is why, currently, import $X fails. But this is still something that would be nice to be able to do, so I went ahead and changed things so that we can support it.

Essentially, I've made it such that import_exprs are either what they normally are understood to be, or they can be a metavariable. This makes it so that we don't parse things like import x, but we can parse import $X.

The previous code also permitted non-dots following an identifier, and I don't think that should actually be possible.

Test plan: So, ideally I'd like to write a ton of tests showing that things which do not conform to the Scala grammar do not parse. Unfortunately, the parsing tests seem to be more about what things do parse, so I don't actually know how to do this. Feedback appreciated.

In the meantime, the test plan can be to try testing dumping patterns on the following .sgrep patterns. I've verified the following using sc -dump_pattern test.sgrep -lang scala: example status
import this.a FAILS
import this.a._ SUCCEEDS
import $X SUCCEEDS
import $X._ SUCCEEDS
import a.$X._ SUCCEEDS
import a FAILS
import a.this.b FAILS
import a.this.b._ SUCCEEDS

Not sure how to carve this in stone, so to speak, however. Let me know what you think.

Security

Security

aryx commented 2 years ago

what is the semgrep issue related to this? I guess https://github.com/returntocorp/semgrep/issues/5219

aryx commented 2 years ago

Please assign yourself to the github issue if you work on it (I know there's also Linear, but if it's a public github issue it's nice to tell to the other you're working on it).

aryx commented 2 years ago

Excellent What/Why/How. I actually never managed to understand this Path/StableId and unrolling Path just makes things so much clearer :)

Regarding rejecting bad code, you can imitate what we do for c++ tests:

    "rejecting bad code", (fun () ->
      let dir = Config_pfff.tests_path "cpp/parsing_errors" in
      let files = Common2.glob (spf "%s/*.cpp" dir) in
      files |> List.iter (fun file ->
        try
          let _ast = parse file in
          Alcotest.failf "it should have thrown a Parse_error %s" file
        with
        | Parse_info.Parsing_error _ -> ()
        | exn -> Alcotest.failf "throwing wrong exn %s on %s"
                   (Common.exn_to_s exn) file
      )
    );

in lang_cpp/parsing/unit_parsing_cpp.ml

aryx commented 2 years ago

but yes usually we don't care too much about rejecting code; semgrep is not a type checker, or parsing checker; people should use other tools for that.