Closed brandonspark closed 2 years ago
what is the semgrep issue related to this? I guess https://github.com/returntocorp/semgrep/issues/5219
Please assign yourself to the github issue if you work on it (I know there's also Linear, but if it's a public github issue it's nice to tell to the other you're working on it).
Excellent What/Why/How. I actually never managed to understand this Path/StableId and unrolling Path just makes things so much clearer :)
Regarding rejecting bad code, you can imitate what we do for c++ tests:
"rejecting bad code", (fun () ->
let dir = Config_pfff.tests_path "cpp/parsing_errors" in
let files = Common2.glob (spf "%s/*.cpp" dir) in
files |> List.iter (fun file ->
try
let _ast = parse file in
Alcotest.failf "it should have thrown a Parse_error %s" file
with
| Parse_info.Parsing_error _ -> ()
| exn -> Alcotest.failf "throwing wrong exn %s on %s"
(Common.exn_to_s exn) file
)
);
in lang_cpp/parsing/unit_parsing_cpp.ml
but yes usually we don't care too much about rejecting code; semgrep is not a type checker, or parsing checker; people should use other tools for that.
What: Scala
import
parsing is currently broken, and both does not conform to the specification, and does not handle metavariables properly. This manifests in Semgrep patterns such asimport $X
being rejected as invalid patterns at parse time.Why: We should fix this because the above import pattern is useful to be able to do, particularly when
$X
is a metavariable regex or something.How: Via scala-lang, we obtain:
Unrolling this once, we see that we get something like:
Basically, a
StableId
starts as either a singleid
, or aid.this.id2
, or aid.super [class qualifier]
, and then sees some number of identifiers. Notably, because we have:in the case of an
import
, it must be at least one.This means that something like
import X
is not actually syntactically permissible Scala. This is why, currently,import $X
fails. But this is still something that would be nice to be able to do, so I went ahead and changed things so that we can support it.Essentially, I've made it such that
import_expr
s are either what they normally are understood to be, or they can be a metavariable. This makes it so that we don't parse things likeimport x
, but we can parseimport $X
.The previous code also permitted non-dots following an identifier, and I don't think that should actually be possible.
Test plan: So, ideally I'd like to write a ton of tests showing that things which do not conform to the Scala grammar do not parse. Unfortunately, the parsing tests seem to be more about what things do parse, so I don't actually know how to do this. Feedback appreciated.
.sgrep
patterns. I've verified the following usingsc -dump_pattern test.sgrep -lang scala
:import this.a
import this.a._
import $X
import $X._
import a.$X._
import a
import a.this.b
import a.this.b._
Not sure how to carve this in stone, so to speak, however. Let me know what you think.
Security
Security