pfff is mainly an OCaml API to write static analysis, dynamic analysis, code visualizations, code navigations, or style-preserving source-to-source transformations such as refactorings on source code.
What:
Scala import parsing is currently broken, and both does not conform to the specification, and does not handle metavariables properly. This manifests in Semgrep patterns such as import $X being rejected as invalid patterns at parse time.
Why:
We should fix this because the above import pattern is useful to be able to do, particularly when $X is a metavariable regex or something.
Unrolling this once, we see that we get something like:
StableId ::= id
| StableId '.' `id`
| [id '.'] `this` `.` id
| [id '.'] `super` [ClassQualifier] '.' id
Basically, a StableId starts as either a single id, or a id.this.id2, or a id.super [class qualifier], and then sees some number of identifiers. Notably, because we have:
in the case of an import, it must be at least one.
This means that something like import X is not actually syntactically permissible Scala. This is why, currently, import $X fails. But this is still something that would be nice to be able to do, so I went ahead and changed things so that we can support it.
Essentially, I've made it such that import_exprs are either what they normally are understood to be, or they can be a metavariable. This makes it so that we don't parse things like import x, but we can parse import $X.
The previous code also permitted non-dots following an identifier, and I don't think that should actually be possible.
Fixes #5219.
Test plan:
So, ideally I'd like to write a ton of tests showing that things which do not conform to the Scala grammar do not parse. Unfortunately, the parsing tests seem to be more about what things do parse, so I don't actually know how to do this. Feedback appreciated.
In the meantime, the test plan can be to try testing dumping patterns on the following .sgrep patterns. I've verified the following using sc -dump_pattern test.sgrep -lang scala:
example
status
import this.a
FAILS
import this.a._
SUCCEEDS
import $X
SUCCEEDS
import $X._
SUCCEEDS
import a.$X._
SUCCEEDS
import a
FAILS
import a.this.b
FAILS
import a.this.b._
SUCCEEDS
Not sure how to carve this in stone, so to speak, however. Let me know what you think.
Security
[X] Change has no security implications (otherwise, ping the security team)
What: Scala
import
parsing is currently broken, and both does not conform to the specification, and does not handle metavariables properly. This manifests in Semgrep patterns such asimport $X
being rejected as invalid patterns at parse time.Why: We should fix this because the above import pattern is useful to be able to do, particularly when
$X
is a metavariable regex or something.How: Via scala-lang, we obtain:
Unrolling this once, we see that we get something like:
Basically, a
StableId
starts as either a singleid
, or aid.this.id2
, or aid.super [class qualifier]
, and then sees some number of identifiers. Notably, because we have:in the case of an
import
, it must be at least one.This means that something like
import X
is not actually syntactically permissible Scala. This is why, currently,import $X
fails. But this is still something that would be nice to be able to do, so I went ahead and changed things so that we can support it.Essentially, I've made it such that
import_expr
s are either what they normally are understood to be, or they can be a metavariable. This makes it so that we don't parse things likeimport x
, but we can parseimport $X
.The previous code also permitted non-dots following an identifier, and I don't think that should actually be possible.
Fixes #5219.
Test plan: So, ideally I'd like to write a ton of tests showing that things which do not conform to the Scala grammar do not parse. Unfortunately, the parsing tests seem to be more about what things do parse, so I don't actually know how to do this. Feedback appreciated.
.sgrep
patterns. I've verified the following usingsc -dump_pattern test.sgrep -lang scala
:import this.a
import this.a._
import $X
import $X._
import a.$X._
import a
import a.this.b
import a.this.b._
Not sure how to carve this in stone, so to speak, however. Let me know what you think.
Security