semgrep / semgrep

Lightweight static analysis for many languages. Find bug variants with patterns that look like source code.
https://semgrep.dev
GNU Lesser General Public License v2.1

Support for non-file configuration (and tests, maybe) #10277

Open xmo-odoo opened 5 months ago

xmo-odoo commented 5 months ago

As far as I understand, semgrep currently only accepts configuration (and runs tests) from on-disk files. This is not always convenient when running in CI, limiting container access, or keeping configurations synchronised between repositories: for example, it might require a writeable FS just so the configuration can be written somewhere semgrep can read it.

As far as I can tell, semgrep currently does not use stdin, so maybe -c - could be used to read the configuration file from there, giving more flexibility to the caller? Some shells do support tempfile process substitution, but AFAIK it's not universal, and it requires using one of those shells in the first place.

It would also be nice to support bespoke FDs for --test instead of just directories. Probably less of a concern though. e.g.

$ mkfifo f
$ semgrep -c ... --test fd:42 42<f &
$ echo "$CONTENT" > f

or something along those lines (likely the equivalent in an actual language pulling the content from remote storage or a database or something)

mjambon commented 5 months ago

Hi!

  1. I understand that the -c/--config option doesn't support reading from named pipes or stdin (which could be provided as /dev/stdin if not with -). It's something we could add. Below are examples of what works and doesn't work today.
  2. I don't understand your other request concerning --test. What content would be available for reading from a pipe since normally a test folder provides pairs of files (YAML rule, annotated target file)?

semgrep supports reading target files from named pipes, including those created by Bash with process substitution:

$ semgrep -l python -e hello <(echo 'hello')

┌────────────────┐
│ 1 Code Finding │
└────────────────┘

    /tmp/tmp47gtd547/dev_fd_63
            1┆ hello

┌──────────────┐
│ Scan Summary │
└──────────────┘

Ran 1 rule on 1 file: 1 finding.

... but it's not supported for rule files (-c):

$ semgrep -c <(echo whatever) hello.py
[ERROR] config location `/dev/fd/63` is not a file or folder!
[ERROR] invalid configuration file found (1 configs were invalid)

xmo-odoo commented 5 months ago

> I don't understand your other request concerning --test. What content would be available for reading from a pipe since normally a test folder provides pairs of files (YAML rule, annotated target file)?

Test files, with the rules provided by -c? Obviously that would not support fixing tests (probably).

mjambon commented 5 months ago

Sorry, I'm not a heavy user of the test feature. If I understand correctly, you would like something like this:

semgrep -c rules.yaml --test target.py

where rules.yaml contains a number of rules, and target.py contains annotations for a specific rule or maybe even for several rules:

# ruleid: my-first-rule
some code

# ruleid: some-other-rule
blah

# ok: my-first-rule
more code

with target.py being possibly a named pipe or - for stdin.
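
To make the annotation format concrete, here is a minimal Python sketch of reading such annotations from an arbitrary text stream (a regular file, a named pipe, or stdin). It is not semgrep's actual test parser: the names `Expectation` and `read_expectations` are made up for illustration, it only handles the `ruleid:` and `ok:` comment forms shown above, and it assumes each annotation applies to the line right after it.

```python
import re
from typing import List, NamedTuple, TextIO

# Matches comment lines such as "# ruleid: my-first-rule" or "# ok: my-first-rule".
ANNOTATION = re.compile(r"^\s*#\s*(ruleid|ok):\s*(\S+)\s*$")


class Expectation(NamedTuple):
    line: int      # 1-based line the annotation applies to (assumed: the next line)
    kind: str      # "ruleid" (a finding is expected) or "ok" (no finding expected)
    rule_id: str


def read_expectations(stream: TextIO) -> List[Expectation]:
    """Collect annotations from any readable text stream, one pass, no seeking."""
    found = []
    for number, text in enumerate(stream, start=1):
        match = ANNOTATION.match(text)
        if match:
            found.append(Expectation(number + 1, match.group(1), match.group(2)))
    return found
```

Since the function only iterates over the stream once, something like `read_expectations(sys.stdin)` would work just as well as passing an open file.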

Feel free to add corrections or details for this feature request for whoever ends up implementing it.

xmo-odoo commented 5 months ago

Sounds about right, though I was thinking more bespoke fds for tests: it's a bit odd at first glance but works nicely to transmit multiple, separate, data streams from the parent process without needing bespoke framing.
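
For reference, a rough Python sketch of the parent side of that fd-passing idea on a POSIX system. The child here is a stand-in Python one-liner that reads each inherited fd, not semgrep, since semgrep does not accept fds today; `run_child_with_streams` and `CHILD` are names invented for this example.

```python
import os
import subprocess
import sys
from typing import Sequence

# Stand-in child: reads every fd number given on its command line and prints
# the repr of what it read, one stream per line.
CHILD = (
    "import os, sys\n"
    "for arg in sys.argv[1:]:\n"
    "    with os.fdopen(int(arg)) as fh:\n"
    "        print(repr(fh.read()))\n"
)


def run_child_with_streams(streams: Sequence[str]) -> str:
    """Give the child one pipe per stream; no framing is needed between them."""
    pipes = [os.pipe() for _ in streams]
    read_fds = [read_fd for read_fd, _ in pipes]
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, *map(str, read_fds)],
        pass_fds=read_fds,          # only the read ends are inherited
        stdout=subprocess.PIPE,
        text=True,
    )
    for read_fd in read_fds:
        os.close(read_fd)           # the parent keeps only the write ends
    # Write in the same order the child reads, closing each pipe to signal EOF.
    for (_, write_fd), content in zip(pipes, streams):
        with os.fdopen(write_fd, "w") as fh:
            fh.write(content)
    out, _ = proc.communicate()
    return out
```

Each stream gets its own pipe, so the child sees a clean EOF per stream instead of having to parse delimiters out of a single stdin.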

I figure stdin / - would be more useful for the configuration itself.

The use case would be programmatic invocation of semgrep by checker tools and the like, which might want to avoid filesystem storage (and for which a writeable FS might be undesirable, though I don't know if semgrep supports running without one anyway).
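
For context, this is roughly the workaround such a tool needs today, sketched in Python: write the in-memory rules to a temporary file, point `--config` at it, and clean up afterwards. The helper names (`rules_on_disk`, `semgrep_command`, `scan`) are hypothetical, and the sketch assumes `semgrep` is on `PATH` and that a writeable temp directory exists, which is exactly the requirement this issue is about.

```python
import os
import subprocess
import tempfile
from contextlib import contextmanager
from typing import Iterator, List, Sequence


@contextmanager
def rules_on_disk(rules_yaml: str) -> Iterator[str]:
    """Write rules to a temporary .yaml file, yield its path, delete it on exit."""
    fd, path = tempfile.mkstemp(suffix=".yaml")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(rules_yaml)
        yield path
    finally:
        os.unlink(path)


def semgrep_command(config_path: str, targets: Sequence[str]) -> List[str]:
    """Build the argv for a scan using an on-disk config file."""
    return ["semgrep", "--config", config_path, "--json", *targets]


def scan(rules_yaml: str, targets: Sequence[str]) -> subprocess.CompletedProcess:
    """Run semgrep with rules that only exist in memory in the calling process."""
    with rules_on_disk(rules_yaml) as path:
        return subprocess.run(
            semgrep_command(path, targets),
            capture_output=True,
            text=True,
            check=False,
        )
```

With `-c -` (or fd support), `rules_on_disk` would go away and the rules could be fed straight to the child process.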