Open khanhldt opened 2 years ago
This issue is synced in Linear at https://linear.app/r2c/issue/PA-823/semgrep-tries-to-pull-registry-when-validate-is-on.
cc @brendongo @underyx
semgrep pulls from registry to pull the extra validation rules https://github.com/returntocorp/semgrep/blob/develop/semgrep/semgrep/core_runner.py#L454
Maybe we need to update documentation. @khanhldt where did you find that --validate
says it doesn't pull rules from registry?
I think I read it in a couple of places, but one is from the semgrep --help
saying This will check YAML files for errors and run 'p/semgrep-rule-lints' on the YAML files. No search is performed.
Another thing is that I've been using version 0.65.0 in our CI and its been working fine. With this change in newer versions, I'm not able to get Semgrep to run in our CI anymore.
How can I turn off network calls from Semgrep?
As far as I understand the use case, I suspect it's semi-related to https://github.com/returntocorp/semgrep/issues/3147
E.g., if we bundled p/semgrep-rule-lints
with release, --validate
would be robust to network requirements.
@khanhldt is this blocking you? can you remove the --validate step for now?
Not really blocking as I switched to use version 0.70.0.
I'm using this hook to validate my Semgrep rules so --validate
is necessary.
@emjin was there a particular reason we went with not embedding the rule lints or is it fine to proceed with embedding them per release?
Embedding them should be fine
Also, duplicate of https://github.com/returntocorp/semgrep/issues/4454
There are few new features in the new versions that I'd like to use. Would be great if we can have a quick way of unblocking this.
Checking in to see if there's any update on this. Would love to --validate
in network-less environment
💯 ability to run validate without network (or with pre-downloaded version of semgrep-rule-lints file) strongly desired.
Which validations are done in --validate
that are not done by simply running a search with the rule / loading the rule?
Is https://github.com/semgrep/semgrep/blob/6fbd7aebf45410a4d3d8167ab63b9d3e8e4c0989/src/osemgrep/cli_scan/Validate_subcommand.ml#L11-L22 still accurate?
If so, which are NOT done by a search command? First one (YAML parser) for sure is. And I believe schema validation as well.
But is that semgrep-core ... --check_rules
also called?
The reason why --validate
pulls rules from the registry is because we have wanted to be able to dynamically update the validation rules without waiting for a release. At the moment, we don't plan to change that.
@emjin and that makes sense to me, just there should be an option not to, otherwise we cannot use validate in a container/VM without public internet access (common in many CIs)...
But if:
semgrep --validate -f my-rule.yaml
Runs the same checks done by:
semgrep -f my-rules.yaml bogys_file.txt
semgrep -f p/semgrep-rule-lints my-rules.yaml
I'm happy to not use --validate
at all, as the validation (except lints) would be done already by the test scans performed with the rules
But I'm unable to be sure of that by looking at the code, could you confirm?
Thanks
There is one check that --validate
runs which semgrep -f p/semgrep-rule-lints
would not (it checks for metavariables that referenced out of scope). However, we have had to relax that check as we changed how metavariables behave, so at this point I think the p/semgrep-rule-lints
run is the real value.
Awesome, thank you @emjin !
Is there any way to call that check alone at the moment to cover future versions? Is it sempre-core --check-rules?
Describe the bug Trying to set up a pre-commit hook like below:
It failed to run in our CI because Semgrep attempted to pull from registry even though the documentation said it should not (and our CI blocks network call).
So i tried to remove
--validate
flag and it looks like it doesn't try to pull from registry anymore. Also, looking at the change log, i believe this was changed from 0.71.0 (here).To Reproduce
Get semgrep version 0.80.0 (i believe any version after 0.70.0 has this issue) and run this command:
You will see that semgrep tries to pull from the registry.
Expected behavior
There must be a way to configure Semgrep to not reach out to network while running. Either by the existing flags or a new flag.
Screenshots
(let me know if you need)
What is the priority of the bug to you?
Environment
local buid
Use case
Running Semgrep in network-restricted environment like CI.