Open kiwiz opened 1 year ago
This issue is being marked `stale` because there hasn't been any activity in 14 days and either it wasn't prioritized or its priority is high. Please apply the appropriate `priority:*` label before removing the `stale` label.
Stale-bot has closed this stale item. Please reopen it if this is in error.
Could this be reopened (if it is deemed useful to discuss)?
@emjin :wave: Could this issue be reopened?
cc @aryx @mjambon perhaps this can be addressed in the osemgrep port
cc @mjambon. Maybe this is similar to the issue we are currently experiencing in osemgrep. I wonder if improving Gitignore would also improve this use case.
This is an interesting problem that is now critical since the new semgrepignore mechanism in osemgrep no longer relies on the optimizations used by git. Presumably, git (`git ls-files`) relies on an index of all the files under version control to quickly produce a list, rather than consulting the whole file tree and gitignore filters. `git status` is also pretty fast despite having to scan the file tree for new files. This is consistent with the performance bottleneck being the check of many files against many glob patterns. I haven't thought about this issue any deeper than this yet.
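To make the suspected bottleneck concrete, here is a minimal sketch of the "many files against many glob patterns" cost. It uses only Python's standard `fnmatch` and `re` modules, not Semgrep's or `wcmatch`'s actual code, and the function names `filter_naive` and `filter_combined` are made up for illustration. Note that `fnmatch` semantics differ from gitignore semantics (for example, `*` also matches `/`), and a single combined regex is not guaranteed to be faster with a backtracking engine, so this would need measuring.

```python
import fnmatch
import re


def filter_naive(paths, patterns):
    """Keep paths matching none of the patterns.

    Performs up to len(paths) * len(patterns) individual match calls.
    """
    return [
        p for p in paths
        if not any(fnmatch.fnmatchcase(p, pat) for pat in patterns)
    ]


def filter_combined(paths, patterns):
    """Same result, but each path is tested once against one alternation."""
    if not patterns:
        # An empty alternation would match everything, so short-circuit.
        return list(paths)
    combined = re.compile(
        "|".join(f"(?:{fnmatch.translate(pat)})" for pat in patterns)
    )
    return [p for p in paths if combined.match(p) is None]


if __name__ == "__main__":
    paths = ["a/b.py", "a/c.js", "vendor/x.py"]
    patterns = ["vendor/*", "*.js"]
    print(filter_naive(paths, patterns))
    print(filter_combined(paths, patterns))
```

Both functions return the same list; only the number of regex invocations per path differs.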
Possible families of solutions:
Is your feature request related to a problem? Please describe.
I'm running Semgrep within a large (~98K files) monorepo environment. When scanning individual projects, I can pass them to semgrep as targets. However, (for coverage reasons) I'd also like to be able to scan all files that are not part of a project. The way I'm implementing this currently is via a large number (~850) of `--exclude` arguments. This is really slow! I've done some profiling, and the bulk of the execution time happens in `TargetManager.globfilter`. I've made some minor optimizations on my fork (which I'll open up a PR for once I've fully tested). However, I'd also like to discuss alternatives/additions to the include/exclude functionality.
Describe the solution you'd like
Perhaps something like a `--include-dir`/`--exclude-dir` that does exact prefix matches?
Describe alternatives you've considered
`regexp` objects generated by `wcmatch`
`--exclude` arguments 😄
Use case: What will this feature enable for you?
The (somewhat niche) use case of being able to specify a large number of exclude directives.
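For discussion, the exact-prefix matching proposed under "Describe the solution you'd like" could be sketched roughly as follows. `--include-dir`/`--exclude-dir` are only proposed in this issue and are not existing Semgrep options, and `make_dir_excluder` is a hypothetical helper name. The idea is to store the excluded directories in a set and look up each ancestor of a target path, so the per-file cost depends on path depth rather than on the number of excludes.

```python
from pathlib import PurePosixPath


def make_dir_excluder(excluded_dirs):
    """Build a predicate telling whether a path lies under an excluded directory.

    Directories are compared component-wise, so "projects/foo" does not
    exclude "projects/foobar/a.py".
    """
    excluded = {PurePosixPath(d) for d in excluded_dirs}

    def is_excluded(path):
        # One set lookup per ancestor directory of the path.
        return any(parent in excluded for parent in PurePosixPath(path).parents)

    return is_excluded


if __name__ == "__main__":
    is_excluded = make_dir_excluder(["projects/foo", "projects/bar/"])
    for candidate in ["projects/foo/src/a.py", "projects/foobar/a.py", "README.md"]:
        print(candidate, is_excluded(candidate))
```

With ~850 excluded directories, this does a handful of hash lookups per file instead of up to ~850 glob matches, at the cost of supporting only literal directory prefixes.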
Additional context
Relates to: