Consider making "skip git" the default in v2

G-Rath commented 1 month ago

(I've been meaning to raise this to discuss for a few months and we're getting closer to v2 so I'm making a public issue to force my own hand 😄)

Currently, when I run the scanner without any config, it scans the commit if it sees it's in a git repository. For me that is almost always the case, as everything I do that involves code and dependencies uses git for source control. However, most of the time these git repositories are not public because they're proprietary applications, meaning there will never be a result from scanning the commit, and arguably that's a bit of private data being sent to the API (which, unlike e.g. the dependency tree, will never give a positive result).

My understanding is that generally there are two main criteria for git scanning to be useful:

- the codebase is public (meaning its commits can be indexed, and whatnot)
- the codebase is using a very "commit driven" language like C++ (i.e. while it's possible for other ecosystems to have commits related to vulnerabilities, it's far more likely in just about every ecosystem that version numbers/semver are used instead)

Given the number of ecosystems the scanner supports, and frankly just how much better/easier non-git based vuln info is to handle right now, I strongly suspect more than half of the uses of the scanner are in a context where at least one of those conditions is not true, meaning the current default is not useful for most people and is arguably a little negative (though I admit it's not a huge privacy concern).

Personally I'd prefer if this was opt-in (i.e. --check-git or --include-git), though maybe there's another way to try to make this more automatic. For example, maybe it would make sense for the scanner to fall back to checking git if it doesn't find any lockfiles, or to make this managed through a config property so codebases could opt in (i.e. open source repos could create an osv-scanner.toml with a marker effectively telling the scanner "hey this is a public repo so feel free to be more 'aggressive' in what you check").
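To make the three options above concrete, here is a minimal Go sketch of how they could combine into a single decision. Everything in it is hypothetical: the `GitScanInputs` type, its field names, and the `shouldScanGit` function are illustrative and are not taken from the osv-scanner codebase, and the order in which the signals are checked is just one possible choice.

```go
package main

import "fmt"

// GitScanInputs gathers the signals proposed in the issue. All names here
// are hypothetical and do not correspond to real osv-scanner options.
type GitScanInputs struct {
	IncludeGitFlag bool // user passed an opt-in flag such as --include-git
	ConfigOptIn    bool // the repository's config file opted in to git scanning
	LockfilesFound int  // number of lockfiles discovered during the scan
}

// shouldScanGit reports whether the commit should be scanned and which
// signal led to that decision.
func shouldScanGit(in GitScanInputs) (bool, string) {
	switch {
	case in.IncludeGitFlag:
		return true, "flag"
	case in.ConfigOptIn:
		return true, "config"
	case in.LockfilesFound == 0:
		// Nothing else to scan, so fall back to the commit.
		return true, "fallback"
	default:
		// Opt-in behaviour: skip git unless one of the signals above fired.
		return false, "default"
	}
}

func main() {
	// A private application with three lockfiles and no opt-in.
	scan, reason := shouldScanGit(GitScanInputs{LockfilesFound: 3})
	fmt.Println(scan, reason) // prints "false default"
}
```

With this shape, a proprietary repository with lockfiles sends no commit hash unless someone asks for it, while a repository with no lockfiles at all still gets some result.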

G-Rath commented 1 month ago

Discussed offline and agreed that we should make it disabled by default

oliverchang commented 1 month ago

It generally makes sense to me that we shouldn't need to scan the git hash of projects. However, we do want C/C++ scanning to work out of the box for third party / open source dependencies.

Should we look at still accounting for git repos if they live inside one of these directories by default?

https://github.com/google/osv-scanner/blob/3702c3bbdf0dac9d9c50c4ffa0560f82a0365365/pkg/osvscanner/osvscanner.go#L85

G-Rath commented 1 month ago

~@oliverchang it looks like we're already doing that - if I'm understanding the code right, vendor scanning is about finding git commit shas, and is currently in no way tied to the --skip-git flag~ ok no it looks like the current logic is to look for C/C++ files and determine a hash, so we'd want to extend that logic to also check if there's an initialized .git directory and if so include the sha.

As an aside, that list includes vendor, which is a common directory used by bundler/rails. That was another thing I was going to look into after learning about it the other day (e.g. right now on our Rails apps the scanner keeps finding those as an empty directory and then complaining that there's nothing in them), but overall I think it's fair for the scanner to automatically look into a directory with that name for a .git directory.

another-rex commented 1 month ago

Agree with not enabling scanning of the base git repository by default, though I think scanning submodule git hashes, and hashes under vendored lib names, is still valuable to enable by default?

Consider making "skip git" the default in v2 #1277