oss-review-toolkit / ort

A suite of tools to automate software compliance checks.
https://oss-review-toolkit.org
Apache License 2.0

Automatic VCS path detection #5484

Open fviernau opened 2 years ago

fviernau commented 2 years ago

Depending on the package manager, it may be possible to set the VCS path more precisely so that the scan scope is narrowed down. For example:

  1. GoMod: The source tree of a module is guaranteed to be rooted at the directory in which the go.mod file resides. So, one could find that directory and set the VCS path without risk.
  2. Maven: For a repository containing multiple Maven packages, the directory with the same name as the package containing pom.xml is often the right VCS path, see e.g. https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-accessanalyzer/pom.xml
  3. npm/yarn: For a repository containing multiple Node packages, the directory with the same name as the package containing package.json is often the right VCS path, see e.g. https://github.com/heremaps/harp.gl/tree/master/%40here.
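
To make the name-matching heuristic from items 2 and 3 concrete, here is a minimal sketch in Python. It is not ORT code: `guess_vcs_path` and its parameters are hypothetical names, the repository is modeled as a plain list of relative file paths, and the package identifiers in the usage lines are only illustrative. The sketch returns a path only when exactly one directory matches, since the heuristic is "often", not always, right.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional


def guess_vcs_path(definition_file: str, package_name: str,
                   repo_files: Iterable[str]) -> Optional[str]:
    """Guess the repository-relative directory of a package (hypothetical helper).

    definition_file: e.g. "pom.xml", "package.json" or "go.mod".
    package_name:    e.g. "group:artifact" or "@scope/name".
    repo_files:      repository-relative file paths using "/" separators.
    """
    # Directories that directly contain a definition file of the given kind.
    candidates = [
        PurePosixPath(f).parent
        for f in repo_files
        if PurePosixPath(f).name == definition_file
    ]

    # Compare directory names against the last segment of the package name,
    # treating both ":" (Maven) and "/" (npm scopes, Go module paths) as separators.
    last_segment = package_name.replace(":", "/").split("/")[-1]
    matches = [d for d in candidates if d.name == last_segment]

    # Only narrow the path if the match is unambiguous; otherwise keep the
    # existing VCS path by returning None.
    return str(matches[0]) if len(matches) == 1 else None


maven_files = ["pom.xml", "aws-java-sdk-accessanalyzer/pom.xml", "aws-java-sdk-s3/pom.xml"]
maven_path = guess_vcs_path("pom.xml", "com.amazonaws:aws-java-sdk-accessanalyzer", maven_files)

npm_files = ["package.json", "@here/harp-mapview/package.json"]
npm_path = guess_vcs_path("package.json", "@here/harp-mapview", npm_files)
```

Returning `None` for zero or multiple matches keeps the current behavior in all uncertain cases, so the heuristic can only narrow the scan scope when it is unambiguous.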

...TBC

This ticket is not yet an implementation ticket; for now, it is for figuring out whether and how we could solve this.

tsteenbe commented 2 years ago

@fviernau I would word the above slightly differently

  1. Maven: For a repository containing multiple Maven packages, the directory with the same name as the package containing pom.xml is often the right VCS path, see e.g. https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-accessanalyzer/pom.xml
  2. npm/yarn: For a repository containing multiple Node packages, the directory with the same name as the package containing package.json is often the right VCS path, see e.g. https://github.com/heremaps/harp.gl/tree/master/%40here.

mnonnenmacher commented 2 years ago

A good place to implement such logic could be the PackageProvenanceResolver:

A possible risk of implementing this outside the analyzer is that all code relying on the VCS path of the package would need to be checked and made aware that the scan result might use a different path.
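
The risk can be illustrated with a small Python sketch. It does not reflect ORT's actual data model: `VcsInfo`, the example URL and revision, and the dictionary of scan results are all assumptions made for illustration. The point is only that a consumer which looks up results by the path recorded on the package would miss results stored under a narrowed path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VcsInfo:
    """Hypothetical stand-in for a package's VCS location."""
    url: str
    revision: str
    path: str = ""


# VCS info as recorded on the package (no path set).
declared = VcsInfo("https://example.com/repo.git", "abc123")

# VCS info after a resolver narrowed the path outside the analyzer.
resolved = VcsInfo(declared.url, declared.revision, path="module-a")

# Scan results stored under the narrowed location.
scan_results = {resolved: ["Apache-2.0"]}

# Code that still uses the package's own VCS path finds nothing,
# while code that knows about the narrowed path finds the result.
by_declared = scan_results.get(declared)
by_resolved = scan_results.get(resolved)
```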