Closed jeffmcaffer closed 1 year ago
@mnonnenmacher I believe what's called "provider" here is what we refer to as "provenance" in our idea.
"Provenance" is a pretty loaded term that carries with it very deep meaning for some. Seems like here we need the very simple implication that "this is which package foo we are talking about".
I think this information should be added to the RemoteArtifact and VcsInfo models, because these are the two places where we reference URLs (apart from the homepage URL). For RemoteArtifact this could be something like "npmjs.org", "JCenter", or "Maven Central". For VcsInfo it could be "github.com", "bitbucket.org", and so on. My problem is how we should auto-detect the values for those fields. For example, if we take part of the URL like "github.com", this contradicts the idea of having something URL-independent. Maybe we would have to maintain a mapping from URL to provider name?
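One way to picture such a mapping is a small lookup table keyed by host name. This is only a sketch: the function name `provider_for_url`, the table `HOST_TO_PROVIDER`, and the host entries are assumptions made for illustration and are not part of ORT.

```python
from urllib.parse import urlparse

# Hypothetical table from host name to provider name. The provider names are
# the examples from the comment above; the hosts are illustrative.
HOST_TO_PROVIDER = {
    "registry.npmjs.org": "npmjs.org",
    "repo.maven.apache.org": "Maven Central",
    "repo1.maven.org": "Maven Central",
    "jcenter.bintray.com": "JCenter",
    "github.com": "github.com",
    "bitbucket.org": "bitbucket.org",
}


def provider_for_url(url):
    """Return the provider name for a URL, or None if the host is unknown."""
    # urlparse() only finds a host in URLs of the form scheme://host/...,
    # so SCP-style Git URLs (git@host:path) yield None here.
    host = urlparse(url).hostname
    if host is None:
        return None
    return HOST_TO_PROVIDER.get(host)
```

With a table like this, the stored value no longer depends on the URL layout: if a repository moves to a new host, only the table gains an entry.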
@mnonnenmacher agreed, having a table that bi-directionally maps provider names to host names makes sense. It likely also makes sense to keep the provider names as generic as possible. For example, we recently ran into some identity problems because some folks were using "npmjs.org" and others "npmjs.com" as the provider. It turns out they are the same: going to npmjs.org forwards to npmjs.com.
To isolate the data from these sorts of variations and changes, using just "npmjs" would be more resilient. That's also in line with your other examples like "maven central" etc. For simplicity, perhaps we say that provider names need to be valid URL segments that do not require any quoting (e.g., no spaces, no funny chars, ...), and have them be case insensitive and NOT case preserving (or just spec lowercase).
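A rough sketch of what these rules could look like, assuming lowercase is chosen as the canonical form. The names `normalize_provider` and `PROVIDER_HOSTS`, the exact set of allowed characters, and the table entries are all assumptions for illustration, not an agreed spec.

```python
import re

# Assumed rule: lowercase letters, digits, '.', '_' and '-', starting with a
# letter or digit. All of these can appear in a URL segment without quoting.
_PROVIDER_RE = re.compile(r"[a-z0-9][a-z0-9._-]*")


def normalize_provider(name):
    """Lowercase a provider name and reject names that would need quoting."""
    candidate = name.lower()
    if not _PROVIDER_RE.fullmatch(candidate):
        raise ValueError(f"not a valid provider name: {name!r}")
    return candidate


# Hypothetical table from provider name to the host names it covers.
PROVIDER_HOSTS = {
    "npmjs": ("npmjs.com", "npmjs.org"),
    "mavencentral": ("repo.maven.apache.org", "repo1.maven.org"),
    "github": ("github.com",),
}

# The reverse direction is derived from the same table, so both directions
# always agree.
HOST_TO_PROVIDER = {
    host: provider
    for provider, hosts in PROVIDER_HOSTS.items()
    for host in hosts
}
```

Under these assumptions "NPMjs" normalizes to "npmjs", "Maven Central" is rejected because of the space, and both npmjs.org and npmjs.com resolve to the same provider.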
This somewhat relates to https://github.com/heremaps/oss-review-toolkit/issues/20.
Packages have a packageManager (e.g., npm, maven, ...). Since a given type of package could come from many different places, the Package should also talk about a provider. The provider should not be the URL of the repository, rather the notional name of the repository (e.g., npmjs.org, github.com). This allows the repos to move and change their URL structure without affecting the identity of the data stored in ORT.
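To illustrate the identity argument, here is a minimal sketch in which the provider is part of a package's identity while the URL is plain data. The classes `PackageId` and `Package` and their fields are hypothetical and do not mirror ORT's actual model; the mirror URL is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageId:
    """Hypothetical identity: everything here is independent of any URL."""
    package_manager: str
    provider: str
    name: str
    version: str


@dataclass
class Package:
    """Hypothetical package record; the URL is data, not identity."""
    id: PackageId
    download_url: str


# The same package, fetched from two different locations.
original = Package(
    PackageId("npm", "npmjs.org", "lodash", "4.17.21"),
    "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz",
)
mirrored = Package(
    PackageId("npm", "npmjs.org", "lodash", "4.17.21"),
    "https://mirror.example.invalid/npm/lodash-4.17.21.tgz",
)
```

Because the URL is not part of `PackageId`, both records share one identity, so data keyed by the identity survives a change of URL structure.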