package-url / purl-spec

A minimal specification for purl aka. a package "mostly universal" URL, join the discussion at https://gitter.im/package-url/Lobby
https://github.com/package-url/purl-spec

Should we support a leading v in golang packages? #294

Open TG1999 opened 4 months ago

TG1999 commented 4 months ago

If we look at Go packages like https://github.com/go-jose/go-jose/archive/refs/tags/v4.0.1.zip and https://pkg.go.dev/github.com/go-jose/go-jose/v3?tab=versions, their versions have a leading v, whereas osv.dev stores them without any leading v: https://osv.dev/vulnerability/GHSA-c5q2-7r4c-mv6g.

So how should we store this as a purl?

pkg:golang/github.com/go-jose/go-jose/v4@v4.0.1 or

pkg:golang/github.com/go-jose/go-jose/v4@4.0.1 ?

matt-phylum commented 4 months ago

I think the v is part of the version in Go and needs to be present. https://github.com/golang/go/issues/32945 If purls were written without the v and then Go started doing something different, it would break all purl implementations.

pombredanne commented 4 months ago

@TG1999 @matt-phylum go is a mess in this domain. :smiling_imp:

I'm inclined to accept the versions as they are, with their v prefix, but then these are no longer the semver versions that Go modules promised, short of stripping the leading v.

So we need to agree on a canonical way (and document this preferred canonical way in the types doc) and also accept that, unfortunately, tools will have to deal with both prefixed and unprefixed versions, and will need to strip the prefix to compare versions properly in all cases and also to query some databases.
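As a rough illustration of the prefix handling described above, here is a minimal Python sketch. The helper names (`strip_v`, `release_key`, `same_release`) are hypothetical and not part of any purl library, and it only handles plain `MAJOR.MINOR.PATCH` versions; pre-release and build suffixes, pseudo-versions, and commit IDs are out of scope.

```python
import re

# Hypothetical helpers for illustration only; not part of any purl library.
# Assumption: only plain MAJOR.MINOR.PATCH versions, with an optional leading "v".
_CORE = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def strip_v(version):
    """Drop one leading 'v' when it is directly followed by a digit."""
    if len(version) > 1 and version[0] == "v" and version[1].isdigit():
        return version[1:]
    return version


def release_key(version):
    """Return (major, minor, patch) as integers, ignoring a leading 'v'."""
    match = _CORE.fullmatch(version)
    if match is None:
        raise ValueError(f"not a plain MAJOR.MINOR.PATCH version: {version!r}")
    return tuple(int(part) for part in match.groups())


def same_release(a, b):
    """True when two version strings differ at most by the 'v' prefix."""
    return release_key(a) == release_key(b)
```

With this, `same_release("v4.0.1", "4.0.1")` is true, so a tool could match a prefixed purl version against an unprefixed database entry. Comparing the integer tuples also avoids the string-ordering trap where "4.0.10" sorts before "4.0.9".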

This is done here for instance https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#golang

matt-phylum commented 4 months ago

The Go section of the spec is in dire need of updates. The version and subpath stuff there implies that it's talking about Go packages and Go modules aren't supported. However, if the version specification were relaxed to a Git reference instead of a commit ID (truncated to an unspecified length), then the v must be included because the v is part of the tag name and is significant to Git. Commit IDs and other Git references are not versions and cannot be compared, but in Go if a tag begins with v and contains a valid version number then it is a comparable version.

It looks like since Go started using modules, the examples need to be updated:

Proper examples with versions:

prabhu commented 3 months ago

Another confusion for Go: is it all a name, or does it have a namespace?

matt-phylum commented 3 months ago

The spec is clear that Go packages have PURL namespaces, even if the concept does not exist in Go. What's missing is that Go packages only sometimes have PURL namespaces because not all Go package IDs contain slashes.
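To make the "only sometimes" case concrete, here is a small Python sketch of the split and join that an application currently has to perform. The function names are hypothetical, and the rule (everything before the last slash is the namespace, the last segment is the name) is my reading of the comment above rather than a quote from the spec.

```python
# Hypothetical helpers for illustration; the names are not from any purl library.
# Assumption: the namespace is everything before the last "/" and the name is
# the last segment. An ID without a slash has no namespace at all.


def split_go_id(go_id):
    """Split a Go package ID into (namespace, name); namespace may be None."""
    namespace, sep, name = go_id.rpartition("/")
    return (namespace if sep else None), name


def join_go_id(namespace, name):
    """Inverse of split_go_id: rebuild the Go package ID."""
    return name if not namespace else f"{namespace}/{name}"
```

Under that assumption, `split_go_id("github.com/go-jose/go-jose/v4")` gives `("github.com/go-jose/go-jose", "v4")`, while an ID without a slash, such as `"fmt"`, gives `(None, "fmt")`.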

matt-phylum commented 3 months ago

I guess the problem with Go (and NPM) packages is that even if your PURL implementation is correct, it's up to the application to correctly handle this namespace/name split and join translation, and users are unlikely to read the spec when they have the library to handle that for them. Maybe slashes in the names of Go packages should be forbidden, to stop users from unknowingly doing the wrong thing. Because of the way the names work, it shouldn't be possible for slashes in the name to get some other meaning where they would need to be accepted later.

prabhu commented 3 months ago

@matt-phylum, we should fix the purl spec for go IMHO. Go was the only team with some reservations during the last IETF submission if my memory is correct.

matt-phylum commented 3 months ago

204 might be the way to go. Combine the namespace and name into one value at the PURL level, don't encode slashes¹, and leave it up to the package type how to interpret it.

Go would change from segments [0,n-1) of the Go package ID (split by /) in the PURL namespace and segment n-1 in the PURL name, to the PURL name and the Go name being equal. Ergonomics are improved because the user no longer needs to split and join.

NPM would change from the NPM namespace in the PURL namespace and the NPM name in the PURL name to the full package name in the PURL name. NPM does have namespaces, but most of the time you don't need to be aware of them and just use the full package name, and it would be possible to do the same with PURL. Ergonomics are improved because the user no longer needs to split and join.

Maven would change from the Maven group ID in the PURL namespace and the Maven artifact ID in the PURL name to "<group ID>/<artifact ID>" in the PURL name. In this case, the ergonomics are worse because the splitting and joining is left up to the user. Maven tools don't typically specify packages this way.

Rewriting the spec this way shouldn't change the representation of any packages, so even though it would be a breaking API change for libraries, it wouldn't be a breaking change for the ecosystem and we wouldn't need to migrate everything to an incompatible PURL2 or deal with Go PURLs that are full of %2F escapes.

¹ Is this alone okay? For URLs, the path segments are tricky. If you use a normal URL parser and ask for the full path of the URL, it needs to give you the path without fully percent-decoding it, in case / vs %2F is a meaningful distinction (e.g. it's a route parameter character, not a path segment delimiter). Separating the segments is supposed to happen before decoding. For PURL, as long as none of the existing package types have valid packages where the current name-without-namespace field is expected to contain a slash, and we don't expect package types to add such a requirement later, it should be safe for the library to return a single decoded name string.
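The ordering concern in the footnote can be seen in a short Python sketch using the standard library's `urllib.parse.unquote`. The two function names and the sample path `a%2Fb/c` are made up for illustration.

```python
from urllib.parse import unquote

# Illustrative only: the function names and the sample path are hypothetical.


def split_then_decode(raw_path):
    """Split on literal '/' first, then percent-decode each segment."""
    return [unquote(segment) for segment in raw_path.split("/")]


def decode_then_split(raw_path):
    """Percent-decode the whole path first, then split on '/'."""
    return unquote(raw_path).split("/")


raw = "a%2Fb/c"
print(split_then_decode(raw))  # ['a/b', 'c']
print(decode_then_split(raw))  # ['a', 'b', 'c']
```

The two orders disagree only when a segment contains an encoded slash. If no package type ever needs a slash inside the name-without-namespace field, as the footnote supposes, a library returning one fully decoded name string would lose nothing.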