Open jkowalleck opened 3 months ago
I don't think this is right either. Go module names are case sensitive, including the first path element. Uppercase characters are forbidden for the first path element, so lowercasing is unnecessary for valid module names and can turn invalid module names into valid module names. It's easier and more accurate to just leave the module name how it is.
$ go get packages.EXAMPLE.com/MyOrg/foO
go: malformed module path "packages.EXAMPLE.com/MyOrg/foO": invalid char 'E' in first path element
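The rejection above can be approximated in a few lines. This is only a rough sketch, assuming the first path element may contain just lowercase ASCII letters, digits, dots and dashes, must contain a dot, and must not start with a dash; the real check in the go tool is stricter in other ways, and `first_element_ok` is a hypothetical helper name.

```python
import re

# Simplified approximation of the rule for the first path element:
# lowercase ASCII letters, digits, dots and dashes only.
_FIRST_ELEMENT = re.compile(r"[a-z0-9.-]+")


def first_element_ok(module_path: str) -> bool:
    """Return True if the first path element looks acceptable (sketch only)."""
    first = module_path.split("/", 1)[0]
    return (
        _FIRST_ELEMENT.fullmatch(first) is not None
        and "." in first
        and not first.startswith("-")
    )


print(first_element_ok("packages.EXAMPLE.com/MyOrg/foO"))  # False
print(first_element_ok("packages.example.com/MyOrg/foO"))  # True
```

Note that the uppercase letters after the first slash do not affect the result, which is the point being made: only the first element is restricted to lowercase.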
This closely mimics the capabilities supported for go module names. Since this is a breaking change, a new package type such as `go`, `gopkg`, or `gomod` is preferable to versioning the purl spec.
Isn't Option C the same as Option B?
Related to namespaces: https://github.com/package-url/purl-spec/issues/294
It's not possible to just make a new go package type to avoid versioning PURL. This would create distinct PURLs for both go and golang which refer to the same package, but only in certain contexts, and likely lead to unexpected and inconsistent normalization where some software translates golang PURLs into go PURLs and other software considers the normalized PURL to be a distinct package. This is a problem especially for software that tries to match PURLs from different sources.
@matt-phylum, option B has a detailed specification about the name property. My proposal is not to have any opinion.
purl doesn't have a built-in concept of versioning, so the producer and consumer both need to agree on the exact version to follow. Distinct package types avoid this problem. The benefit is that it can be used as a precedent to improve npm, pypi etc.
Creating distinct package types avoids one problem by creating a bigger problem. Creating several slightly different types with slightly different behaviors defeats the purpose of having a standardized way of naming packages. If all possible type variations are valid simultaneously, all implementations need to support all ways to refer to packages relevant to that implementation. Existing software would not understand the new types until updated (similar to versioning PURL?). Humans working with PURLs will need to remember which rules apply to which types. Normalizing from one type to a preferred type to alleviate this issue would be a significant change to normalization that would cause issues with interoperability between products and compatibility with existing data.
I think removing the namespace for Go or other package types and instead putting a percent encoded path into the name, whether with a new type or a new version, would be a disaster because it would break compatibility with almost all existing Go PURLs and PURL implementations. There's no standard format for deconstructed PURLs so it's safe to change the spec so Go packages do not have namespaces as long as the path is used without percent encoding, resulting in the same serialized representation. It'd probably be best to do this across all package types at the same time so PURL implementations can be simplified by combining the two components instead of having an extra case for namespace+name combined.
re: https://github.com/package-url/purl-spec/issues/308#issuecomment-2186824843
[...] There's no standard format for deconstructed PURLs
Oh there is. see https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst
[...] so it's safe to change the spec so Go packages do not have namespaces as long as the path is used without percent encoding [...]
this would be against the existing PURL spec.
if all is a `name`, then the encoding is mandatory according to the PURL spec.
Existing PURL spec: `name` MUST NOT include a `/` --> it is to be URL-encoded to `%2F`
Existing PURL spec: `namespace` can have as many `/` as they want ...
thing is: AFAIK `go` does not know any namespace (unlike `php/composer` and `npm` and others ...)
`go` only has package names, and I had the idea to use this fact to make things right.
[...] with a new type or a new version, would be a disaster because it would break compatibility with almost all existing Go PURLs [...]
you are completely wrong here. The opposite is the case:
the namespace segments and name are to be escaped per the PURL spec, regardless of new or old go PURL. nothing changes here.
and to distinguish between new and old: the one has at least one namespace segment, the other does not.
and a downstream usage example:
OLD: take the namespace segments and the name -> concatenate both with `/` --> you've got the package dist url ...
NEW: take the namespace segments (an empty list) and the name -> concatenate both with `/` --> you've got the package dist url ...
Seems like not a big deal.
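The OLD/NEW comparison above could be sketched like this. It is a minimal illustration, assuming the consumer already holds the percent-decoded components as a PURL parser would hand them over; `module_path` is a hypothetical helper, not part of any PURL library.

```python
def module_path(namespace_segments: list[str], name: str) -> str:
    """Join decoded namespace segments and the decoded name with '/'."""
    return "/".join([*namespace_segments, name])


# OLD: the path is split into namespace segments plus a name.
old = module_path(["example.com", "pakages"], "Preserve")

# NEW: the namespace is an empty list and the name carries the whole path
# (serialized with %2F, but decoded again by the time it is used here).
new = module_path([], "example.com/pakages/Preserve")

print(old)         # example.com/pakages/Preserve
print(old == new)  # True
```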
I just wanted to give ideas how this could be solved and how hard it might be. I don't care for a specific solution. Furthermore, I don't even use go myself. I don't have any investment here.
Anyway, I do not want to alter the core PURL spec. All it takes is "fixing" the type spec.
thing is: AFAIK `go` does not know any namespace (unlike `php/composer` and `npm` and others ...)
`go` only has package names, and I had the idea to use this fact to make things right.
Neither PHP nor NPM has namespaces, either. The name of `symfony/console` is `symfony/console`. The name of `@angular/cli` is `@angular/cli`. The native tools always name the dependency with its leading component.
Neither PHP nor NPM has namespaces, either. The name of `symfony/console` is `symfony/console`. The name of `@angular/cli` is `@angular/cli`. The native tools always name the dependency with its leading component.
you are wrong here.
`symfony` is the vendor (doubles as namespace), `console` is the name.
`@angular` is the scope (doubles as namespace), `cli` is the name.
but all of this does not matter for this discussion here, sorry, please stick to the topic.
`go`, afaik, does not have a registry, so they don't have a vendor, nor scope, nor namespaces.
they have locations.
There's no difference between how NPM and PHP do/don't have namespaces and how Go does/doesn't have namespaces. In all of these cases, the name of the package in the native ecosystem contains slashes, and for PURL the native name is pulled apart into a namespace+name combination that results in the serialized form containing the native name.
Creating distinct package types avoids one problem by creating a bigger problem. Creating several slightly different types with slightly different behaviors defeats the purpose of having a standardized way of naming packages. If all possible type variations are valid simultaneously, all implementations need to support all ways to refer to packages relevant to that implementation. Existing software would not understand the new types until updated (similar to versioning PURL?). Humans working with PURLs will need to remember which rules apply to which types. Normalizing from one type to a preferred type to alleviate this issue would be a significant change to normalization that would cause issues with interoperability between products and compatibility with existing data.
I think removing the namespace for Go or other package types and instead putting a percent encoded path into the name, whether with a new type or a new version, would be a disaster because it would break compatibility with almost all existing Go PURLs and PURL implementations. There's no standard format for deconstructed PURLs so it's safe to change the spec so Go packages do not have namespaces as long as the path is used without percent encoding, resulting in the same serialized representation. It'd probably be best to do this across all package types at the same time so PURL implementations can be simplified by combining the two components instead of having an extra case for namespace+name combined.
We already have this problem. For example, nixos can wrap a pypi package and build it slightly differently and have a similar package name that may or may not have the same vulnerabilities. Many OS distros also operate similarly.
PHP, of course, has vendor.
I don't see package namespaces in the screenshot.
The first arrow looks like it's pointing at "main Composer repository", but the repository is not related to the package name.
The other two arrows are pointing at package names.
As you can see, `require` takes an object that maps package names (e.g. `monolog/monolog`) to version constraints (e.g. `1.0.*`).
https://getcomposer.org/doc/01-basic-usage.md#the-require-key
The package name consists of a vendor name and the project's name.
https://getcomposer.org/doc/01-basic-usage.md#package-names
Some package types do have namespaces.
composer, docker, golang, huggingface, npm, swift create a PURL namespace by splitting the native package name/id on the last slash such that writing out the PURL in its canonical form gives the appearance of PURL using the native package name/id, despite PURL actually forcing a namespace+name.
nuget is actually similar to npm, but handled differently by PURL. NuGet packages usually have a name prefix, but NuGet uses periods as delimiters, and `pkg:nuget/microsoft/extensions/dependencyinjection` or `pkg:nuget/microsoft.extensions/dependencyinjection` look alien to NuGet users. Replacing the periods with slashes to fit into PURL when leaving them alone would work is unnecessary.
There are a few more I'm not sure about, but the rest forbid namespaces.
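The last-slash split described above for composer, golang, npm and similar types can be sketched as follows. This is an illustration of the idea only, not how any particular PURL library implements it; `split_native_name` is a hypothetical helper.

```python
def split_native_name(native_name: str) -> tuple[str, str]:
    """Split a native package name on its last slash into (namespace, name)."""
    namespace, _, name = native_name.rpartition("/")
    return namespace, name


print(split_native_name("symfony/console"))
# ('symfony', 'console')
print(split_native_name("@angular/cli"))
# ('@angular', 'cli')
print(split_native_name("packages.example.com/MyOrg/foO"))
# ('packages.example.com/MyOrg', 'foO')
```

Joining the two parts back together with `/` reproduces the native name, which is why the serialized PURL appears to contain the native name unchanged.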
I think it would be a mistake to create a package type which normally puts slashes in its PURL name because it makes PURLs that are difficult for humans and it creates complications if namespaces are removed from the core specification (possible without breaking existing PURLs).
reminder: this is about the current `golang` PURL-TYPE.
this is not a general discussion about general ideas and likings. this is about solving an issue the `go` community is actually having right now, not some esoteric concepts of "humans reading PURL" or personal preference, nor about "but the other PURL-TYPEs do it this way...".
Each ecosystem has its own requirements, each ecosystem is facing different standards and constraints.
And the current implementation for `golang` in PURL-TYPE does not adhere to the real world of `go` users.
Please read the original issue description and be helpful. :D
@jkowalleck, we are seeing similar issues and potential workarounds across other package types, which is what we are trying to convey here. I think the next step could be for the core maintainers to digest the information and come up with something authoritative.
re: https://github.com/package-url/purl-spec/issues/308#issuecomment-2209342802
I see, but this does not help this particular problem. If there was a larger issue with a wider scope, then this could be discussed in a meta-issue somewhere else, and it might lead to no consensus or a complete reboot of the project. see also https://github.com/package-url/purl-spec/issues/310
In the meantime, this particular issue for `go` people could be solved already, ...
PS: nuff said. will unfollow this issue, since i am not really affected as a non-`go` person ;-)
see PURL spec: https://github.com/package-url/purl-spec/blob/b33dda1cf4515efa8eabbbe8e9b140950805f845/PURL-SPECIFICATION.rst#rules-for-each-purl-component
see PURL-TYPE spec for `golang`: https://github.com/package-url/purl-spec/blob/b33dda1cf4515efa8eabbbe8e9b140950805f845/PURL-TYPES.rst?plain=1#L300-L314
Problem
According to the PURL-TYPE spec for `golang`, "The `namespace` and name must be lowercased."
This means that every URL path-part of a hosted go module MUST be lowercased for PURL namespaces. URL path-parts are case-sensitive by definition. Therefore, the TYPE spec is not helpful: it modifies the URL path-part and renders it unusable in namespaces, because it makes PURLs indistinguishable and unusable for package retrieval.
see also: https://github.com/google/deps.dev/issues/93
see also: https://www.youtube.com/watch?v=Lts4NjHqKIw&t=1004s
Example
Module with the topic of preserving a thing: hosted at `https://example.com/pakages/Preserve` would have a purl `pkg:golang/example.com/pakages/preserve`.
Module with the topic of an event before serving a thing: hosted at `https://example.com/pakages/preServe` would have a purl `pkg:golang/example.com/pakages/preserve`.
Issue A: Both PURLs are the same, but the modules are not.
Issue B: none of the PURL namespace/name segments are usable to build the original/actual distribution/source URL from it.
Possible Solution
Option A: simply allow case-sensitivity
When converting a URL to a PURL namespace, the host-part of the URL MUST be lowercased, and the path-part of the URL segments MUST NOT be modified.
Example:
`https://packages.EXAMPLE.com/MyOrg/foO` --> PURL `pkg:golang/packages.example.com/MyOrg/foO`
`https://packages.example.com/ACME/foo` --> PURL `pkg:golang/packages.example.com/ACME/foo`
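A rough sketch of the Option A conversion, assuming the path needs no further percent-encoding (true for the two examples above, not in general); `option_a_purl` is a hypothetical name.

```python
from urllib.parse import urlsplit


def option_a_purl(url: str) -> str:
    """Lowercase only the host-part; leave the path-part untouched."""
    parts = urlsplit(url)
    return f"pkg:golang/{parts.netloc.lower()}{parts.path}"


print(option_a_purl("https://packages.EXAMPLE.com/MyOrg/foO"))
# pkg:golang/packages.example.com/MyOrg/foO
print(option_a_purl("https://packages.example.com/ACME/foo"))
# pkg:golang/packages.example.com/ACME/foo
```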
In case the proposed solution above is considered a breaking change: deprecate the existing PURL-TYPE `golang` and create a new PURL-TYPE `go` (see #67), and define the PURL TYPE as proposed above.
Option B: no namespaces, all encoded name
this would definitely be a breaking change, so it requires deprecating TYPE `purl` and coming up with a reboot: `go` (see #67)
`namespace` must be empty
`name` must be the lowercased host-part of the distribution URL followed by the unmodified path-part of the distribution URL
`subpath` is used to point to a case-sensitive subpath inside a package.
`version` is often empty when a commit is not specified and should be the commit in most cases when available.
Example:
`https://packages.EXAMPLE.com/MyOrg/foO%26bar` --> PURL `pkg:golang/packages.example.com%2FMyOrg%2FfoO%2526bar`
`https://packages.example.com/ACME/foo` --> PURL `pkg:golang/packages.example.com%2FACME%2Ffoo`
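The Option B examples can be reproduced with the standard library. This is only a sketch under the assumption that the whole name is percent-encoded with no safe characters, so `/` becomes `%2F` and a literal `%` becomes `%25`; `option_b_purl` is a hypothetical name.

```python
from urllib.parse import quote, urlsplit


def option_b_purl(url: str) -> str:
    """Empty namespace; lowercased host plus unmodified path as the name."""
    parts = urlsplit(url)
    name = parts.netloc.lower() + parts.path
    # safe="" makes quote() encode '/' too; '%' is always encoded as %25.
    return "pkg:golang/" + quote(name, safe="")


print(option_b_purl("https://packages.EXAMPLE.com/MyOrg/foO%26bar"))
# pkg:golang/packages.example.com%2FMyOrg%2FfoO%2526bar
print(option_b_purl("https://packages.example.com/ACME/foo"))
# pkg:golang/packages.example.com%2FACME%2Ffoo
```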
Please bear with me, I am just the person who happened to write this report. I do not know much about the golang ecosystem, but I know something about PURL.