Python implementation of the package url spec. This project is sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase/ , the Google Summer of Code, nexB and other generous sponsors.
Following https://github.com/package-url/packageurl-python/pull/51/files/c1d41a8930b0b89dfc3774b4e18d89de5089e593..7877bb50102482468bdb9b32476d5a6151dc368e#r508692262
Working with regex syntax is always hard, and it should not be necessary for most of the simple routes. For example, a common pattern such as '[^/]+' in a path segment should be abstracted away, both for readability and to make adding new routes easier.
We could re-use some ideas from Django's recent URL route system, which replaces the old regex-based system: https://docs.djangoproject.com/en/3.1/topics/http/urls/#url-dispatcher
This system abstracts the regex complexity into "converters", for example
r'^articles/(?P<year>[0-9]{4})/$'
becomes
articles/<yyyy:year>/
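To make the "converter" idea concrete, here is a minimal sketch that mirrors the shape of the custom converter example in the Django docs, written as a plain class so that it runs without Django. The names converter, year_pattern, match and year are only illustrative.

```python
import re


class FourDigitYearConverter:
    """Plain-Python copy of the converter shape described in the Django docs."""

    # Regex fragment that the converter contributes to the full pattern.
    regex = "[0-9]{4}"

    def to_python(self, value):
        # Convert the matched string into the value handed to the caller.
        return int(value)

    def to_url(self, value):
        # Convert a Python value back into its URL form.
        return "%04d" % value


converter = FourDigitYearConverter()
# "articles/<yyyy:year>/" expands to this regex once the converter is applied.
year_pattern = re.compile(r"^articles/(?P<year>%s)/$" % converter.regex)

match = year_pattern.match("articles/2020/")
year = converter.to_python(match.group("year"))
```

The route author only writes <yyyy:year>; the regex fragment lives in one place and can be reused by every route that needs it.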
Using a current url2purl example:
pattern = r"https?://raw.githubusercontent.com/(?P<namespace>[^/]+)/(?P<name>[^/]+)/(?P<version>[^/]+)/(?P<subpath>.*)$"
Could become:
route = "https://raw.githubusercontent.com/<str:namespace>/<str:name>/<str:version>/<path:subpath>"
Much easier to write and to read.
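As a rough illustration of how such a route could be expanded back into a regex, here is a standard-library-only sketch. It is not Django's _route_to_regex: the CONVERTERS table and the route_to_regex helper are hypothetical names, and the "path" converter uses ".+" as Django does, whereas the current url2purl pattern uses ".*".

```python
import re

# Hypothetical converter table, modelled on Django's "str" and "path" converters.
CONVERTERS = {
    "str": r"[^/]+",
    "path": r".+",  # unlike ".*", this requires a non-empty subpath
}

# Matches "<converter:name>" or a bare "<name>" placeholder.
_PARAM = re.compile(r"<(?:(?P<converter>[^>:]+):)?(?P<name>[^<>]+)>")


def route_to_regex(route):
    """Expand a "<converter:name>" route into an anchored regex string."""
    parts = []
    pos = 0
    for found in _PARAM.finditer(route):
        # Literal text between placeholders is escaped so "." stays literal.
        parts.append(re.escape(route[pos:found.start()]))
        converter = found.group("converter") or "str"
        parts.append("(?P<%s>%s)" % (found.group("name"), CONVERTERS[converter]))
        pos = found.end()
    parts.append(re.escape(route[pos:]))
    return "^" + "".join(parts) + "$"


route = "https://raw.githubusercontent.com/<str:namespace>/<str:name>/<str:version>/<path:subpath>"
regex = route_to_regex(route)
```

A side benefit of escaping the literal parts is that the dots in the domain are matched literally, which the hand-written pattern above does not do.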
Playing around with Django's _route_to_regex
We could add custom converters for the specific needs of purl: https://docs.djangoproject.com/en/3.1/topics/http/urls/#registering-custom-path-converters
Some parts, like the (http|https) scheme, will need extra support, as will the domain section, since neither is part of the Django system:
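A minimal sketch of one way to handle that prefix, assuming a hypothetical helper named split_route that is part of neither Django nor packageurl-python. It turns the scheme and domain into a regex prefix and returns the remaining path route, which a Django-style converter step could then process.

```python
import re


def split_route(route):
    """Split a full URL route into (prefix_regex, path_route).

    Hypothetical helper: an "http" or "https" scheme is widened to match
    both, and the domain is escaped so that its dots match literally.
    """
    scheme, sep, rest = route.partition("://")
    if not sep:
        raise ValueError("route has no scheme: %r" % route)
    domain, slash, path_route = rest.partition("/")
    if scheme in ("http", "https"):
        scheme_regex = "https?"
    else:
        scheme_regex = re.escape(scheme)
    prefix_regex = "^" + scheme_regex + "://" + re.escape(domain) + slash
    return prefix_regex, path_route


prefix_regex, path_route = split_route(
    "https://raw.githubusercontent.com/<str:namespace>/<str:name>"
)
```

Here path_route is "<str:namespace>/<str:name>", and prefix_regex accepts both the http and https forms of the raw.githubusercontent.com prefix.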