spdx / license-list-XML

This is the repository for the master files that comprise the SPDX License List.

Fix URLs to OSI website #2616

Open ziemek99 opened 1 week ago

ziemek99 commented 1 week ago

Apparently the OSI website was restructured some time ago. Old links to license info now return HTTP 301 redirects. While a browser has no problem following such a link, some parser logic at SPDX erroneously flags these URLs as "no longer live" under "Other web pages for this license" on the SPDX website.

goneall commented 1 week ago

@ziemek99 - Thanks for pointing this out and proposing a solution

Rather than replacing the current OSI reference, I would suggest we add a new reference and keep the original so that tools that use the URLs to correlate license information won't break.

We'll end up with one live and one "not so live" URL. Note that this may appear unnecessary to human users, but many tools rely on correlating licenses with URLs that are no longer used.
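
To illustrate why keeping the old URL matters, here is a minimal sketch of how a tool might correlate URLs with license IDs. The license ID, the URLs, and the `build_url_index` helper are all hypothetical placeholders, not taken from the SPDX tooling or the actual license list data.

```python
from typing import Dict, List


def build_url_index(cross_refs: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every cross-reference URL to its license ID (hypothetical helper)."""
    index: Dict[str, str] = {}
    for license_id, urls in cross_refs.items():
        for url in urls:
            index[url] = license_id
    return index


# Hypothetical data: the old URL is kept alongside the new one.
cross_refs = {
    "Example-1.0": [
        "https://example.org/license/example-1.0",   # current URL
        "https://example.org/licenses/Example-1.0",  # old URL, now redirected
    ],
}

index = build_url_index(cross_refs)

# A tool that only knows the old URL can still find the license.
print(index.get("https://example.org/licenses/Example-1.0"))
```

If the old entry were replaced instead of kept, the last lookup would return `None` and the tool would lose the match.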

ziemek99 commented 1 week ago

> Rather than replacing the current OSI reference, I would suggest we add a new reference and keep the original so that tools that use the URLs to correlate license information won't break.

I suppose this only applies to files in the src directory and not DOCS?

> We'll end up with one live and one "not so live" URL.

Is there any required order to keep? For aesthetic reasons I'd like to keep "not so live" URLs under the live ones. If that'd break compatibility, though, I can keep the order of the previous entries and add current URLs underneath.

Another (better) solution would be fixing the parser logic so it follows any HTTP redirects and doesn't mark these URLs as no longer live.
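
A rough sketch of what such redirect-following logic could look like. The actual SPDX checker is not shown in this thread, so the function names, the hop limit, and the HEAD-based fetcher below are assumptions for illustration only.

```python
import http.client
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}

# A fetcher returns (status code, Location header or None) for one URL.
Fetch = Callable[[str], Tuple[int, Optional[str]]]


def http_fetch(url: str) -> Tuple[int, Optional[str]]:
    """Issue a single HEAD request without following redirects."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def resolve(url: str, fetch: Fetch = http_fetch,
            max_hops: int = 5) -> Tuple[str, bool]:
    """Follow redirects and return (final_url, is_live).

    A URL counts as live if the chain ends in a 2xx response within
    max_hops redirects; loops and overly long chains count as not live.
    """
    seen = set()
    for _ in range(max_hops + 1):
        if url in seen:
            return url, False  # redirect loop
        seen.add(url)
        status, location = fetch(url)
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # Location may be relative
            continue
        return url, 200 <= status < 300
    return url, False  # too many hops
```

Returning the final URL as well as the liveness flag would let the website report "redirects to ..." instead of "no longer live", which still gives maintainers a hint that the src entry could be updated.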

goneall commented 1 week ago

> > Rather than replacing the current OSI reference, I would suggest we add a new reference and keep the original so that tools that use the URLs to correlate license information won't break.
>
> I suppose this only applies to files in the src directory and not DOCS?

Correct - only the src

> > We'll end up with one live and one "not so live" URL.
>
> Is there any required order to keep? For aesthetic reasons I'd like to keep "not so live" URLs under the live ones. If that'd break compatibility, though, I can keep the order of the previous entries and add current URLs underneath.

I don't think order matters.

> Another (better) solution would be fixing the parser logic so it follows any HTTP redirects and doesn't mark these URLs as no longer live.

Possibly - Although it is nice having the correct (non-redirected) URLs added for 2 reasons:

Flagging these on the website gives us a chance to add the new URL. There may be a better way to flag them, but this did result in a very welcome update to the src URLs. Again - thanks for the PR and noticing the change.

xsuchy commented 1 week ago

Otherwise LGTM.