Open ferdnyc opened 1 week ago
Hi @ferdnyc, thanks for your detailed thoughts here!
I agree, and this is something that has been sitting in the back of my mind for some time now. The specific variations encoded in the regular expressions for <alt> tags are important, and I understand that some downstream projects (such as Fedora) are handling these.
But I suspect that most people aren't seeing the regexes from this repo, or from license-list-data, and are instead just viewing the website versions at https://spdx.org/licenses. And as you noted, nothing in that HTML view clearly indicates whether the red text for a given <alt> tag (or <bullet>, etc.) is "replace with anything" or "replace with these specific characters."
I haven't had a chance to dig into this, but I'm certainly open to us coming up with a cleaner solution. Here are a couple, feel free to share others:
(Option 3 is probably a good idea, regardless of whether we also do 1 or 2)
The code that generates the License List website is available at https://github.com/spdx/licenseListPublisher. Specifically, https://github.com/spdx/LicenseListPublisher/tree/master/resources/htmlTemplate contains the corresponding HTML templates, if there are suggested edits you'd like to propose.
Uses of <alt>
Currently, the <alt> tag is used in two fundamentally different capacities.
As a field to mark customizable text
Many of the license texts include customizable details related to the project being licensed, like project name, copyright statements, maintainer or rightsholder addresses, etc.
These are frequently wrapped in an alt tag that will match anything (match=".*"), although a few have more specific matching patterns. A variable name is always specified (name="something") to capture the matched string.
For example, in BSD-2-Clause.xml, the text specifying "THE COPYRIGHT HOLDER(S) (AND|OR) CONTRIBUTORS" is made customizable in two places, captured as copyrightHolderAsIs and copyrightHolderLiability:
https://github.com/spdx/license-list-XML/blob/9269d7211fba83092697d5211c5f81988222ec84/src/BSD-2-Clause.xml#L30
https://github.com/spdx/license-list-XML/blob/9269d7211fba83092697d5211c5f81988222ec84/src/BSD-2-Clause.xml#L33
In Python-2.0.1.xml, the specific Python version for which the license applies is similarly captured in a number of places, using a more specific regular expression:
https://github.com/spdx/license-list-XML/blob/9269d7211fba83092697d5211c5f81988222ec84/src/Python-2.0.1.xml#L35
https://github.com/spdx/license-list-XML/blob/9269d7211fba83092697d5211c5f81988222ec84/src/Python-2.0.1.xml#L84
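To make the "customizable field" case concrete, here is a minimal Python sketch of how an alt tag with a match pattern and a name could be turned into a named capture group. The XML fragment is hand-written for illustration (only the name copyrightHolderAsIs comes from the file discussed above; the match value and surrounding text are assumptions), and template_to_regex is a hypothetical helper, not the actual SPDX matching code, which also normalizes whitespace and punctuation.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration; not copied from BSD-2-Clause.xml.
FRAGMENT = (
    '<p>THIS SOFTWARE IS PROVIDED BY '
    '<alt match=".+" name="copyrightHolderAsIs">'
    'THE COPYRIGHT HOLDERS AND CONTRIBUTORS</alt>'
    ' "AS IS".</p>'
)


def template_to_regex(xml_text):
    """Build a regex from one element: literal text is escaped, and each
    <alt> child becomes a named group using its match attribute."""
    root = ET.fromstring(xml_text)
    parts = [re.escape(root.text or "")]
    for child in root:
        if child.tag == "alt":
            parts.append("(?P<%s>%s)" % (child.get("name"), child.get("match")))
        else:
            # Any other child is treated as fixed text in this sketch.
            parts.append(re.escape("".join(child.itertext())))
        parts.append(re.escape(child.tail or ""))
    return re.compile("".join(parts))


pattern = template_to_regex(FRAGMENT)
found = pattern.fullmatch('THIS SOFTWARE IS PROVIDED BY Example Corp "AS IS".')
print(found.group("copyrightHolderAsIs"))  # prints: Example Corp
```

With a pattern this permissive, any non-empty text is accepted in that position, which is what makes the field genuinely customizable.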
To support minor variations in license texts
Other uses of alt tags aren't free-form/customizable at all, but merely prevent slight variations in license text from causing a failure to match the license. Going back to BSD-2-Clause.xml, the word "EXPRESS" is surrounded by an alt tag not because it's customizable, but simply because some versions of the text contain "EXPRESSED" instead of "EXPRESS":
https://github.com/spdx/license-list-XML/blob/9269d7211fba83092697d5211c5f81988222ec84/src/BSD-2-Clause.xml#L31
The same is true in xpp.xml, where "University" may be misspelled as "Univeristy", and where a certain conjunction may be either "and" or "or" (but nothing else):
https://github.com/spdx/license-list-XML/blob/9269d7211fba83092697d5211c5f81988222ec84/src/xpp.xml#L40
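For contrast with the free-form case, a short Python sketch of the "minor variation" case. The patterns below are written in the spirit of the examples above and are assumptions; the actual match attributes in BSD-2-Clause.xml and xpp.xml may be spelled differently. The slot names and the accepts helper are hypothetical.

```python
import re

# Assumed patterns for illustration; the real match attributes may differ.
VARIATION_PATTERNS = {
    "express": r"EXPRESS(ED)?",
    "university": r"Univer(si|is)ty",
    "conjunction": r"and|or",
}


def accepts(slot, candidate):
    """Return True if candidate is one of the allowed variants for the slot."""
    return re.fullmatch(VARIATION_PATTERNS[slot], candidate) is not None


print(accepts("express", "EXPRESSED"))   # prints: True
print(accepts("express", "IMPLIED"))     # prints: False
print(accepts("conjunction", "nor"))     # prints: False
```

Nothing here is customizable in any meaningful sense: each slot accepts two spellings and rejects everything else.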
Other tags handled as replaceable
This also applies to e.g. <copyrightText> and <bullet>, which are presented identically as red replaceable text, despite having different purposes.
It makes sense for a project to customize the copyright text of its license as needed, so <copyrightText> can fairly be treated like the first category of <alt> tags above.
But the bullets used in the license are more akin to the second type of <alt> tag above, in that there is a fairly limited set of possibilities for what might be found in their place. The "1." before the first clause in BSD-2-Clause.xml might be replaced with "1)", or "1 —", or even nothing if the list is numbered automatically, but it probably shouldn't be replaced with "45)" or "apple)".
Presentation of <alt>
Software doesn't care about the purpose of a given <alt> tag, and for the purposes of matching, the information that it's replaceable is sufficient. But to humans, the implications of the two types of "replaceable" text are unlikely to be the same. And because these two very different situations are handled the same way in the code/data, they're also presented the same way on the website. All replaceable text is presented in red, so the handling of variations appears to indicate that unexpected bits of text are customizable or free-form, when in fact they're not.
In the display of BSD-2-Clause.xml, for example, it seems potentially confusing for "EXPRESS" to be shown in the same red text as "THE COPYRIGHT HOLDERS AND CONTRIBUTORS", at least without also providing some explanation for why and how someone would "customize" the word EXPRESS.
Since optional and replaceable texts are indicated to humans by coloring the text blue or red, respectively, it's presumably of some value to highlight those locations. But if freely customizable "there could be anything here" areas of the text, and other areas where only a very limited set of options will pass, are all presented the same, it seems as though that value could be somewhat reduced?
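One possible way to tell the two cases apart at rendering time, sketched in Python under loud assumptions: the probe strings, the helper names, and the CSS class names are all hypothetical, and nothing here reflects how licenseListPublisher currently works. The idea is simply to test whether a match pattern accepts a handful of unrelated strings.

```python
import re

# Arbitrary, unrelated probe strings; this probe set is an assumption.
PROBES = (
    "Example Corp",
    "45)",
    "apple)",
    "Jane Q. Public <jane@example.org>",
)


def looks_free_form(match_pattern):
    """Heuristic: a pattern is free-form if it accepts every unrelated probe."""
    compiled = re.compile(match_pattern, re.DOTALL)
    return all(compiled.fullmatch(probe) for probe in PROBES)


def css_class_for(match_pattern):
    """Pick a (hypothetical) CSS class so the two cases can be styled apart."""
    if looks_free_form(match_pattern):
        return "replaceable-free"
    return "replaceable-limited"


print(css_class_for(".*"))            # prints: replaceable-free
print(css_class_for("EXPRESS(ED)?"))  # prints: replaceable-limited
```

A pattern-only heuristic like this cannot recover intent. A field that is meant to be customized but has a constrained pattern, such as the Python version mentioned earlier, would land in the "limited" bucket, so an explicit marker in the XML might be a more reliable signal.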