Open workingjubilee opened 8 months ago
The text in the JSON file actually comes from a text file and not the XML.
For context, please refer to this pull request for the tool that generates the JSON and website from the XML and test data: https://github.com/spdx/LicenseListPublisher/pull/83
If the JSON data is incorrect, then the test data is incorrect.
BTW - there is a flag in the LicenseListPublisher tool to generate the JSON file from the XML instead of the test data. If we change the switch, it will reopen many issues raised in the above-mentioned pull request.
Referencing the Wayback Machine archive for http://affero.org/oagpl.html on 2006-01-05 gives me this:
AFFERO GENERAL PUBLIC LICENSE
Version 1, March 2002
Copyright © 2002 Affero Inc.
510 Third Street - Suite 225, San Francisco, CA 94107, USA
From this HTML:
<td width="99%" valign="Top" align="Center">
<div align="Left">
<p><b><big><big>AFFERO GENERAL PUBLIC LICENSE</big></big></b><br>
</p>
<p><big>Version 1, March 2002</big><br>
<br>
Copyright © 2002 Affero Inc.<br>
510 Third Street - Suite 225, San Francisco, CA 94107,
USA</p>
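As a rough sketch of how a renderer might get from that HTML to the four lines quoted above, the following Python treats `<br>` and `</p>` as line breaks and collapses source newlines into spaces. The class and function names are made up for illustration, this is a simplification of real browser layout, and it is not the code behind the Wayback Machine or the SPDX website.

```python
from html.parser import HTMLParser
import re


class BreakAwareText(HTMLParser):
    """Collect text, turning <br> and </p> into newlines (a simplification)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "p":
            self.parts.append("\n")

    def handle_data(self, data):
        # Newlines in the HTML source are ordinary whitespace, so collapse them.
        self.parts.append(re.sub(r"\s+", " ", data))


def render_lines(html):
    """Return the non-blank rendered lines of an HTML fragment."""
    parser = BreakAwareText()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    return [line.strip() for line in text.split("\n") if line.strip()]


# The <p> elements from the archived page, copied from the excerpt above.
SNIPPET = """<p><b><big><big>AFFERO GENERAL PUBLIC LICENSE</big></big></b><br>
</p>
<p><big>Version 1, March 2002</big><br>
<br>
Copyright © 2002 Affero Inc.<br>
510 Third Street - Suite 225, San Francisco, CA 94107,
USA</p>"""

rendered = render_lines(SNIPPET)  # a list of four non-blank lines
```

Under these assumptions the line break inside the street address (before `USA`) disappears, because it is only source whitespace, while each `<br>` survives as a real line break.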
So yes, it seems that in this case:
Obviously, no one is really using the AGPL 1.0 for new work right now; indeed, as far as I am aware, it was never very popular, and then the AGPL 3.0 happened only a few years later. But that was why I chose it as an initial test case: it's fairly easy to reference its canonical version, and I had, at the time, figured its lack of popularity meant there wouldn't be as much dispute over its exact contents, which is an issue that plagues e.g. MIT, the various BSD-N-clauses, etc.
THE VERY SHORT VERSION: Translating XML to JSON seems to result in significant differences between the JSON and rendered website text.
I printed the JSON text data from https://github.com/spdx/license-list-data/blob/main/json/details/AGPL-1.0.json using a Rust program after applying the transformation of the `\u2007` escaping sequence to a Rust-recognized `\u{2007}` sequence. Later experiments with JS REPLs seem to yield an exactly matching text output. I acquired this: LICENSE.txt. Yet this is different from what the website renders, because the website's rendered version looks like:
However, the JSON-tripped version is:
Note that both get the first line right and then start on the same second line but then disagree on the next three. The JSON data for `"licenseText"` up to that point is the following:
The XML data looks like:
That is, it includes a pair of `<br/>`s here, one in each `<p></p>` pair, which I believe is accounting for the rendered spacing on the website. As a result, copying the version from the website into a LICENSE-RIGHTCLICK.txt and running that through tools like askalono returns an inexact match, despite it being, as far as I know, an exact copy!
Note that the AGPL 1.0 has the clause:
"Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed."
I have excerpted this quote in a standard citational form but I have not added emphasis because, as the license says... changing it is not allowed. This suggests one of the two forms, the XML-encoded text, or the JSON string, is meaningfully incorrect, as they render to substantively different displayed text by typical renderers for their encoding.
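To make the kind of mismatch described here concrete, here is a minimal Python sketch. The JSON fragment and the "copied from the website" string are hypothetical stand-ins, not the real file contents, and `normalize` is an illustrative helper, not how askalono compares texts. It only shows that two texts can fail an exact comparison while agreeing once whitespace (including U+2007) is collapsed.

```python
import json

# Hypothetical stand-in for part of a JSON details file. The real
# "licenseText" value is much longer and is not reproduced here.
RAW_JSON = (
    r'{"licenseText": "Version 1, March 2002\n\n'
    r'Copyright \u00a9 2002 Affero Inc.\n510\u2007Third Street"}'
)

# Hypothetical text as it might be copied from a rendered page, with
# extra blank lines where the renderer honored line-break elements.
copied_text = (
    "Version 1, March 2002\n\n\n"
    "Copyright \u00a9 2002 Affero Inc.\n\n510 Third Street"
)


def normalize(text):
    """Collapse every run of Unicode whitespace into a single space."""
    return " ".join(text.split())


# json.loads decodes \n and \uXXXX escapes, so no manual rewriting of
# \u2007 is needed on the Python side.
json_text = json.loads(RAW_JSON)["licenseText"]

print(json_text == copied_text)                        # False
print(normalize(json_text) == normalize(copied_text))  # True
```

If the real texts differ only in spacing and line breaks, a comparison along these lines would report them as equivalent, whereas a byte-for-byte check would not.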
I have no idea if this actually matters, of course. I am not a lawyer, this is not legal advice, etc. etc. etc. However, it seems that the generation of the JSON data from the XML masters may be dropping important formatting details, and it would not seem strange to me if a legal case, however frivolous-seeming, hinged on this difference, given how many cases have been decided on the presence or absence of commas.
This seems to have fellow issues in, but does not seem to be an exact duplicate of,
The reason why it does not seem to be an exact copy of #1924 is that it seems like all the data necessary to achieve a replication of the website's formatting is there in the XML, but not in the JSON, and that the checked-in test data seems to be derived from a JSONified-first form?
This could also be, say, an HTML vs. XML difference.
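One way to picture how a `<br/>` could get lost between the XML and the JSON: the sketch below flattens a small XML fragment to plain text twice, once honoring `<br/>` as a newline and once treating it as a plain space. The fragment is hypothetical (it is not the actual AGPL-1.0 XML master), and `flatten` and `paragraphs` are illustrative names, not LicenseListPublisher functions.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with one <br/> in each <p></p> pair.
XML_FRAGMENT = """<text>
<p>AFFERO GENERAL PUBLIC LICENSE<br/>Version 1, March 2002</p>
<p>Copyright 2002 Affero Inc.<br/>510 Third Street</p>
</text>"""


def flatten(element, keep_breaks):
    """Return the element's text, optionally turning <br/> into a newline."""
    pieces = [element.text or ""]
    for child in element:
        if child.tag == "br":
            pieces.append("\n" if keep_breaks else " ")
        else:
            pieces.append(flatten(child, keep_breaks))
        # Text that follows the child element belongs to the parent.
        pieces.append(child.tail or "")
    return "".join(pieces)


def paragraphs(xml, keep_breaks):
    """Flatten every <p> element in the fragment to a string."""
    root = ET.fromstring(xml)
    return [flatten(p, keep_breaks) for p in root.iter("p")]


with_breaks = paragraphs(XML_FRAGMENT, keep_breaks=True)
without_breaks = paragraphs(XML_FRAGMENT, keep_breaks=False)
print(with_breaks == without_breaks)  # False
```

Both results contain the same words, but only the first keeps the line structure the markup carries, which is the sort of difference that could separate a rendered page from a flattened string.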