Open m1kit opened 3 years ago
I like the idea of adding a more structured file for expected duplicates.
Since the LicenseListPublisher is used by only a small number of organizations, we don't need to worry too much about compatibility.
One question - should we A) add an "expected duplicates" JSON file that this utility would check, generating no warnings for the duplicate licenses listed in it, or B) take a more general approach and change the expected warnings file to a JSON file that contains the expected duplicates along with other sections of expected warnings?
Approach A) may be more usable by other utilities, whereas B) makes more sense for the LicenseListPublisher.
I'm leaning toward A) to make the file format more usable by other utilities.
@m1kit - what do you think?
Oh, I was also writing up some similar ideas at the same time 😂 Thanks anyway, @goneall!
I have three ideas in mind.
One possible format is JSON like this:
[{
"type": "duplicated-license",
"license-ids": [
"LGPL-2.1",
"LGPL-2.1-only"
],
"prefer": "LGPL-2.1-only"
},
{
// more expected warnings here
}]
This format can accommodate future updates (new expected warning types). We could also add some data to keep things simple in the publisher, like:
[{
"type": "duplicated-license",
"license-ids": [
"LGPL-2.1",
"LGPL-2.1-only"
],
"warnings": [
"Duplicates licenses: LGPL-2.1, LGPL-2.1-only",
"Duplicates licenses: LGPL-2.1-only, LGPL-2.1"
],
"prefer": "LGPL-2.1-only"
}]
It's like a hybrid of your Plan A and B.
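A minimal Java sketch of how a checker could use the "license-ids" list directly, assuming the entries have already been loaded into sets of IDs. The names ExpectedDuplicates and isExpected are hypothetical and not part of LicenseListPublisher. Because the lookup works on the set of IDs, both orderings of the warning message resolve to the same entry.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical helper: decides whether a duplicate-license warning is expected. */
final class ExpectedDuplicates {
    // Each entry is one set of license IDs that are known to share a text.
    private final List<Set<String>> expectedSets;

    ExpectedDuplicates(List<Set<String>> expectedSets) {
        this.expectedSets = List.copyOf(expectedSets);
    }

    /** True when the two distinct IDs appear in the same expected set, in either order. */
    boolean isExpected(String idA, String idB) {
        if (idA.equals(idB)) {
            return false; // a license is not a duplicate of itself
        }
        for (Set<String> ids : expectedSets) {
            if (ids.contains(idA) && ids.contains(idB)) {
                return true;
            }
        }
        return false;
    }
}
```

With one entry for LGPL-2.1 and LGPL-2.1-only, isExpected("LGPL-2.1-only", "LGPL-2.1") and isExpected("LGPL-2.1", "LGPL-2.1-only") both return true, so the publisher would not have to store each message ordering separately.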
Maybe it is not easy to parse JSON in Java.
We could instead store the data in CSV format like this (though it is not flexible):
"message","from","to","prefer"
"Duplicates licenses: LGPL-2.1, LGPL-2.1-only","LGPL-2.1","LGPL-2.1-only","LGPL-2.1-only"
"Duplicates licenses: LGPL-2.1-only, LGPL-2.1","LGPL-2.1-only","LGPL-2.1","LGPL-2.1-only"
I think the data here is related to obsoletedBys in license-list-XML.
I wonder whether we could define the similarity of templates somehow in the XML.
Then we could pull the data from the XML and generate expected-warnings dynamically in a format specific to LicenseListPublisher.
I forgot to mention my preference.
I think adding some info to the XML is the best option, if possible.
Or, we can put a generic list of expected duplicates in a separate file somewhere in license-list-XML and dynamically generate a file for this library.
I like this idea as it would make the information more generally accessible and usable. We could replace the current expectedwarnings file with a "KnownDuplicates.xml".
Although I tend to like JSON better than XML due to readability, the fact that the license-list-XML repo is primarily XML format would favor the XML format over JSON.
We can update this library to read the XML file and process it directly.
I'm tempted to just remove the expected warnings functionality since it is only currently used for known duplicates.
I would like the XML to deserialize into a Java object using one of the standard libraries without too much effort. Here's what I'm thinking might work (although I would want to test this out in code before finalizing):
<expectedDuplicates>
<duplicatedLicenseSet>
<licenseIds>
<licenseId>LGPL-2.1</licenseId>
<licenseId>LGPL-2.1-only</licenseId>
<licenseId>LGPL-2.1-or-later</licenseId>
</licenseIds>
<prefer>LGPL-2.1</prefer>
<comment>The LGPL-2.1-only should be used if only the 2.1 version of the license is allowed, the LGPL-2.1-or-later should be used if any later version of 2.1 may be used. If unsure which applies, the LGPL-2.1 identifier should be used</comment>
</duplicatedLicenseSet>
</expectedDuplicates>
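As a rough check that a layout like this is easy to read from Java, here is a sketch that loads it with the JDK's built-in DOM parser. This is an assumption about one way to do it, not the mapping the project settled on: the record DuplicatedLicenseSet and the class ExpectedDuplicatesXml are hypothetical names, and a binding library such as JAXB could replace the manual traversal. It expects well-formed input in which the prefer element is closed with its own tag.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical value type mirroring one duplicatedLicenseSet element. */
record DuplicatedLicenseSet(List<String> licenseIds, String prefer, String comment) {}

final class ExpectedDuplicatesXml {
    /** Parses the XML text into one record per duplicatedLicenseSet element. */
    static List<DuplicatedLicenseSet> parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse expected duplicates XML", e);
        }
        List<DuplicatedLicenseSet> result = new ArrayList<>();
        NodeList sets = doc.getElementsByTagName("duplicatedLicenseSet");
        for (int i = 0; i < sets.getLength(); i++) {
            Element set = (Element) sets.item(i);
            // Collect every licenseId nested under this set.
            List<String> ids = new ArrayList<>();
            NodeList idNodes = set.getElementsByTagName("licenseId");
            for (int j = 0; j < idNodes.getLength(); j++) {
                ids.add(idNodes.item(j).getTextContent().trim());
            }
            result.add(new DuplicatedLicenseSet(
                    List.copyOf(ids), childText(set, "prefer"), childText(set, "comment")));
        }
        return result;
    }

    /** Returns the trimmed text of the first matching descendant, or null if absent. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```

Calling ExpectedDuplicatesXml.parse on the file contents returns one record per set, with prefer and comment left as null when those elements are missing.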
Hi, I agree with "KnownDuplicates.xml" idea.
I'd like to work on this - introduce the file on license-list-XML. I have a few questions about how to do it.
Do I have to write .xsd to define the schema?
If so, what is a recommended way to write a .xsd file? (I'm unfamiliar with it)
I'd like to work on this - introduce the file on license-list-XML.
That would be great :)
I have a few additional suggestions on the file I've been thinking about - I'll add those as separate comments.
Do I have to write .xsd to define the schema?
A schema would be really nice to have for validating and even generating code.
If so, what is a recommended way to write a .xsd file?
There are a number of ways to create the XSD file. Since we need to change the Java application to use the XSD file, I have a suggested approach:
@XmlType annotation and use JAXB to generate the schema - note that there are a lot of JAXB-based tools, some built into IDEs like IntelliJ or Eclipse
I would like to suggest we broaden the scope of the XML file to include other potential license issues which generate warnings in the LicenseRDFaGenerator. If we merge in PR #20, there will be more expected warnings where the OSI approved flag doesn't match the OSI data.
I would like to name the file something different from "expectedwarnings" since I would like the file to be usable for a number of other purposes. Perhaps something like "KnownLicenseIssues.xml"?
I did some quick analysis of warning sources to see if we want to include any additional sections in the XML file for expected license issues.
The only one I think we should add is a section listing the license IDs where the OSI Approved flag doesn't match the OSI-provided data (see PR #20 for context).
Below are other warnings which can be added as sections, but are not as likely to occur:
@m1kit - It's been a while for this issue - are you still interested in contributing? If not, I'll close the issue.
This file is useful not only during testing, but in the context of license matching.
Though the file owner is spdx/license-list-XML, I think the main user of the file is this repo. I'd like to leave some discussion here.
See the comments here for the details.