spdx / LicenseListPublisher

Tool that generates license data found in the license-list-data repository from the license-list-XML source
Apache License 2.0

Copy license XML files when publishing licenses #164

Closed by goneall 1 year ago

goneall commented 1 year ago

Annex B of the current spec refers to the license XML files as being available in the license-list-data repo; however, they are not being published there.

We could simply copy the files over while publishing.
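As a rough illustration of what that copy step might look like, here is a minimal Python sketch. It is not the publisher's actual code: the function name `copy_license_xml` and both directory arguments are hypothetical, and it assumes the license XML files sit directly under one source directory.

```python
import shutil
from pathlib import Path


def copy_license_xml(src_dir: Path, dest_dir: Path) -> list[Path]:
    """Copy every *.xml file directly under src_dir into dest_dir.

    Both directories are hypothetical placeholders for the license XML
    source checkout and the publishing output folder.
    """
    # Create the destination (and any missing parents) if needed.
    dest_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    # Sort so the publishing order is deterministic.
    for xml_file in sorted(src_dir.glob("*.xml")):
        target = dest_dir / xml_file.name
        # copy2 copies the file contents and also tries to keep metadata
        # such as the modification time.
        shutil.copy2(xml_file, target)
        copied.append(target)
    return copied
```

Non-XML files in the source directory are left alone, so only the license XML itself would end up in the published output.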

goneall commented 1 year ago

@swinslow @jlovejoy - do you recall if we made a formal decision to support the license XML files? I was searching for the original issue/PR but could not find it. I just want to make sure I'm consistent with the legal team's decisions before making this change.

goneall commented 1 year ago

Based on the comment in https://github.com/spdx/spdx-spec/issues/853, this should be implemented.

swinslow commented 1 year ago

I'm definitely +1 on supporting the license XML files via the publisher.

My only immediate hesitancy in giving an unqualified "yes" is that I've only recently started to dig into the schema itself to understand how it works internally, beyond just learning from editing the XML files in the license-list-xml repo.

I am interested in seeing if there might be ways to simplify the schema, specifically by seeing if some of the fields (standard license header, etc.) might be easier to handle if they were extracted rather than kept inline in the main <text> elements. But before making any changes to that, of course, I'd want to do a deeper dive into the existing set of licenses to see how many would be impacted, and whether it's worth the churn. And I don't foresee having that bandwidth for at least a couple of months.

So all that said, @goneall, I'd support us formally supporting the schema as it is now, with the one question about whether there's any potential for it to be versioned? E.g. to mark this as v1 of the schema, and to consider whether future revisions could similarly be versioned?

goneall commented 1 year ago

> with the one question about whether there's any potential for it to be versioned?

Excellent idea - I'm a big fan of versioning anything that may be used as a standard data format.

I would like to track the schema version in the license XML itself so that, if we make any changes, we can validate each file against the correct version.

Rather than creating a new field and having creators of the license XML fill it in, we could have a mapping between the license list version (which is already in the license XML files) and the schema version. We can store the version mapping in a Markdown file in the version directory in the license-list-XML repo.
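To make the mapping idea concrete, here is a small Python sketch of how such a Markdown file might be read. Everything in it is an assumption for illustration: the two-column table layout, the version numbers in `MAPPING_MD` (which are made up), and the helper names `parse_version_mapping` and `schema_version_for`.

```python
import re

# Hypothetical contents of the mapping file; the table layout and the
# version numbers are invented for this example.
MAPPING_MD = """\
| License list version | Schema version |
|----------------------|----------------|
| 3.20                 | 1              |
| 3.21                 | 1              |
| 3.22                 | 2              |
"""


def parse_version_mapping(markdown: str) -> dict[str, str]:
    """Read a two-column Markdown table into {list_version: schema_version}."""
    mapping = {}
    for line in markdown.splitlines():
        # Drop the outer pipes, then split the row into its cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue
        list_version, schema_version = cells
        # Skip the header and separator rows: only keep rows whose first
        # cell looks like a dotted version number.
        if not re.fullmatch(r"\d+(\.\d+)*", list_version):
            continue
        mapping[list_version] = schema_version
    return mapping


def schema_version_for(list_version: str, mapping: dict[str, str]) -> str:
    """Look up the schema version to validate against for a license list version."""
    try:
        return mapping[list_version]
    except KeyError:
        raise ValueError(
            f"no schema version recorded for license list version {list_version}"
        ) from None
```

With a lookup like this, a validator could read the license list version already present in a license XML file and pick the matching schema, without adding a new field to the XML.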