spdx / tools

SPDX Tools
Apache License 2.0

tag:value reports with debian package versions do not always validate due to epoch: #203

Closed rnjudge closed 4 years ago

rnjudge commented 4 years ago

Bug description: The SPDX validation tool fails to validate tag:value documents that contain Debian packages whose versions use an epoch. Debian separates the epoch from the upstream version and Debian revision with a colon. You can read more about it here, but to summarize, the versioning looks like this: [epoch:]upstream_version[-debian_revision].

In most cases the epoch defaults to 0 and is omitted, but when it is present, the colon that follows it confuses the SPDX tag:value validation.
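To make the version layout concrete, here is a minimal sketch of splitting a Debian version string into its three parts. The function name `split_debian_version` is hypothetical and is not part of Tern or the SPDX tools; it assumes the epoch ends at the first colon and the Debian revision starts after the last hyphen.

```python
def split_debian_version(version):
    """Split '[epoch:]upstream_version[-debian_revision]' into three parts."""
    # The epoch is everything before the first colon; it defaults to "0" when omitted.
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    # The Debian revision is everything after the last hyphen, if there is one.
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return epoch, upstream, revision


print(split_debian_version("1:2.33.1-0.1"))  # ('1', '2.33.1', '0.1')
print(split_debian_version("2.33.1-0.1"))    # ('0', '2.33.1', '0.1')
```

Only the first form contains a colon, which is the character the validator trips over.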

To reproduce: We ran Tern on a Debian image and tried to validate the output. We have since replaced the epoch ':' separator with a '-' as a workaround, so running Tern at the latest commit will not reproduce the error. However, this commit in tern/formats/spdx/spdxtagvalue/generator.py (lines 30-33) shows the workaround; if you remove the change, you can re-create the error.

Error in SPDX Validator Tool: No external document ref found for SPDX ID SPDXRef-bsdutils.1:2.33.1-0.1. While verifying for RDF/XML format: [line: 1, col: 1 ] Content is not allowed in prolog

Expected Behavior: The tag:value input format should validate when there is an epoch included in the package version.

References: This validation issue was first discovered when trying out Tern's output for a Debian-based image. More here.

goneall commented 4 years ago

After reviewing the document, I believe this is not actually a bug.

The SPDX ID reference, defined in section 8.4, has an optional document ID followed by a colon ":" then an SPDXID which is a unique string containing letters, numbers, “.”,“-”.

The colon cannot be used within the SPDX ID as it is ambiguous with the external package reference.

Note - using a colon in the version designation for a package per section 3.3 is OK - you just can't use it in the SPDX ID.

@kestewart let me know if you disagree.
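The rule described in this comment can be checked with a short regular expression. This is a sketch, not the validator's actual code: the pattern is an assumption based on the description above plus the `DocumentRef-` and `SPDXRef-` prefixes used by SPDX 2.x, and `is_valid_spdx_id_ref` is a hypothetical helper name.

```python
import re

# Assumed pattern: an optional "DocumentRef-<id>:" prefix, then "SPDXRef-"
# followed by letters, digits, "." or "-". A colon is only legal as the
# separator between the document reference and the SPDX ID.
SPDX_ID_REF = re.compile(r"(DocumentRef-[A-Za-z0-9.\-]+:)?SPDXRef-[A-Za-z0-9.\-]+")


def is_valid_spdx_id_ref(ref):
    """Return True if the whole string matches the assumed SPDX ID reference form."""
    return SPDX_ID_REF.fullmatch(ref) is not None


print(is_valid_spdx_id_ref("SPDXRef-bsdutils.1:2.33.1-0.1"))  # False
print(is_valid_spdx_id_ref("SPDXRef-bsdutils.1-2.33.1-0.1"))  # True
print(is_valid_spdx_id_ref("DocumentRef-other:SPDXRef-pkg"))  # True
```

The first ID is the one from the error message. Because a colon is only expected after a document reference, the text before the colon appears to be read as one, which would explain the "No external document ref found" error.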

rnjudge commented 4 years ago

> The SPDX ID reference, defined in section 8.4, has an optional document ID followed by a colon ":" then an SPDXID which is a unique string containing letters, numbers, “.”,“-”.

> Note - using a colon in the version designation for a package per section 3.3 is OK - you just can't use it in the SPDX ID.

Thanks for looking at this so quickly, Gary! In Tern, we currently create a unique SPDX ID from the package {name}-{version}. Given your findings, I think we may need to rethink this, since we can't consistently use the full version as the distro represents it (in the case of epochs).

goneall commented 4 years ago

I'll go ahead and close the issue.

@rnjudge BTW - I like the approach taken by Tern to create human-readable and meaningful SPDX IDs. Perhaps you can just do a systematic replacement of any invalid characters with another character (e.g. underscore) when creating the IDs.
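A systematic replacement like the one suggested here might look like the sketch below. The helper `make_spdx_id` is hypothetical and is not Tern's implementation. It replaces with "-" rather than an underscore, because the character set quoted earlier in the thread lists only letters, numbers, "." and "-".

```python
import re


def make_spdx_id(name, version, replacement="-"):
    """Build a readable SPDX ID from {name}-{version}, replacing invalid characters."""
    raw = f"{name}-{version}"
    # Anything other than letters, digits, "." and "-" is swapped for the replacement.
    return "SPDXRef-" + re.sub(r"[^A-Za-z0-9.\-]", replacement, raw)


print(make_spdx_id("bsdutils", "1:2.33.1-0.1"))  # SPDXRef-bsdutils-1-2.33.1-0.1
```

One trade-off is that two distinct versions can map to the same ID once characters are replaced (for example `1:2` and `1-2`), so uniqueness is no longer guaranteed by construction.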

rnjudge commented 4 years ago

@goneall We currently replace the : after an epoch with a - and this validates just fine. Another thought we had was to create a hash from the package name/version as we can guarantee its uniqueness without having to alter the representation of the package version. This will obviously not be human readable, though. What are benefits you see to keeping the SPDXRef human readable?
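The hash-based alternative could be sketched as follows. The name `hashed_spdx_id`, the choice of SHA-256, and the truncation length are all assumptions made for illustration; nothing here reflects what Tern actually ships.

```python
import hashlib


def hashed_spdx_id(name, version, length=16):
    """Derive an SPDX ID from a hash of the package name and unaltered version."""
    # A NUL separator keeps ("a-b", "c") and ("a", "b-c") from hashing alike.
    digest = hashlib.sha256(f"{name}\0{version}".encode("utf-8")).hexdigest()
    # Hex digits are all within the allowed SPDX ID character set.
    return "SPDXRef-" + digest[:length]


print(hashed_spdx_id("bsdutils", "1:2.33.1-0.1"))
```

The version string is hashed exactly as the distro reports it, so the epoch colon never reaches the ID. Truncating the digest shortens the ID at the cost of a small, nonzero collision probability.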

goneall commented 4 years ago

> What are benefits you see to keeping the SPDXRef human readable?

There are some uses of SPDX IDs to refer to a package (e.g. in relationships). If the SPDX ID is human readable, you don't need to look up the definition of the SPDXRef. This is only a benefit for the tag/value and spreadsheet formats, since the RDF and other proposed formats will likely not be read without some sort of tool.