spdx / Spdx-Java-Library

Java library which implements the Java object model for SPDX and provides useful helper functions
Apache License 2.0
35 stars 33 forks source link

In need of a canonical way to add license texts for licenseRefs / Maybe bugged? #215

Closed jkrae-metaeffekt closed 10 months ago

jkrae-metaeffekt commented 11 months ago

Trying to implement a custom "exporter" that converts a detailed, custom SBOM format to SPDX, I encountered some really weird behaviour regarding the implementation of Chapter 10 "Other licensing information detected section", which I believe corresponds to the ExtractedLicenseInfo class in the library. Since the base SPDX license list is insufficient for our use, we use both licenses from the list and licenseRefs to maintain correctness and detail in all our license expressions.

Using the library, I noticed that after converting and adding artifacts/software components as SPDX packages (maybe also files at some point), the converter programmatically finds and adds license texts for each custom licenseRef. However, I couldn't find an example or documentation about how this is supposed to be done.

And that is the main question: What's the supported / recommended way to add custom license texts to a document with the java library?

Trying to figure this out myself, I encountered some counterintuitive behaviour that may be caused by bugs in the implementation:

In our use-case, we want to write the resulting SPDX document to JSON using the spdx-java-jackson-store with MultiFormatStore. I think it has to do with a combination of the store's behaviour, ModelSet#add and the use of ExtractedLicenseInfo#id to hold the licenseRef. I think that when I try to add my new ExtractedLicenseInfo later, with the correct licenseRef in the id field, the store will reject this new, completed object, instead keeping a previously added AnyLicenseInfo unchanged, which didn't have the text yet.

As a user, these unique ids and the order in which I add license expressions and corresponding license texts are an implementation detail that I do not want to worry about.


Again, since I don't know the correct way, i experimented, but any interface I could find was unsatisfactory. To demonstrate my issues with current behaviour, I attached a class containing examples on some issues I am trying to demonstrate. In the code, // PROBLEM: comments try to highlight what I feel the issue is.

Regrettably, the code ended up being messy. As a summary for each example:

Example 1

Simply using document.addExtractedLicenseInfos(new ExtractedLicenseInfo(licenseRef, text)) doesn't make the correct text show up in the JSON.

Example 2

Since Example 1 doesn't work, I tried to use an instance returned by parseSPDXLicenseString.

Through extensive browsing of source code, I later found out about LicenseInfoFactory.parseSPDXLicenseString's overloading, leading to Example 4.

Example 3

This demonstrates a hack that immediately adds a license text just after first parsing the expression that contains it.

Example 4

This is similar to Example 2, abusing LicenseInfoFactory#parseSPDXLicenseString. It will return a writable instance to which I add the text inplace.

I think a user of the library:


To conclude, I believe a change in the library's behaviour is required. Example 1 could be a first step towards a better implementation.

So: If needed, I'll try to debug further for a PR once I know the way things should work.

jkrae-metaeffekt commented 11 months ago

Seems Github refuses .java. I'll upload the example code as a txt, i guess.

SpdxExtractedLicenseInfoBehaviour.txt

goneall commented 11 months ago

Wow - @jkrae-metaeffekt sorry to hear about the challenges in what should be a straight-forward task.

Example 1 should work.

My guess is that the stores are getting confused.

There is a default store, but you can pass in store to override it. When using the parsing, it uses the default store which is probably causing some of the issues.

Calling new ExtractedLicenseInfo(licenseRef, text) will also use the default license store.

There is a lower level API for every SPDX Element / object that takes the store a parameters.

I would suggest trying the following:

        ExtractedLicenseInfo eli = new ExtractedLicenseInfo(document.getModelStore(), document.getDocumentUri(), licenseRef, document.getCopyManager(), true);
        eli.setExtractedText(text);
        document.addExtractedLicenseInfos(eli);

There are a number of "helper methods" in the ModelObject which is superclass of all SPDX elements to create new objects copying the same store / etc. Unfortunately, it looks like one doesn't exist for ExtractedLicenseInfo.

If one did exist, you could just call document.addExtractedLicenseInfos(document.createExtractedLicense(licenseRef, text))

Having the above mentioned method would be a consisted API to solve this problem.

Let me know if the above low level API works. If so, we should create a PR to add the additional helper method.

Thanks for you patience.

jkrae-metaeffekt commented 10 months ago

Thank you for your help!

Also: Yay! Your suggestion works correctly when dropped into my test class.

Please consider assigning this issue to me, I'll go work on a PR to add the (very much necessary, i think) helper method. I'll try to dress up the documentation (ExtractedLicenseInfo constructors, perhaps?), also. Hopefully it will prevent someone from running into the same issue.

goneall commented 10 months ago

Good to hear it works.

Thanks @jkrae-metaeffekt for taking this on. I'll assign it to you.

BTW - I'm going to be traveling for 2-3 weeks without my laptop, so it may take me a while to respond.

@pmonks - Feel free to review / merge and PR's while I'm out if your comfortable, otherwise I'll catch up once I'm back.