In need of a canonical way to add license texts for licenseRefs / Maybe bugged?

jkrae-metaeffekt commented 1 year ago

Trying to implement a custom "exporter" that converts a detailed, custom SBOM format to SPDX, I encountered some really weird behaviour regarding the implementation of Chapter 10 "Other licensing information detected section", which I believe corresponds to the ExtractedLicenseInfo class in the library. Since the base SPDX license list is insufficient for our use, we use both licenses from the list and licenseRefs to maintain correctness and detail in all our license expressions.

Using the library, I noticed that after converting and adding artifacts/software components as SPDX packages (maybe also files at some point), the converter programmatically finds and adds license texts for each custom licenseRef. However, I couldn't find an example or documentation about how this is supposed to be done.

And that is the main question: What's the supported / recommended way to add custom license texts to a document with the java library?

Trying to figure this out myself, I encountered some counterintuitive behaviour that may be caused by bugs in the implementation:

In our use-case, we want to write the resulting SPDX document to JSON using the spdx-java-jackson-store with MultiFormatStore. I think it has to do with a combination of the store's behaviour, ModelSet#add and the use of ExtractedLicenseInfo#id to hold the licenseRef. I think that when I try to add my new ExtractedLicenseInfo later, with the correct licenseRef in the id field, the store will reject this new, completed object, instead keeping a previously added AnyLicenseInfo unchanged, which didn't have the text yet.

As a user, these unique ids and the order in which I add license expressions and corresponding license texts are an implementation detail that I do not want to worry about.

Again, since I don't know the correct way, i experimented, but any interface I could find was unsatisfactory. To demonstrate my issues with current behaviour, I attached a class containing examples on some issues I am trying to demonstrate. In the code, // PROBLEM: comments try to highlight what I feel the issue is.

Regrettably, the code ended up being messy. As a summary for each example:

Example 1

Simply using document.addExtractedLicenseInfos(new ExtractedLicenseInfo(licenseRef, text)) doesn't make the correct text show up in the JSON.

I think this is the way it should work, since info in this section would be somewhat independent of what licenseRefs have previously been added through other packages.

Example 2

Since Example 1 doesn't work, I tried to use an instance returned by parseSPDXLicenseString.

this did not work, as the issue wasn't with the construction of ExtractedLicenseInfo but with the way objects are resolved.
this makes for bad code, since we have to speculate on the real type of the given AnyLicenseInfo (and cast to ExtractedLicenseInfo to be able to call SpdxDocument#addExtractedLicenseInfos)

Through extensive browsing of source code, I later found out about LicenseInfoFactory.parseSPDXLicenseString's overloading, leading to Example 4.

Example 3

This demonstrates a hack that immediately adds a license text just after first parsing the expression that contains it.

demonstrates that an instance of ExtractedLicenseInfo is remembered in some obscure part of the store without explicitly adding it (unsure where from, probably somewhere under SpdxDocument#createPackage?)
- later the instance is found present already by matching its id, breaking hasExtractedLicensingInfos when manually adding ExtractedLicenseInfo
this solution is unintuitive and makes for hacky code
unsuitable to my use case for technical reasons
- I need to be able to add license texts in a later processing stage.

Example 4

This is similar to Example 2, abusing LicenseInfoFactory#parseSPDXLicenseString. It will return a writable instance to which I add the text inplace.

same disadvantages as Example 2, except it produces the desired result (text shows up in hasExtractedLicensingInfos)
bad api, this terrible hack leads to bad code and bad stability (since the interface is buried in implementation details).
it's the way I bodged it... because I found no other way to make my program work
demonstrates that addExtractedLicenseInfos doesn't add the given object to hasExtractedLicensingInfos at all! Instead, it later resolves something by "id" in some non-trivial way, depending on what licenseRefs are / were found in previously present license expressions.

I think a user of the library:

shouldn't have to call parseSPDXLicenseString with its store, etc as parameters like i did here
shouldn't ever have to rely on parseSPDXLicenseString giving the exact same instance between calls to achieve functionality
should instead be able to add and remove information as objects freely
- without them having unexpected side-effects (like an earlier package addition leading to broken license texts in hasExtractedLicensingInfos)

To conclude, I believe a change in the library's behaviour is required. Example 1 could be a first step towards a better implementation.

So: If needed, I'll try to debug further for a PR once I know the way things should work.

jkrae-metaeffekt commented 1 year ago

Seems Github refuses .java. I'll upload the example code as a txt, i guess.

SpdxExtractedLicenseInfoBehaviour.txt

goneall commented 1 year ago

Wow - @jkrae-metaeffekt sorry to hear about the challenges in what should be a straight-forward task.

Example 1 should work.

My guess is that the stores are getting confused.

There is a default store, but you can pass in store to override it. When using the parsing, it uses the default store which is probably causing some of the issues.

Calling new ExtractedLicenseInfo(licenseRef, text) will also use the default license store.

There is a lower level API for every SPDX Element / object that takes the store a parameters.

I would suggest trying the following:

        ExtractedLicenseInfo eli = new ExtractedLicenseInfo(document.getModelStore(), document.getDocumentUri(), licenseRef, document.getCopyManager(), true);
        eli.setExtractedText(text);
        document.addExtractedLicenseInfos(eli);

There are a number of "helper methods" in the ModelObject which is superclass of all SPDX elements to create new objects copying the same store / etc. Unfortunately, it looks like one doesn't exist for ExtractedLicenseInfo.

If one did exist, you could just call document.addExtractedLicenseInfos(document.createExtractedLicense(licenseRef, text))

Having the above mentioned method would be a consisted API to solve this problem.

Let me know if the above low level API works. If so, we should create a PR to add the additional helper method.

Thanks for you patience.

jkrae-metaeffekt commented 1 year ago

Thank you for your help!

Also: Yay! Your suggestion works correctly when dropped into my test class.

Please consider assigning this issue to me, I'll go work on a PR to add the (very much necessary, i think) helper method. I'll try to dress up the documentation (ExtractedLicenseInfo constructors, perhaps?), also. Hopefully it will prevent someone from running into the same issue.

goneall commented 1 year ago

Good to hear it works.

Thanks @jkrae-metaeffekt for taking this on. I'll assign it to you.

BTW - I'm going to be traveling for 2-3 weeks without my laptop, so it may take me a while to respond.

@pmonks - Feel free to review / merge and PR's while I'm out if your comfortable, otherwise I'll catch up once I'm back.

spdx / Spdx-Java-Library