spdx / spdx-java-jackson-store

JSON storage implementation for the SPDX tools
Apache License 2.0

Multiple DESCRIBES relationships are not validated consistently #44

Open armintaenzertng opened 1 year ago

armintaenzertng commented 1 year ago

Not sure whether this is a tools-java or spdx-java-library issue. I generate a document using the following method:

public static void buildDocument() throws InvalidSPDXAnalysisException, IOException {

    var modelStore = new MultiFormatStore(new InMemSpdxStore(), MultiFormatStore.Format.XML, MultiFormatStore.Verbose.COMPACT);
    var documentUri = "https://some.namespace";
    var copyManager = new ModelCopyManager();

    var document = SpdxModelFactory.createSpdxDocument(modelStore, documentUri, copyManager);
    document.setName("document name");

    var sha1Checksum = Checksum.create(modelStore, documentUri, ChecksumAlgorithm.SHA1, "d6a770ba38583ed4bb4525bd96e50461655d2758");

    var fileA = document.createSpdxFile("SPDXRef-fileA", "./fileA.c", null,
                    List.of(), null, sha1Checksum)
            .build();

    document.getDocumentDescribes().add(fileA);

    document.addRelationship(
            document.createRelationship(
                    fileA, RelationshipType.DESCRIBES, null
            )
    );

    assert document.verify().isEmpty();

    // Close the stream once serialization finishes.
    try (var out = new FileOutputStream("temp.xml")) {
        modelStore.serialize(documentUri, out);
    }
}

Note the assert statement that indicates that the generated document is valid. The above yields the following output in temp.xml:

<?xml version='1.0' encoding='UTF-8'?>
<Document>
  <SPDXID>SPDXRef-DOCUMENT</SPDXID>
  <spdxVersion>SPDX-2.3</spdxVersion>
  <creationInfo>
    <created>2022-10-13T12:37:44Z</created>
    <creators>Tool: SPDX Tools</creators>
    <licenseListVersion>3.18</licenseListVersion>
  </creationInfo>
  <name>document name</name>
  <dataLicense>CC0-1.0</dataLicense>
  <documentDescribes>SPDXRef-fileA</documentDescribes>
  <documentNamespace>https://some.namespace</documentNamespace>
  <files>
    <SPDXID>SPDXRef-fileA</SPDXID>
    <checksums>
      <algorithm>SHA1</algorithm>
      <checksumValue>d6a770ba38583ed4bb4525bd96e50461655d2758</checksumValue>
    </checksums>
    <fileName>./fileA.c</fileName>
  </files>
  <relationships>
    <spdxElementId>SPDXRef-DOCUMENT</spdxElementId>
    <relationshipType>DESCRIBES</relationshipType>
    <relatedSpdxElement>SPDXRef-fileA</relatedSpdxElement>
  </relationships>
</Document>

But now, when I call

java -jar tools-java-1.1.1-jar-with-dependencies.jar Verify temp.xml

I get the following error:

Analysis exception processing SPDX file: Relationships are expected to be in an array for type Relationship

Thus, the tools-java and spdx-java-library Verify methods seem to contradict each other.

This also raises the question of the value of a DESCRIBES relationship when the tag documentDescribes already exists.

goneall commented 1 year ago

@armintaenzertng I was able to duplicate this with a unit test. From trying out various scenarios, it looks like this only occurs in the XML format, and only if you add a duplicate document describes through a new relationship rather than adding it to the describes collection.

I'll transfer this to the spdx-java-jackson-store which implements the XML format.

I suspect the issue is related to a de-duplication algorithm to prevent duplicate relationships from being serialized.
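To illustrate the kind of de-duplication meant here, the following is a minimal sketch and not the code in spdx-java-jackson-store. The `Rel` record and the `dedupe` helper are hypothetical names invented for this example; it only assumes that two relationships count as duplicates when their source element, type, and target element are all equal.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration only; none of these types exist in the SPDX libraries.
final class RelationshipDedupSketch {

    // A relationship reduced to the three fields that appear in the serialized XML.
    record Rel(String spdxElementId, String relationshipType, String relatedSpdxElement) {}

    // Drops repeated relationships while keeping the order of first occurrence.
    // Records compare by field values, so a set is enough to detect duplicates.
    static List<Rel> dedupe(List<Rel> relationships) {
        return new ArrayList<>(new LinkedHashSet<>(relationships));
    }

    public static void main(String[] args) {
        // The same DESCRIBES relationship, once as if added through the describes
        // collection and once as if added as an explicit relationship.
        Rel viaCollection = new Rel("SPDXRef-DOCUMENT", "DESCRIBES", "SPDXRef-fileA");
        Rel viaRelationship = new Rel("SPDXRef-DOCUMENT", "DESCRIBES", "SPDXRef-fileA");

        // Number of relationships left after de-duplication.
        System.out.println(dedupe(List.of(viaCollection, viaRelationship)).size());
    }
}
```

Whether the real store treats the two entries from the report as equal, and what it does afterwards, would need to be checked in the code.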

Since this is a very unlikely scenario for normal use, I'm not planning on doing much more work on this issue - but feel free to look into it in the code if you feel this is an important issue to resolve. Pull requests are welcome. I'll create a draft PR with the unit test.

goneall commented 1 year ago

> This also raises the question of the value of a DESCRIBES relationship when the tag documentDescribes already exists.

There was a discussion on this in the SPDX spec repo and it was decided that the tag/value, JSON, and YAML formats will use the describes tag even though the relationship already exists. The actual SPDX Model uses relationships for the hasFiles and documentDescribes property implementation. These tags are just added as a convenience for the serialization and deserialization in these formats.
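As a rough sketch of that idea, a documentDescribes list can be computed from DESCRIBES relationships alone. This is not the spdx-java-library implementation: the `Rel` record, the `documentDescribes` helper, and the `SPDXRef-fileB` element are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these types exist in the SPDX libraries.
final class DocumentDescribesSketch {

    // A relationship reduced to the three fields that appear in the serialized XML.
    record Rel(String spdxElementId, String relationshipType, String relatedSpdxElement) {}

    // Collects the IDs of all elements the document DESCRIBES, without repeats.
    static List<String> documentDescribes(String documentId, List<Rel> relationships) {
        List<String> described = new ArrayList<>();
        for (Rel rel : relationships) {
            boolean isDescribes = rel.spdxElementId().equals(documentId)
                    && rel.relationshipType().equals("DESCRIBES");
            if (isDescribes && !described.contains(rel.relatedSpdxElement())) {
                described.add(rel.relatedSpdxElement());
            }
        }
        return described;
    }

    public static void main(String[] args) {
        List<Rel> relationships = List.of(
                new Rel("SPDXRef-DOCUMENT", "DESCRIBES", "SPDXRef-fileA"),
                // Unrelated relationship between files; SPDXRef-fileB is made up.
                new Rel("SPDXRef-fileA", "GENERATED_FROM", "SPDXRef-fileB"));

        System.out.println(documentDescribes("SPDXRef-DOCUMENT", relationships)); // prints [SPDXRef-fileA]
    }
}
```

Seen this way, the documentDescribes tag carries no information beyond the relationships themselves, which matches the description of it as a serialization convenience.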