spdx / spdx-3-model

The model for the information captured in SPDX version 3 standard.
https://spdx.dev/use/specifications/
Other
69 stars 44 forks source link

Annotation conversion between 2.x and 3.0 #477

Open manifestori opened 1 year ago

manifestori commented 1 year ago

As for v2.3 schema: https://github.com/spdx/spdx-spec/blob/development/v2.3.1/schemas/spdx-schema.json

The annotation model lacks "name" and "description" fields, but has "Annotator" with "Person: Jane Doe" or similar format. Is it safe to say that "Annotator" could be converted to "name"?

if so, why in https://github.com/spdx/spdx-3-model/blob/main/serialization/json_ld/examples/converted_from_spdx_2.json, annotations that were converted from spdx2 don't have a name?

Moreover, I'm trying to understand the purpose of name/description for Annotation, why is it a "complete object? can those fields be optional?

Think about a tool converting between formats, if CDX has "Properties" (similar to element-level annotation) that have key=value, should one use statement to have key=value as value? or should one use name for key, and statement as value.

I figure that there isn't "Property"-like structure in SPDX so Annotation is the closest and most appropriate candidate.

Going back to CDX from SPDX annotation would be affected by the same decision.

People have already tried approaching this in the past, but I rather have your opinion. https://github.com/spdx/cdx2spdx/blob/08ec34f11b15dd747410d13c1fc4d11645a20a4b/src/main/java/org/spdx/cdx2spdx/CycloneSpdxConverter.java#L565C60-L565C60

davaya commented 1 year ago

Annotators in v2.3 must be used to synthesize person/organization actors or tool elements in v3. This requires some guesswork since the annotator strings sometimes have email addresses in parentheses and sometimes not - you could assume there is only one Jane Doe in the universe, or treat each one with distinct emails (jd@gmail.com, Jane@outlook.com) as different Person elements.

goneall commented 1 year ago

@manifestori Thanks for opening this issue.

The annotation model lacks "name" and "description" fields, but has "Annotator" with "Person: Jane Doe" or similar format. Is it safe to say that "Annotator" could be converted to "name"?

The annotator should be converted to the creator of the Annotation in the creationInfo field.

I just updated the SPDX migration analysis document with a more complete recommendation for Annotation.

Feel free to add comments and suggestions to the migration guide.

goneall commented 1 year ago

Think about a tool converting between formats, if CDX has "Properties" (similar to element-level annotation) that have key=value, should one use statement to have key=value as value? or should one use name for key, and statement as value.

We have an opportunity with SPDX 3.0 to make this translation cleaner.

Should we add an optional properties map to the Annotation class? This would make the translation very clean. We could even add an Annotation type to indicate it was translated from CDX.

If there is interest in this approach, I can create a pull request later this week for consideration.

Another option is to use the extension property of Element for this purpose. However, it lacks structure and the fact it is a property of the Element means you can not add an extension after the Element is initially created.

manifestori commented 1 year ago

@davaya @goneall thank you for the clarifications, I can now be sure of how you intend to provide support for annotation.

So if a SPDX2.3 Annotation looks like this:

  {
    "annotationDate" : "2010-01-29T18:30:22Z",
    "annotationType" : "OTHER",
    "annotator" : "Person: Jane Doe ()",
    "comment" : "My annotation"
  }

It would translate into

{
      "type": "Annotation",
      "spdxId": "spdx-example:SPDXRef-Annotation",
      "annotationType": "other",
      "subject": "spdx-example:SPDXRef-MyElement",
      "statement": "My annotation,
      "creationInfo": {
          ...,
          "created": "2010-01-29T18:30:22Z",
          "createdBy": [
             "spdx-example:SPDXRef-Actor-JaneDoe"
          ],
          "createdUsing": [...]
          "profile": [...],
          "dataLicense":  "https://spdx.org/licenses/CC0-1.0",
          "comment": "..."
        },
    }

Adding a properties map to annotation, that's makes a lot of sense as many tools want to have a simple structured way of adding metadata to components/packages/libs/files. Abusing statement is not a great solution so a map would be most welcome and straightforward.

Regarding a "translated from cdx" annotation type, I don't think it serves a real purpose as a conversion tool would probably mention the fact it was converted from CDX by Tool X. or use the statement for it? Maybe something more general such as metadata instead of other would make more sense?

  "components": [
    {
      "bom-ref": "5a6ee31f-af7c-43bf-8ecc-09bc7fa997ad",
      "type": "application",
      "name": "examples/tests/requirements.txt",
      "properties": [
        {
          "name": "aquasecurity:trivy:Class",
          "value": "lang-pkgs"
        },
        {
          "name": "aquasecurity:trivy:Type",
          "value": "pip"
        }
      ]
    },

Looking at this snippet from CDX 1.5 components list, to translate it to CDX3.0 you would create an Element and an Annotation element associated with this element. The annotation

{
      "type": "Annotation",
      "spdxId": "spdx-example:SPDXRef-Annotation",
      "annotationType": "other",
      "subject": "spdx-example:SPDXRef-MyElement",
      "statement": "", # what do we do with the value of statement?
      "properties": {
         "aquasecurity:trivy:Class": "lang-pkgs",
          "aquasecurity:trivy:Type": "pip"
      }
      ...
    }

ofc you can make properties match the [{ name: string, value: string }] schema but this would consume more space.

Thanks again for your answers and open discussion <3

manifestori commented 1 year ago

@goneall I would also be thrilled to contribute to SPDX3.0 myself :) If you can, please point me toward the way to make an RFC for the annotation spec that would be great.

if submitting a PR updating the models and docs is sufficient, I would be happy to do this ASAP :)

goneall commented 1 year ago

@manifestori - We'd certainly welcome your contributions :) The only thing required is to submit a PR to the spec. The CONTRIBUTING.md file gives a few more details.

We also have a mailing list and weekly meetings to discuss issues which are not easily resolvable through pull requests.

BTW - I'll be out hiking the next 4 days, so I may be a bit slow responding to issues, but there are several others in the community who can review / provide feedback.

goneall commented 1 year ago

@manifestori - back from hiking. I like the idea of the properties. The example you shared above looks good to me.

One question: What type do we want to use for the property name and value? If both are simple strings, we can re-use the existing DictionaryEntry class. If we want to restrict the property to be a URI type, then we would need to create a new class.

My preference would be to use strings to make the map more flexible.

Let me know if you'd like me to create a PR - I would like to get this into the 3.0 release.

manifestori commented 1 year ago

Simple strings would do. DictionaryEntry sounds like the way to go <3

I will try to submit a pull request in the next few days. If I am unable to do so, I will get in touch with you. Thanks!

goneall commented 6 months ago

Since this is a non-breaking change and we don't have a PR in time for 3.0, I'm moving this to the 3.1 milestone