saveourtool / cosv4k

Kotlin and Java serialization schema for COSV
https://www.gitlink.org.cn/zone/CCF-ODC/source/7
MIT License
2 stars 0 forks source link

Patch details problems #8

Open orchestr7 opened 1 year ago

orchestr7 commented 1 year ago

1) patches_detail[] is now added as a separate field on the same level as affected[]:. It is incorrect, because we cannot make a mapping from an AFFECTED PROJECT (affected[]) to a patch where it was fixed and cannot properly show it.

Imagine, that your library XXX has two major versions 1.0.0 and 3.0.0 (both are supported). Vulnerability fixes usually go into both version, and in COSV we say:

"ranges": [ {
    "type": "SEMVER",
    "events": [
      { "introduced": "1.0.0" },
      { "fixed": "1.0.2" },
      { "introduced": "3.0.0" },
      { "fixed": "3.2.5" }
    ]
} ]

And patches were different with different commits IDs. In current schema we cannot do anything and create mapping between those project and fix.

Suggestion to move patches_detail to affected[].ranges[] or affected[]. PLease think about that.

2) affected[].package.language is DUPLICATED with patches_detail[].main_language

JustinB1eber commented 1 year ago

Thanks for your careful and considerate. Here's my opinions:

  1. About patches_detail[]. At the beginning of the design, we hope that all data is a direct extension of the vulnerability, and the patch and the affected package use the vulnerability as a bridge to connect. But the problem you said exists.
  1. About affected[].package.language , it is used to indicate the major language of package is developed in. This may be mainly used to give classification labels when displaying vulnerability reports, but for users who use this package as a dependency, it has no clear effect. They don't care what language the dependent package is developed in. patches_detail[].main_language is a more precise field, and it has more practical use. Most of the time, the repair of a patch involves code logic changes, common in code languages such as Java and C++, and sometimes only involves modification of configuration files, such as xml. For example, for those researchers who use fixing patch as dataset to train vulnerability-related deep learning models, this field would be important.
orchestr7 commented 1 year ago

@JustinB1eber but we are not able to match/map the patch to the project now.

Affected[] and patch_details[] are on the same level. If there are multiple records in each of these archives, we would not be able to make mapping.

How do we can do it?

orchestr7 commented 1 year ago

Proposal: to add patch_details to events

"events": [
      { "introduced": "0" },
      { "fixed": "1.0.2", "patch_details": "PATCH" },
      { "introduced": "3.0.0" },
      { "fixed": "3.1.2" },

For introduced you can skip it, for fix we will always have possibility to add these details and it will not break OSV schema, as this field will be optional. We will be still able to support OSV and COSV with the same deserialization mechanism

@JustinB1eber

JustinB1eber commented 1 year ago

@akuleshov7 We don't have to mapping patch_details[] to affected projects by move one into another, because there is a original mapping between patches_detail[].patch_url and affected[].ranges[].repo

JustinB1eber commented 1 year ago

{ "fixed": "1.0.2", "patch_details": "PATCH" } this might introduce a new problem is that when there is a patch fix in PR and the new version of hotfix is not being released we don't know where to put this patches_detail[]. For example in linux kernel, there are a lot of downstream OS projects forked from it, when those developers first time get the patch with commit ID, they cherry-pick it onto their branches and keep on developing, not to wait for the release.

orchestr7 commented 1 year ago

{ "fixed": "1.0.2", "patch_details": "PATCH" } this might introduce a new problem is that when there is a patch fix in PR and the new version of hotfix is not being released we don't know where to put this patches_detail[]. For example in linux kernel, there are a lot of downstream OS projects forked from it, when those developers first time get the patch with commit ID, they cherry-pick it onto their branches and keep on developing, not to wait for the release.

It's OPTIONAL. You don't need to put patch details to event.

orchestr7 commented 1 year ago

As discussed with @nulls events should also be extended by some time field (OPTIONAL): { "fixed": "1.0.2", "patch_details": "PATCH" , "time": "TIME" }

With this change we will cover cases with several fixes in different versions and packages. So we can remove several timeline events: found, fixed and introduced (calculatable from affected[]).