ossf / osv-schema

Open Source Vulnerability schema.
https://ossf.github.io/osv-schema/
Apache License 2.0

fix(redhat_conversion): avoid repeated package entries #301

Closed: andrewpollock closed this 1 month ago

andrewpollock commented 1 month ago

For records such as RHSA-2024:8116, .affected[] ended up listing the same packages multiple times.
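A minimal sketch of what deduplicating .affected[] could look like. It assumes each entry carries an OSV `package` object with `ecosystem` and `name` fields, and that repeated entries for the same package are redundant copies. The helper name `dedupe_affected` is hypothetical; this is not the code from the redhat_conversion change itself.

```python
from typing import Any


def dedupe_affected(affected: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return `affected` with repeated packages removed, keeping the first occurrence.

    Hypothetical helper: assumes repeated entries for the same
    (ecosystem, name) pair are redundant copies of one another.
    """
    seen: set[tuple[str, str]] = set()
    result: list[dict[str, Any]] = []
    for entry in affected:
        package = entry["package"]
        # A tuple key cannot collide the way plain string concatenation can.
        key = (package["ecosystem"], package["name"])
        if key not in seen:
            seen.add(key)
            result.append(entry)  # original order is preserved
    return result


if __name__ == "__main__":
    # Toy entries shaped like the ones in RHSA-2024:8116.
    jdk = {"package": {"ecosystem": "Red Hat:rhel_els:7", "name": "java-1.8.0-openjdk"}}
    src = {"package": {"ecosystem": "Red Hat:rhel_els:7", "name": "java-1.8.0-openjdk-src"}}
    print(len(dedupe_affected([jdk, src, jdk, src])))  # 2
```

Dropping later copies is only safe if they really are identical; if repeated entries could carry different `ranges`, they would need to be merged rather than discarded.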

andrewpollock commented 1 month ago

@jasinner

jasinner commented 1 month ago

RHSA-2024:8116.json

I verified this fixes the duplication with a script.

Here's the output for the file generated by the current production version:


$ wget https://security.access.redhat.com/data/osv/RHSA-2024:8116.json

$ python3 
>>> import json
>>> with open("RHSA-2024:8116.json", "r") as fp:
...     osv_data = json.load(fp)
... 
>>> uniq_affected = set()
>>> affected_count = 0
>>> for affected in osv_data["affected"]:
...     affected_count += 1
...     package = affected["package"]
...     affected_lib = package["name"] + package["ecosystem"]
...     if affected_lib in uniq_affected:
...         print(f"found duplicated library: {affected_lib}")
...     else:
...         uniq_affected.add(affected_lib)
... 
found duplicated library: java-1.8.0-openjdkRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-accessibilityRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-debuginfoRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-demoRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-develRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-headlessRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-javadocRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-javadoc-zipRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-srcRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdkRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-accessibilityRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-debuginfoRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-demoRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-develRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-headlessRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-javadocRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-javadoc-zipRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-srcRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdkRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-accessibilityRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-debuginfoRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-demoRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-develRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-headlessRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-javadocRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-javadoc-zipRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-srcRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdkRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-accessibilityRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-debuginfoRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-demoRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-develRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-headlessRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-javadocRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-javadoc-zipRed Hat:rhel_els:7
found duplicated library: java-1.8.0-openjdk-srcRed Hat:rhel_els:7
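The same check can be written more compactly with collections.Counter, reporting each package once with its total count instead of one line per extra copy. Keying on an (ecosystem, name) tuple also keeps the output readable and avoids two different packages producing the same concatenated string. This is an illustrative sketch, not part of the PR; the name `find_duplicate_packages` is hypothetical.

```python
from collections import Counter
from typing import Any


def find_duplicate_packages(osv_data: dict[str, Any]) -> dict[tuple[str, str], int]:
    """Map each (ecosystem, name) pair listed more than once in .affected[] to its count."""
    counts = Counter(
        (entry["package"]["ecosystem"], entry["package"]["name"])
        for entry in osv_data.get("affected", [])
    )
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Toy record; for the real file, load it with json.load() as in the session above.
    record = {
        "affected": [
            {"package": {"ecosystem": "Red Hat:rhel_els:7", "name": "java-1.8.0-openjdk"}},
            {"package": {"ecosystem": "Red Hat:rhel_els:7", "name": "java-1.8.0-openjdk"}},
            {"package": {"ecosystem": "Red Hat:rhel_els:7", "name": "java-1.8.0-openjdk-src"}},
        ]
    }
    for (ecosystem, name), n in sorted(find_duplicate_packages(record).items()):
        print(f"{name} ({ecosystem}): {n} entries")
```

In the output above, each of the nine packages is reported four times, meaning it appears five times in .affected[]; going by that, this function should report a count of 5 for each of them on the pre-fix file.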

With the version of RHSA-2024:8116.json generated using these changes, there are no duplicates:

$ python3 
Python 3.12.6 (main, Sep  9 2024, 00:00:00) [GCC 13.3.1 20240522 (Red Hat 13.3.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> with open("RHSA-2024:8116.json", "r") as fp:
...     osv_data = json.load(fp)
... 
>>> uniq_affected = set()
>>> affected_count = 0
>>> for affected in osv_data["affected"]:
...     affected_count += 1
...     package = affected["package"]
...     affected_lib = package["name"] + package["ecosystem"]
...     if affected_lib in uniq_affected:
...         print(f"found duplicated library: {affected_lib}")
...     else:
...         uniq_affected.add(affected_lib)
... 
>>> affected_count
9
>>> len(uniq_affected)
9
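To guard against a regression, the criterion used in the session above (total entries equal to unique packages, 9 == 9) could be expressed as a unit test. This is a sketch: the `RECORD` fixture is a hypothetical stand-in for the converter's output, since the converter's actual entry point isn't shown in this thread.

```python
import unittest


class NoDuplicateAffectedTest(unittest.TestCase):
    # Hypothetical fixture: a real test would use the converter's output
    # for a record such as RHSA-2024:8116.
    RECORD = {
        "affected": [
            {"package": {"ecosystem": "Red Hat:rhel_els:7", "name": "java-1.8.0-openjdk"}},
            {"package": {"ecosystem": "Red Hat:rhel_els:7", "name": "java-1.8.0-openjdk-src"}},
        ]
    }

    def test_each_package_listed_once(self) -> None:
        keys = [
            (entry["package"]["ecosystem"], entry["package"]["name"])
            for entry in self.RECORD["affected"]
        ]
        # Same criterion as the REPL check: entry count == number of unique packages.
        self.assertEqual(len(keys), len(set(keys)))
```

Saved to a file, it can be run with `python3 -m unittest` pointed at that file.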