nexB / federatedcode


Design on-disk storage structure for package and vulnerability data #3

Open pombredanne opened 6 months ago

pombredanne commented 6 months ago

See the attached zip, federatedcode-data-structure.zip, for a design discussed with @ziadhany and @TG1999. The approach would be to have separate trees/repos for package metadata and for vulnerability metadata, with cross-references in both directions: packages point to their vulnerabilities in the package tree, and vulnerabilities point back to their packages in the vulnerability tree.
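Because each tree carries one side of the same relation, one side can be derived from the other. Below is a minimal sketch, assuming the vulnerability side is the source of truth and that packages are identified by Package URLs (purls); `invert_cross_references` is a hypothetical helper, not code from federatedcode, and the sample data is illustrative only.

```python
from collections import defaultdict


def invert_cross_references(vuln_to_purls):
    """Build the package -> vulnerabilities index from vulnerability -> packages.

    Hypothetical helper: assumes the vulnerability tree is the source of truth
    and that packages are keyed by purl.
    """
    purl_to_vulns = defaultdict(set)
    for vcid, purls in vuln_to_purls.items():
        for purl in purls:
            purl_to_vulns[purl].add(vcid)
    # Sort keys and values so serialized files are stable and diff-friendly.
    return {purl: sorted(vcids) for purl, vcids in sorted(purl_to_vulns.items())}


# Illustrative data only.
vuln_index = {
    "VCID-1223-3434-34343": [
        "pkg:maven/org.apache.log4j/log4j-core@1.2.3",
        "pkg:maven/org.apache.log4j/log4j-core@1.2.4",
    ],
}
package_index = invert_cross_references(vuln_index)
```

Deriving one index from the other keeps the two repos from drifting apart, since the package-side `vulnerabilities.yml` files never need to be edited independently.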

The file tree would look more or less like this:

./aboutcode-vulnerabilities-1223
./aboutcode-vulnerabilities-1223/3434
./aboutcode-vulnerabilities-1223/3434/VCID-1223-3434-34343
./aboutcode-vulnerabilities-1223/3434/VCID-1223-3434-34343/advisories
./aboutcode-vulnerabilities-1223/3434/VCID-1223-3434-34343/VCID-1223-3434-34343.yml
./aboutcode-packages-ed5
./aboutcode-packages-ed5/maven
./aboutcode-packages-ed5/maven/org.apache.log4j
./aboutcode-packages-ed5/maven/org.apache.log4j/log4j-core
./aboutcode-packages-ed5/maven/org.apache.log4j/log4j-core/versions
./aboutcode-packages-ed5/maven/org.apache.log4j/log4j-core/versions/1.2.4
./aboutcode-packages-ed5/maven/org.apache.log4j/log4j-core/versions/vulnerabilities.yml
./aboutcode-packages-ed5/maven/org.apache.log4j/log4j-core/versions/1.2.3
./aboutcode-packages-ed5/maven/org.apache.log4j/log4j-core/versions/1.2.3/ossf-scorecard
./aboutcode-packages-ed5/maven/org.apache.log4j/log4j-core/versions/1.2.3/ossf-scorecard/scorecard.json
./aboutcode-packages-ed5/maven/org.apache.log4j/log4j-core/versions/1.2.3/spdx
./aboutcode-packages-ed5/maven/org.apache.log4j/log4j-core/versions/1.2.3/cyclonedx
./aboutcode-packages-ed5/maven/org.apache.log4j/log4j-core/versions/1.2.3/scancode-toolkit
./aboutcode-packages-ed5/maven/org.apache.log4j/log4j-core/versions/1.2.3/scancode-toolkit/scancode-toolkit-scan.json
./aboutcode-packages-ed5/maven/org.apache.log4j/log4j-core/versions/1.2.3/clearlydefined-curation
./aboutcode-packages-ed5/maven/org.apache.log4j/log4j-core/versions/1.2.3/vulnerabilities.yml
./aboutcode-packages-ed5/maven/org.apache.log4j/log4j-core/versions/1.2.3/osselot
./aboutcode-packages-ed5/maven/org.apache.log4j/log4j-core/versions/1.2.3/osselot/osselot-spdx.json
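The layout above can be expressed as two path-building functions. This is a sketch under stated assumptions: paths are relative to the storage root (the leading `./` is dropped), the repo hash such as `ed5` is passed in rather than computed, and `package_version_dir` and `vulnerability_file` are hypothetical names. Packages without a namespace, or with namespaces containing `/`, are not handled specially.

```python
from pathlib import PurePosixPath


def package_version_dir(repo_hash, purl_type, namespace, name, version):
    """Directory holding all metadata files for one package version.

    Mirrors: aboutcode-packages-<hash>/<type>/<namespace>/<name>/versions/<version>
    (hypothetical helper; the hash is supplied by the caller).
    """
    return PurePosixPath(
        f"aboutcode-packages-{repo_hash}", purl_type, namespace, name, "versions", version
    )


def vulnerability_file(vcid):
    """YAML file for one vulnerability, e.g. VCID-1223-3434-34343.

    Mirrors: aboutcode-vulnerabilities-<first>/<second>/<VCID>/<VCID>.yml
    """
    prefix, first, second, _third = vcid.split("-")
    if prefix != "VCID":
        raise ValueError(f"not a VCID: {vcid!r}")
    return PurePosixPath(f"aboutcode-vulnerabilities-{first}", second, vcid, f"{vcid}.yml")


version_dir = package_version_dir("ed5", "maven", "org.apache.log4j", "log4j-core", "1.2.3")
# Per-tool files live in their own subdirectories under the version directory.
scorecard = version_dir / "ossf-scorecard" / "scorecard.json"
vulns_yml = version_dir / "vulnerabilities.yml"
```

Keeping each tool's output (scorecard, SPDX, CycloneDX, scans) in its own subdirectory means new data sources can be added without touching existing files.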
ziadhany commented 6 months ago

@pombredanne @TG1999 what does ed5 stand for in ./aboutcode-packages-ed5/maven/org.apache.log4j?

pombredanne commented 1 month ago

@ziadhany re:

what does ed5 stand for in ./aboutcode-packages-ed5/maven/org.apache.log4j?

Sorry for the late reply; we have discussed it since: it is a hash.
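The thread does not say what is hashed or with which algorithm, so the following is only a hypothetical sketch of how a short hash could pick the package repo. It assumes the input is the version-less purl and that the digest is SHA-256 truncated to 3 hex characters; `repo_hash` is an illustrative name, and its output for log4j-core is not claimed to be `ed5`.

```python
import hashlib


def repo_hash(package_key, length=3):
    """Short hex digest used to pick a package repository.

    ASSUMPTION (not from the issue): input is the version-less purl and the
    digest is SHA-256 truncated to `length` hex characters.
    """
    digest = hashlib.sha256(package_key.encode("utf-8")).hexdigest()
    return digest[:length]


key = "pkg:maven/org.apache.log4j/log4j-core"
bucket = repo_hash(key)
repo_dir = f"aboutcode-packages-{bucket}"
```

With 3 hex characters there are 16**3 = 4096 possible buckets, and leaving the version out of the key keeps all versions of a package in the same repo.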

ziadhany commented 1 month ago

@pombredanne Yes, we discussed this, and I have updated pull request #1206 to match the new file structure:

./aboutcode-vulnerabilities-12/34/VCID-1223-3434-34343/VCID-1223-3434-34343.yml
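One plausible reading of the revised layout, offered as a guess rather than a description of pull request #1206, is that `12` and `34` are the first two characters of the first and second VCID segments (`1223` and `3434`). Under that assumption, the path could be computed as follows; `vulnerability_file_v2` is a hypothetical name.

```python
from pathlib import PurePosixPath


def vulnerability_file_v2(vcid, width=2):
    """Path under the revised layout, e.g.
    aboutcode-vulnerabilities-12/34/VCID-1223-3434-34343/VCID-1223-3434-34343.yml

    ASSUMPTION: the two directory levels are the first `width` characters of
    the first and second VCID segments.
    """
    prefix, first, second, _third = vcid.split("-")
    if prefix != "VCID":
        raise ValueError(f"not a VCID: {vcid!r}")
    return PurePosixPath(
        f"aboutcode-vulnerabilities-{first[:width]}", second[:width], vcid, f"{vcid}.yml"
    )
```

Shorter prefixes mean fewer top-level repos and directories, at the cost of more VCIDs per directory.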

However, I'm still concerned about the performance of this script. I believe there's a more efficient method to detect updates on VulnerableCode. At the very least, we should aim to minimize the number of queries in this script.
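One general technique for detecting updates, sketched here as a possibility rather than how VulnerableCode or PR #1206 works, is to keep a content digest per record from the previous run and rewrite only records whose digest changed. `content_digest` and `changed_records` are hypothetical helpers. This avoids rewriting and committing unchanged files, but it does not by itself reduce the number of queries; that would also need a way to fetch only recently changed records, which depends on what VulnerableCode exposes.

```python
import hashlib
import json


def content_digest(record):
    """Stable SHA-256 of a JSON-serializable record, independent of key order."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_records(records, previous_digests):
    """Return (ids to write, new digest map) for new or modified records.

    `records` maps an id (e.g. a VCID) to its data; `previous_digests` is the
    digest map saved by the last run. Deleted records are not detected here.
    """
    new_digests = {rid: content_digest(data) for rid, data in records.items()}
    to_write = sorted(
        rid for rid, digest in new_digests.items() if previous_digests.get(rid) != digest
    )
    return to_write, new_digests


# Illustrative data only.
previous = {"VCID-1223-3434-34343": content_digest({"summary": "old"})}
current = {
    "VCID-1223-3434-34343": {"summary": "old"},
    "VCID-0001-0002-00003": {"summary": "new"},
}
to_write, digests = changed_records(current, previous)
```

Deletions could be found separately as `previous_digests.keys() - records.keys()`.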