guacsec / guac

GUAC aggregates software security metadata into a high fidelity graph database.
https://guac.sh
Apache License 2.0

[Software Identifier Problem] Implement GUAC mechanism/heuristics to convert between artifacts, packages and source repositories #217

Open lumjjb opened 2 years ago

lumjjb commented 2 years ago

This issue tracks adding heuristics and mechanisms to GUAC so that identical artifacts and packages are linked to each other, and similar packages/artifacts are linked where appropriate.

This is part of a broader discussion on solving the software identifier problem: different people use different identifiers, and implementations of those identifiers are inconsistent. GUAC will initially provide heuristics to resolve this, and help inform implementers of identifiers where consistency is lacking.
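As an illustration of the kind of heuristic meant here, the sketch below links package names that differ only in spelling conventions. It assumes PEP 503-style normalization for PyPI names (lowercase, collapse runs of `-`, `_` and `.` into `-`); the helper names `normalizePyPIName` and `groupByNormalizedName` are hypothetical and are not part of GUAC.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// separatorRun matches runs of '-', '_' or '.', which PEP 503 treats as equivalent.
var separatorRun = regexp.MustCompile(`[-_.]+`)

// normalizePyPIName (hypothetical helper) lowercases a PyPI project name and
// collapses separator runs to a single '-', following PEP 503.
func normalizePyPIName(name string) string {
	return separatorRun.ReplaceAllString(strings.ToLower(name), "-")
}

// groupByNormalizedName (hypothetical helper) buckets raw names that normalize
// to the same key, i.e. names a heuristic would treat as the same package.
func groupByNormalizedName(names []string) map[string][]string {
	groups := make(map[string][]string)
	for _, n := range names {
		key := normalizePyPIName(n)
		groups[key] = append(groups[key], n)
	}
	return groups
}

func main() {
	names := []string{"Flask_SQLAlchemy", "flask-sqlalchemy", "Flask.SQLAlchemy", "requests"}
	groups := groupByNormalizedName(names)
	fmt.Println(len(groups))
	fmt.Println(groups["flask-sqlalchemy"])
}
```

A real heuristic would need per-ecosystem rules, since case and separator handling differ between package types; this only covers the PyPI convention.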

jchestershopify commented 1 year ago

Worth looking at: some fascinating work from IBM folks on identifying "code genomes". It looks useful for tying related physical assets to the same logical asset (e.g., foolib as compiled for x64 and ARM).

See: https://github.com/ossf/wg-securing-software-repos/issues/8

lumjjb commented 1 year ago

Wrote up a first draft of a proposal to solve the issue, based on discussions with other maintainers: https://docs.google.com/document/d/1BUEi7q2i-KXlAhsh1adYvL1fkWN-q8FrgLyEre7c5kg/edit?resourcekey=0-02sC5-9IbTfwJckze_CDQw#

pxp928 commented 1 year ago

Completed hashEqual. Identities still remain to be worked on after beta.
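A minimal sketch of how hash-equality links can be resolved transitively, assuming each hashEqual record states that two digests (possibly under different algorithms) identify the same artifact. The `Artifact` type and `hashEqualSet` union-find are hypothetical illustrations, not GUAC's actual HashEqual implementation, and the digest values are placeholders.

```go
package main

import "fmt"

// Artifact identifies an artifact by digest algorithm and value (hypothetical type).
type Artifact struct {
	Algorithm string
	Digest    string
}

// hashEqualSet (hypothetical) is a union-find over artifacts; each
// hashEqual-style assertion merges the sets of its two artifacts.
type hashEqualSet struct {
	parent map[Artifact]Artifact
}

func newHashEqualSet() *hashEqualSet {
	return &hashEqualSet{parent: make(map[Artifact]Artifact)}
}

// find returns the representative of a's set, registering a if unseen and
// compressing the path so later lookups are cheap.
func (s *hashEqualSet) find(a Artifact) Artifact {
	p, ok := s.parent[a]
	if !ok {
		s.parent[a] = a
		return a
	}
	if p == a {
		return a
	}
	root := s.find(p)
	s.parent[a] = root
	return root
}

// addHashEqual records that a and b are the same artifact under different digests.
func (s *hashEqualSet) addHashEqual(a, b Artifact) {
	ra, rb := s.find(a), s.find(b)
	if ra != rb {
		s.parent[ra] = rb
	}
}

// equal reports whether a and b are linked, directly or transitively.
func (s *hashEqualSet) equal(a, b Artifact) bool {
	return s.find(a) == s.find(b)
}

func main() {
	s := newHashEqualSet()
	bySHA256 := Artifact{"sha256", "aaaa"} // placeholder digests
	bySHA1 := Artifact{"sha1", "bbbb"}
	bySHA512 := Artifact{"sha512", "cccc"}
	unrelated := Artifact{"sha256", "dddd"}

	s.addHashEqual(bySHA256, bySHA1)
	s.addHashEqual(bySHA1, bySHA512)

	fmt.Println(s.equal(bySHA256, bySHA512)) // linked through bySHA1
	fmt.Println(s.equal(bySHA256, unrelated))
}
```

Union-find keeps the equality transitive without materializing every pairwise edge, which matters once many digests for the same artifact accumulate.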