Hashing syntax nodes with children

utk-se / WorldSyntaxTree

Language-agnostic parsing of World of Code repositories

Hashing syntax nodes with children #41

Open robobenklein opened 1 year ago

robobenklein commented 1 year ago

Take a function node, iterate over its children, and create a list (or string?) of the node types in in-order or pre-order.

This essentially creates a structural "hash" of a class/function node. The same structure should produce matching hashes in other blobs even when the blob hashes themselves differ, which allows identifying code reuse across projects.
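A minimal sketch of the idea, not the WorldSyntaxTree implementation. As an assumption for illustration, it uses Python's standard `ast` module as a stand-in for the project's syntax nodes, and the helper names (`preorder_types`, `structural_hash`, `function_hashes`) are hypothetical. It walks a function node in pre-order, collects only the node types, and hashes the resulting sequence, so renamed identifiers do not change the result.

```python
import ast
import hashlib


def preorder_types(node):
    """Yield the type name of `node` and of all its descendants, in pre-order."""
    yield type(node).__name__
    for child in ast.iter_child_nodes(node):
        yield from preorder_types(child)


def structural_hash(node):
    """Hash the pre-order sequence of node types; names and literal values are ignored."""
    sequence = " ".join(preorder_types(node))
    return hashlib.sha1(sequence.encode("utf-8")).hexdigest()


def function_hashes(source):
    """Map each function name found in `source` to its structural hash."""
    tree = ast.parse(source)
    return {
        node.name: structural_hash(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


# Two "blobs" with different text (and therefore different blob hashes)
# but the same structure, plus one with a different operator.
a = function_hashes("def add(x, y):\n    return x + y\n")
b = function_hashes("def plus(left, right):\n    return left + right\n")
c = function_hashes("def sub(x, y):\n    return x - y\n")

print(a["add"] == b["plus"])  # same node-type sequence
print(a["add"] == c["sub"])   # operator node type differs
```

A flat pre-order list of types does not fully encode the tree shape, so distinct trees can in principle collide. Whether that is acceptable, or whether depth markers should be added to the sequence, is an open design choice.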

robobenklein commented 1 year ago

Notably: compute the hashes of functions with known vulnerabilities, record which blob each one belongs to, and then find the revisions and/or projects that still use the old, vulnerable function.
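The lookup could be sketched roughly as follows. Everything here is hypothetical: the record layout, the blob IDs, the function names, and the placeholder hash strings are made up for illustration, and the step that maps a blob to the revisions and projects containing it is left out.

```python
from collections import defaultdict


def build_index(records):
    """Group (blob_id, function_name, structural_hash) records by hash."""
    index = defaultdict(list)
    for blob_id, func_name, digest in records:
        index[digest].append((blob_id, func_name))
    return index


def find_vulnerable(index, vulnerable_hashes):
    """Return the indexed (blob_id, function_name) pairs for each known-vulnerable hash."""
    return {h: index[h] for h in vulnerable_hashes if h in index}


# Placeholder data: "h1" and "h2" stand in for real structural hashes.
records = [
    ("blob-a", "parse_header", "h1"),
    ("blob-b", "read_header", "h1"),
    ("blob-c", "parse_header", "h2"),
]
index = build_index(records)
matches = find_vulnerable(index, {"h1"})
print(matches)
```

Each matching blob would then be resolved to the revisions and projects that contain it.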