space-ros / process_sarif

Tools to process and aggregate SARIF output from static analysis.
Apache License 2.0

Refactor duplicate_results and add tests #7

Closed. Ronoman closed this pull request 2 years ago.

Ronoman commented 2 years ago

A few changes:

sarif.py: Updated the decorators on Artifact and Region so that instances are hashable, which should make remove_duplicate_results faster (see the sketch after this list).

sarif_helpers.py: Refactored find_duplicate_results into remove_duplicate_results, which now takes and returns List[SarifFile] instead of List[Result].

test_sarif.py: New tests covering existing functionality and demonstrating that remove_duplicate_results behaves as expected. Tests marked with @pytest.mark.skip() are not implemented yet; all tests without that marker pass.

resources: SARIF files to support test_sarif.py.
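
For context, here is a minimal sketch of what the hashability change and the new helper signature could look like. The class definitions, field names, and module layout below are assumptions for illustration, not the actual process_sarif code:

```python
# Hypothetical sketch, not the real process_sarif definitions.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)  # frozen dataclasses get an auto-generated __hash__
class Region:
    start_line: int
    end_line: int


@dataclass(frozen=True)
class Artifact:
    uri: str


@dataclass(frozen=True)
class Result:
    rule_id: str
    artifact: Artifact
    region: Region


@dataclass
class SarifFile:
    results: List[Result]


def remove_duplicate_results(files: List[SarifFile]) -> List[SarifFile]:
    """Drop exact-duplicate results across every file, keeping the first occurrence."""
    seen: set = set()
    deduped: List[SarifFile] = []
    for sarif_file in files:
        kept = []
        for result in sarif_file.results:
            if result not in seen:  # works because Result is hashable
                seen.add(result)
                kept.append(result)
        deduped.append(SarifFile(results=kept))
    return deduped
```

Making Artifact and Region frozen (and therefore hashable) is what lets Result instances go straight into a set, so the duplicate check is an average O(1) membership test per result.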

nuclearsandwich commented 2 years ago

I pushed a couple of changes that I made while trying to review this PR locally. @Ronoman, feel free to review these yourself and remove or adjust them as needed.


https://github.com/space-ros/process_sarif/pull/7/commits/a9532ae49f4ed865ac8bc4eb7e15c723855f5f11 is a reorganization of the SARIF content used for testing and associated helpers.

These tests were using hard-coded paths instead of the path function at the top of the file. That function was also reaching into the package installation directory, which may not be present or accessible during a test run. Since the files are only used for testing, move them into a test fixtures directory and locate that directory relative to the current test file (see the sketch below).

This also resolves the undeclared dependency on ament_index_python by removing it.
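
A hedged illustration of that fixture lookup; the fixtures directory name and helper function here are hypothetical, assuming pathlib-style resolution relative to the test module:

```python
# Hypothetical helper inside test_sarif.py: find SARIF fixtures next to the tests
# instead of resolving them through ament_index_python in the install space.
from pathlib import Path

FIXTURES_DIR = Path(__file__).parent / "fixtures"  # assumed directory name


def fixture_path(name: str) -> Path:
    """Return the path to a SARIF fixture bundled alongside this test file."""
    return FIXTURES_DIR / name
```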


https://github.com/space-ros/process_sarif/pull/7/commits/778b3d2faf21d13c4774a6c09466f17dc30e0856 is much more discretionary and updates the duplicate finder to use a single set rather than nested dicts.

There isn't really a need to build a dictionary of nested dicts if all we care about is whether an exact result matching the ruleId, artifact, and region has been seen before, so let's use a set with tuple elements. It should be comparably performant to the nested-hash approach while being much more succinct.
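
Roughly, the set-with-tuples approach looks like the following; the attribute names are assumptions, and it relies on Artifact and Region being hashable:

```python
# Hypothetical sketch of the single-set deduplication.
seen: set = set()
unique_results = []
for result in all_results:  # all_results collected from the parsed SARIF files
    key = (result.rule_id, result.artifact, result.region)
    if key not in seen:  # one O(1) membership check replaces the nested-dict walk
        seen.add(key)
        unique_results.append(result)
```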

Ronoman commented 2 years ago

LGTM. Good catch on the absolute path, and nice use of sets to cut that function down considerably.