microsoft / sarif-tools

A set of Python command line tools for working with SARIF files produced by code analysis tools
MIT License
91 stars 21 forks source link

Sarif 3.0: spurious and inconsistent diff reporting (due to name truncation?) #68

Closed igsilya closed 1 month ago

igsilya commented 1 month ago

Hi, in our OVS CI we're getting seemingly random diffs like this one since 3.0 update:

Check: exiting with return code 2 due to increase in issues at or above note severity
error level: +0 -0 no changes
warning level: +2 -2
  New issue "core.NullDereference ..." (36 occurrences)
    file:///home/runner/work/ovs/ovs/include/openvswitch/hmap.h:319
    file:///home/runner/work/ovs/ovs/include/openvswitch/list.h:262
    file:///home/runner/work/ovs/ovs/include/openvswitch/nsh.h:287
    ...
  New issue "core.UndefinedBinaryOperatorResult The left operand of ' ..." (4 occurrences)
    file:///home/runner/work/ovs/ovs/lib/dpif-netdev.c:946
    file:///home/runner/work/ovs/ovs/lib/dpif-netdev.c:963
    file:///home/runner/work/ovs/ovs/lib/netlink-socket.c:240
    ...
  Eliminated issue "core.NullDereference Access to field ' ..."
  Eliminated issue "core.UndefinedBinaryOperatorResult The  ..."
note level: +0 -0 no changes
all levels: +2 -2

It feels like the cause is the grouping of the similar issues and truncating the name differently in different cases, so the same issue is marked as new as well as eliminated. This did not happen with sarif-tools v2. The diffs seem to appear randomly, i.e. diff may fail or not fail while building exactly the same code in a different jobs in GitHub Actions.

debonte commented 1 month ago

The diffs seem to appear randomly, i.e. diff may fail or not fail while building exactly the same code

@igsilya, does re-running sarif diff on the same files generate the same diff output? If so, can you share a pair of files that reproduce this issue?

igsilya commented 1 month ago

@debonte I can't seem to get different results on the same file on the same machine, but I can make the diff fail or not to fail with different order of files in directories. Attached an archive with two directories with two files in each: 4files.zip

$ sarif --version
SARIF tools v3.0.2

$ ls -l ./a ./b
./a:
total 732
-rw-r--r--. 1 496142 Sep 28 16:56 10.sarif
-rw-r--r--. 1 243438 Sep 28 16:56 7.sarif
./b:
total 732
-rw-r--r--. 1 243438 Sep 28 11:53 10.sarif
-rw-r--r--. 1 496142 Sep 28 11:53 7.sarif

$ sarif --check note diff ./a ./b
error level: +0 -0 no changes
warning level: +1 -1
  New issue "core.NullDereference Access to field ' ..." (11 occurrences)
    file:///home/runner/work/ovs/ovs/lib/db-ctl-base.c:688
    file:///home/runner/work/ovs/ovs/lib/db-ctl-base.c:953
    file:///home/runner/work/ovs/ovs/lib/db-ctl-base.c:1364
    ...
  Eliminated issue "core.NullDereference ..."
note level: +0 -0 no changes
all levels: +1 -1
Check: exiting with return code 1 due to increase in issues at or above note severity

Now swap the files in 'b':

$ mv b/10.sarif b/70.sarif
$ mv b/7.sarif b/10.sarif
$ mv b/70.sarif b/7.sarif

$ ls -l ./a ./b
./a:
total 732
-rw-r--r--. 1 496142 Sep 28 16:56 10.sarif
-rw-r--r--. 1 243438 Sep 28 16:56 7.sarif
./b:
total 732
-rw-r--r--. 1 496142 Sep 28 11:53 10.sarif
-rw-r--r--. 1 243438 Sep 28 11:53 7.sarif

$ sarif --check note diff ./a ./b
error level: +0 -0 no changes
warning level: +0 -0 no changes
note level: +0 -0 no changes
all levels: +0 +0

Files in 'a' and 'b' are identical, only the order is different:

$ md5sum ./a/* ./b/*
7be23ca2027b743125a0a72265b022dc  ./a/10.sarif
02642cac975cbe8e3929ad53c5708eb1  ./a/7.sarif
02642cac975cbe8e3929ad53c5708eb1  ./b/10.sarif
7be23ca2027b743125a0a72265b022dc  ./b/7.sarif

v2.0.0 finds no difference regardless of the file order:

$ sarif --version
sarif-tools version 2.0.0
$ sarif --check note diff ./a ./b
error level: +0 -0 no changes
warning level: +0 -0 no changes
note level: +0 -0 no changes
all levels: +0 +0
$ mv b/10.sarif b/70.sarif
$ mv b/7.sarif b/10.sarif
$ mv b/70.sarif b/7.sarif
$ sarif --check note diff ./a ./b
error level: +0 -0 no changes
warning level: +0 -0 no changes
note level: +0 -0 no changes
all levels: +0 +0

It is important for us that the order of files doesn't matter, because these are generated by clang-analyzer and it creates a separate randomly named file per object, so the order is different between builds and we're trying to track new issues between builds this way.

debonte commented 1 month ago

@igsilya, thanks for the repro steps. I was able to reproduce the behavior you are seeing and will investigate.

debonte commented 1 month ago

@igsilya, I have a pending PR to fix this. If you have time, you could try out the wheel from the PR validation build and verify that it fixes your issue.

igsilya commented 1 month ago

@debonte thanks! It appears to be working. At least, on a few examples I have.

debonte commented 1 month ago

The fix has been released in v3.0.3 which is now availble on pypi.