microsoft / sarif-tools

A set of Python command line tools for working with SARIF files produced by code analysis tools
MIT License

Fails to handle results with zero locations #12

Closed davidmalcolm closed 9 months ago

davidmalcolm commented 1 year ago

Looking at the SARIF 2.1.0 specification, in "3.27.12 locations property" I see:

A result object SHOULD contain a property named locations whose value is an array of zero or more location objects (§3.28) each of which specifies a location where the result occurred.

Hence it appears to be valid to have a result with no locations.

Consider e.g.:

{
  "version": "2.1.0",
  "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
  "runs": [
    {
      "tool": {
        "driver": {
          "name": "Foo"
        }
      },
      "results": [
        {
          "ruleId": "B6412",
          "message": {
            "text": "The command-line option '--foo' wasn't recognized."
          },
          "level": "note",
          "locations": []
        }
      ]
    }
  ]
}

i.e. a result with zero objects in its locations array.
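To find out whether a given SARIF log contains such results before feeding it to the tools, something like the following could be used. This is only a rough sketch: `results_without_locations` is a hypothetical helper written for this issue, not part of sarif-tools, and it works on the parsed JSON directly.

```python
# The reproducer from above, written as a Python dict (the same shape json.load() would return).
SARIF_LOG = {
    "version": "2.1.0",
    "runs": [
        {
            "tool": {"driver": {"name": "Foo"}},
            "results": [
                {
                    "ruleId": "B6412",
                    "message": {"text": "The command-line option '--foo' wasn't recognized."},
                    "level": "note",
                    "locations": [],
                }
            ],
        }
    ],
}


def results_without_locations(sarif_log):
    """Yield (tool_name, rule_id) for each result whose locations array is absent or empty.

    Hypothetical helper for illustration; not a sarif-tools API.
    """
    for run in sarif_log.get("runs", []):
        tool_name = run.get("tool", {}).get("driver", {}).get("name", "<unknown tool>")
        # "results" may be missing or null, so fall back to an empty list.
        for result in run.get("results") or []:
            if not result.get("locations"):
                yield tool_name, result.get("ruleId")


found = list(results_without_locations(SARIF_LOG))
print(found)  # [('Foo', 'B6412')]
```

For a file on disk, the dict would come from `json.load()` on the opened file instead of the inline literal.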

Most of the sarif subcommands fail on the above with an error such as:

$ sarif summary /tmp/foo.sarif 
Traceback (most recent call last):
  File "/usr/local/bin/sarif", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/sarif/cmdline/main.py", line 40, in main
    exitcode = args.func(args)
  File "/usr/local/lib/python3.8/site-packages/sarif/cmdline/main.py", line 399, in _summary
    summary_op.generate_summary(input_files, output, multiple_file_output)
  File "/usr/local/lib/python3.8/site-packages/sarif/operations/summary_op.py", line 40, in generate_summary
    summary_lines = _generate_summary(input_files)
  File "/usr/local/lib/python3.8/site-packages/sarif/operations/summary_op.py", line 62, in _generate_summary
    result_count_by_severity = input_files.get_result_count_by_severity()
  File "/usr/local/lib/python3.8/site-packages/sarif/sarif_file.py", line 957, in get_result_count_by_severity
    result_counts_by_severity.append(input_file.get_result_count_by_severity())
  File "/usr/local/lib/python3.8/site-packages/sarif/sarif_file.py", line 741, in get_result_count_by_severity
    get_result_count_by_severity_per_run = [
  File "/usr/local/lib/python3.8/site-packages/sarif/sarif_file.py", line 742, in <listcomp>
    run.get_result_count_by_severity() for run in self.runs
  File "/usr/local/lib/python3.8/site-packages/sarif/sarif_file.py", line 572, in get_result_count_by_severity
    records = self.get_records()
  File "/usr/local/lib/python3.8/site-packages/sarif/sarif_file.py", line 510, in get_records
    self._cached_records = [self.result_to_record(result) for result in results]
  File "/usr/local/lib/python3.8/site-packages/sarif/sarif_file.py", line 510, in <listcomp>
    self._cached_records = [self.result_to_record(result) for result in results]
  File "/usr/local/lib/python3.8/site-packages/sarif/sarif_file.py", line 530, in result_to_record
    raise ValueError(f"No location in {error_id} output from {tool_name}")
ValueError: No location in B6412 output from Foo

A similar thing happens on the variant where:

"locations": [{}]

i.e. a single location with no properties. My reading of the schema is that this too is valid.
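For what it's worth, here is a minimal sketch of the kind of tolerant handling I have in mind, covering both the empty array and the empty location object. The function name `result_to_record_tolerant` and the record keys are made up for illustration; this is not the actual sarif-tools code, just one possible way to fall back to "no location" instead of raising.

```python
def result_to_record_tolerant(result, tool_name):
    """Build a flat record from a SARIF result, tolerating missing location data.

    Hypothetical stand-in for illustration; the record keys are assumptions,
    not the sarif-tools record format.
    """
    # Handles a missing "locations" key, "locations": [] and "locations": [{}].
    locations = result.get("locations") or []
    first = locations[0] if locations else {}
    physical = first.get("physicalLocation", {})
    uri = physical.get("artifactLocation", {}).get("uri")
    line = physical.get("region", {}).get("startLine")
    return {
        "Tool": tool_name,
        "Code": result.get("ruleId"),
        # Simplified: falls back to "warning" when no level is given.
        "Severity": result.get("level", "warning"),
        "Description": result.get("message", {}).get("text", ""),
        "Location": uri,  # None when the result has no usable location
        "Line": line,     # None when there is no region / startLine
    }


base = {"ruleId": "B6412", "message": {"text": "Unrecognized option."}, "level": "note"}
no_locations = result_to_record_tolerant({**base, "locations": []}, "Foo")
empty_location = result_to_record_tolerant({**base, "locations": [{}]}, "Foo")
print(no_locations["Location"], empty_location["Location"])  # None None
```

Whether a missing location should show up as `None`, an empty string, or some placeholder text in the reports is a separate design question; the point is only that the record can still be produced.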

Seen in the wild on SARIF output from GCC 13, which sometimes emits diagnostics with no location (e.g. for an unrecognized command-line argument when invoking the tool, but also occasionally on real diagnostics when we've got a bug in our location tracking).