microsoft / sarif-tools

A set of Python command line tools for working with SARIF files produced by code analysis tools
MIT License

Summary fails for larger input #51

Closed Jiri-Stary closed 4 months ago

Jiri-Stary commented 4 months ago

I am running the summary command on multiple SARIF files, and it does not seem to handle larger input correctly:

Run sarif summary ./*_scan.sarif -o ./hdf/issues.txt
Traceback (most recent call last):
  File "/home/runner/.local/bin/sarif", line 8, in <module>
    sys.exit(main())
  File "/home/runner/.local/lib/python3.10/site-packages/sarif/cmdline/main.py", line 61, in main
    exitcode = args.func(args)
  File "/home/runner/.local/lib/python3.10/site-packages/sarif/cmdline/main.py", line 401, in _summary_command
    summary_op.generate_summary(input_files, output, multiple_file_output)
  File "/home/runner/.local/lib/python3.10/site-packages/sarif/operations/summary_op.py", line 39, in generate_summary
    summary_lines = _generate_summary(input_files)
  File "/home/runner/.local/lib/python3.10/site-packages/sarif/operations/summary_op.py", line 64, in _generate_summary
    issue_type_histogram = report.get_issue_type_histogram_for_severity(severity)
  File "/home/runner/.local/lib/python3.10/site-packages/sarif/issues_report.py", line 138, in get_issue_type_histogram_for_severity
    self._group_records_by_key()
  File "/home/runner/.local/lib/python3.10/site-packages/sarif/issues_report.py", line 45, in _group_records_by_key
    key = combine_record_code_and_description(record)
  File "/home/runner/.local/lib/python3.10/site-packages/sarif/sarif_file_utils.py", line 52, in combine_record_code_and_description
    return combine_code_and_description(record["Code"], record["Description"])
  File "/home/runner/.local/lib/python3.10/site-packages/sarif/sarif_file_utils.py", line 33, in combine_code_and_description
    shorter_description = textwrap.shorten(
  File "/usr/lib/python3.10/textwrap.py", line 414, in shorten
    return w.fill(' '.join(text.strip().split()))
  File "/usr/lib/python3.10/textwrap.py", line 371, in fill
    return "\n".join(self.wrap(text))
  File "/usr/lib/python3.10/textwrap.py", line 362, in wrap
    return self._wrap_chunks(chunks)
  File "/usr/lib/python3.10/textwrap.py", line 263, in _wrap_chunks
    raise ValueError("placeholder too large for max width")
ValueError: placeholder too large for max width
Error: Process completed with exit code 1.

balgillo commented 4 months ago

Thanks for raising this; it looks like a bug in the new code for truncating long descriptions. Are you able to share the input file?

Jiri-Stary commented 4 months ago

@balgillo Unfortunately not. I used the error message together with AI to debug this, and it looks like it may be an error in the code.

The "ValueError: placeholder too large for max width" is raised because textwrap.shorten is called with a width that is too small to accommodate the placeholder. The placeholder " ..." is 4 characters long (3 once textwrap strips its leading space, which is the length it compares against the width), so the error is raised whenever the width provided is smaller than that.
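
A minimal reproduction may make this easier to check. It assumes the placeholder is " ..." as stated above. The description text and the try_shorten helper are made up for illustration and are not part of sarif-tools. Running it with a few widths shows where CPython's textwrap starts raising:

```python
import textwrap


def try_shorten(text, width, placeholder=" ..."):
    """Hypothetical helper: return the shortened text, or the error message on failure."""
    try:
        return textwrap.shorten(text, width=width, placeholder=placeholder)
    except ValueError as exc:
        return f"ValueError: {exc}"


# Made-up description standing in for a long rule description from a SARIF file.
long_description = "A fairly long rule description that needs truncating"

# Try a width too small for the placeholder, a borderline width, and a comfortable one.
results = {width: try_shorten(long_description, width) for width in (2, 3, 20)}
for width, result in results.items():
    print(width, repr(result))
```

With the width set to 2 the call fails with the same message as in the traceback, while larger widths return a truncated string.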


balgillo commented 4 months ago

Thanks, it's pretty clear that we need some defensive code to ensure that the word wrap is only attempted if there is enough space in the line length budget. I've added that in #52.
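
For anyone hitting this before upgrading, the kind of guard described above could look roughly like the sketch below. This is not the code from #52. The safe_shorten name, the PLACEHOLDER constant, and the hard-cut fallback are all assumptions made for illustration.

```python
import textwrap

# Assumed placeholder, taken from the discussion above.
PLACEHOLDER = " ..."


def safe_shorten(description: str, width: int, placeholder: str = PLACEHOLDER) -> str:
    """Hypothetical guard: only call textwrap.shorten when the budget fits the placeholder."""
    # textwrap compares the width against the placeholder without its leading whitespace.
    if width < len(placeholder.lstrip()):
        # Not enough room for the placeholder: fall back to a plain character cut.
        return description[: max(width, 0)]
    return textwrap.shorten(description, width=width, placeholder=placeholder)


print(repr(safe_shorten("some long description text", 12)))
print(repr(safe_shorten("some long description text", 2)))
```

The fallback here simply cuts the string. Returning an empty string or skipping the description entirely would be equally reasonable, depending on how the summary line is assembled.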