microsoft / sarif-tools

A set of Python command line tools for working with SARIF files produced by code analysis tools
MIT License
76 stars 19 forks source link

Add another original_histogram def #17

Closed wooch82 closed 8 months ago

wooch82 commented 1 year ago

because it's null after all pop operations and "-" diff calculates wrong way.

balgillo commented 10 months ago

Thank you for your PR and apologies for taking such a long time to look at it.

I don't follow the logic of your change here, though. The code is trying to get the count of new (+) and resolved (-) issues per severity. The old code pops every issue_code from original_histogram that is also in new_histogram to get the + count (issue codes that have increased from 0 occurrences to more than 0), then iterates through each issue_code that is left in original_histogram (but not in new_histogram) to count the resolved issues. Your change restores all the issue codes that are in both and counts them as resolved issues. That looks wrong to me... what am I missing here, can you explain further?

wooch82 commented 10 months ago

Agree, this is not the best solution. Anyway, string 99

for (issue_code, old_count) in original_histogram.items():

make no sense because original_histogram.items() is empty by the moment. The cause of it is string 94

old_count = original_histogram.pop(issue_code, 0)

that cut the data from the original_histogram And it lead to empty data with [-]

balgillo commented 10 months ago

It only pops the issue codes that are present in new_histogram, i.e. in the "after" list of issues. So line 99 goes through all issue codes that are present in original_histogram and absent in new_histogram, i.e. issues that were resolved between original and new. It's only empty if no issues were resolved.

wooch82 commented 10 months ago

Just compare the result without new original_histogram:

error level: +2 +0
  Number of occurrences 1 -> 2 (+1) for issue "BA2009 Error_NotDynamicBase"
  New issue "BA2001 Error" (1 occurence)
  New issue "BA2015 Error_NoHighEntropyVA" (1 occurence)
warning level: +0 +0
  Number of occurrences 2 -> 1 (-1) for issue "scanjs-rules/call_setTimeout The function setTimeout can be unsafe"
note level: +0 -0 no changes
all levels: +2 +0

and with it:

error level: +2 -6
  Number of occurrences 1 -> 2 (+1) for issue "BA2009 Error_NotDynamicBase"
  New issue "BA2001 Error" (1 occurence)
  New issue "BA2015 Error_NoHighEntropyVA" (1 occurence)
  Number of occurrences 4 -> 0 (-4) for issue "CSCAN-GENERAL0020 A potential secret was detected. Validate file contains secrets, remove, rotate credential, and use approved store. For additional information on secret remediation see the remediation section at https://aka.ms/CredScanDocs "
  Number of occurrences 1 -> 0 (-1) for issue "BA2009 Error_NotDynamicBase"
  Number of occurrences 1 -> 0 (-1) for issue "BA2018 Error"
  Number of occurrences 1 -> 0 (-1) for issue "CSCAN-AWS0010 A potential secret was detected. Validate file contains secrets, remove, rotate credential, and use approved store. For additional information on secret remediation see the remediation section at https://aka.ms/CredScanDocs "
  Number of occurrences 1 -> 0 (-1) for issue "@typescript-eslint/no-var-requires Require statement not part of import statement."
  Number of occurrences 1 -> 0 (-1) for issue "@typescript-eslint/no-empty-function Unexpected empty method 'actionGetUser'."
warning level: +0 -1
  Number of occurrences 2 -> 1 (-1) for issue "scanjs-rules/call_setTimeout The function setTimeout can be unsafe"
  Number of occurrences 2 -> 0 (-2) for issue "scanjs-rules/call_setTimeout The function setTimeout can be unsafe"
note level: +0 -0 no changes
all levels: +2 -7
balgillo commented 10 months ago

But that second output doesn't make sense, for example, these two lines contradict each other:

  Number of occurrences 1 -> 2 (+1) for issue "BA2009 Error_NotDynamicBase"
  Number of occurrences 1 -> 0 (-1) for issue "BA2009 Error_NotDynamicBase"

This tells me that the "before" SARIF file has 1 occurrence of BA2009, the "after" SARIF file has 2 occurrences of BA2009. The number of occurrences has increased, not gone down to zero, so the diff should show an increase not elimination. If you want to see all the issues, use summary on either the before or after file, not diff on both.

balgillo commented 8 months ago

Closing as invalid