nexB / scancode-analyzer

scancode-results-analyzer

Make Conditions to detect false positives more explicit #34

Closed AyanSinhaMahapatra closed 3 years ago

AyanSinhaMahapatra commented 3 years ago

In nexB/scancode-toolkit#2371 there is an instance of a false positive where "rule_length": 2.

This doesn't get detected as a false positive because of how the current steps work: probable false positives are first separated out with "is_license_tag" == true and "rule_length" == 1, as here, and then run through a classifier to determine that more accurately. A match with "rule_length": 2 never passes the first step.

We definitely need to:

  1. Put a more explicit step in place: go through all the scancode license_tag rules and see which ones could be matched as a false positive. Then either raise the "rule_length" criterion so that these cases are analyzed correctly too, or maintain a set of rules that can generate potential false positives.
  2. Add this case as a test.
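
A rough sketch of what such an explicit pre-filter could look like, combining both options from point 1. Only "is_license_tag" and "rule_length" come from the discussion above; the "rule_identifier" key, the function name, the example rule name, and the threshold of 2 are assumptions for illustration, not the analyzer's actual code or data layout.

```python
# Hypothetical set of license_tag rules found (by manual review) to be
# prone to false positives. The entry below is a made-up placeholder.
FALSE_POSITIVE_PRONE_RULES = frozenset({"example_license_tag_1.RULE"})

# Assumed threshold: raised from 1 to 2 so the "rule_length": 2 case is covered.
MAX_RULE_LENGTH = 2


def is_probable_false_positive(
    match,
    max_rule_length=MAX_RULE_LENGTH,
    prone_rules=FALSE_POSITIVE_PRONE_RULES,
):
    """Return True if a match dict should be sent on to the classifier.

    `match` is assumed to be a plain dict carrying "is_license_tag",
    "rule_length" and (hypothetically) "rule_identifier".
    """
    # Only license tag rules are considered candidates at all.
    if not match.get("is_license_tag", False):
        return False

    # Option A: the rule is in the explicitly maintained set.
    if match.get("rule_identifier") in prone_rules:
        return True

    # Option B: the rule is short enough to be suspicious.
    rule_length = match.get("rule_length")
    return rule_length is not None and rule_length <= max_rule_length
```

With this, a license tag match with "rule_length": 2 becomes a candidate for the classifier, while longer tag rules are only picked up if they were explicitly added to the set.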

From comment

AyanSinhaMahapatra commented 3 years ago

Another issue, nexB/scancode-toolkit#2374, could also be picked up by the analyzer if #29 is implemented, by flagging false positives based on line number (say > 1000) and rule length (< 3 here, though a more suitable threshold still has to be found). It is also a good test case.
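
A minimal sketch of the line number plus rule length check described above. The thresholds are the placeholder values from this comment (> 1000 and < 3) and would still need tuning; the "start_line" key and the function name are assumptions, and this is not what #29 actually specifies.

```python
def is_late_short_match(match, min_start_line=1000, max_rule_length=3):
    """Flag a match that starts deep in a file and comes from a very short rule.

    `match` is assumed to be a plain dict with "start_line" (hypothetical key)
    and "rule_length". Matches missing either value are never flagged.
    """
    start_line = match.get("start_line")
    rule_length = match.get("rule_length")
    if start_line is None or rule_length is None:
        return False

    # Both conditions must hold: far into the file AND a short rule.
    return start_line > min_start_line and rule_length < max_rule_length
```

Matches flagged this way would only be candidates; they could still go through the classifier step before being reported as false positives.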