nexB / scancode-analyzer


Some ideas for extra heuristics #29

Open pombredanne opened 3 years ago

pombredanne commented 3 years ago

from a chat with @maxhbr

AyanSinhaMahapatra commented 3 years ago

@pombredanne The second case is definitely better handled as a scancode-toolkit change, IMHO.

The first one could be added as a results-analyzer heuristic to detect false positives, as short references might show up more often after the first 100 lines? I will integrate this as soon as the plugin is ready.
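A rough sketch of what such a check might look like, purely as an illustration: it flags a match as a possible false positive when it comes from a short reference (or tag) rule and starts after line 100. The dict layout is an assumption modelled loosely on scancode-toolkit's JSON license matches (`start_line`, `matched_rule`, `is_license_reference`, `is_license_tag`, `rule_length`), and both thresholds and function names are hypothetical, not taken from the analyzer code.

```python
from typing import Dict, List

# Hypothetical thresholds for illustration; not taken from scancode-results-analyzer.
LATE_LINE_THRESHOLD = 100   # "after 100 lines", as mentioned in the discussion
SHORT_RULE_LENGTH = 3       # assumed cutoff (in tokens) for a "short" rule


def is_possible_false_positive(match: Dict) -> bool:
    """Return True if a match is a short reference/tag rule starting late in the file.

    `match` is assumed to be a dict shaped like a scancode license match,
    with a `start_line` and a nested `matched_rule` dict.
    """
    rule = match.get("matched_rule", {})
    is_reference_or_tag = bool(
        rule.get("is_license_reference") or rule.get("is_license_tag")
    )
    is_short = rule.get("rule_length", 0) <= SHORT_RULE_LENGTH
    starts_late = match.get("start_line", 0) > LATE_LINE_THRESHOLD
    return is_reference_or_tag and is_short and starts_late


def flag_possible_false_positives(matches: List[Dict]) -> List[Dict]:
    """Keep only the matches that the heuristic considers suspicious."""
    return [m for m in matches if is_possible_false_positive(m)]


if __name__ == "__main__":
    sample = [
        {"start_line": 250, "matched_rule": {"is_license_reference": True, "rule_length": 1}},
        {"start_line": 3, "matched_rule": {"is_license_reference": True, "rule_length": 1}},
    ]
    for match in flag_possible_false_positives(sample):
        print("possible false positive at line", match["start_line"])
```

Whether license-tag rules should be included alongside license-reference rules is an open question here; the sketch accepts both so either reading can be tried.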

These extra heuristics would be very easy to add once the structure is ready, and they would be very important in the analysis process. We could also look at more statistics to find more of these!

I'm pushing a PR for the docs at #22 where the current classification/heuristics are detailed.

Also, in both cases, shouldn't they be license-tags rather than license-references? From the examples I have seen, false positives mostly get detected by single-word license-tag rules.

Thanks!

AyanSinhaMahapatra commented 3 years ago

This commit adds the first case of the extra heuristic and partially solves this issue: https://github.com/nexB/scancode-results-analyzer/issues/34