russHyde / dupree

{dupree} helps identify code blocks that have a high level of similarity in a set of R files
https://russhyde.github.io/dupree/

Length of code-block contents should be returned #3

Closed russHyde closed 6 years ago

russHyde commented 6 years ago

dupree() filters out trivial symbols and then quantifies the similarity between the blocks that remain. But this means that library(dplyr) gets reduced to the non-trivial symbols "library dplyr", so any pair of files containing library(dplyr) will match exactly on this block.

We need either a way of running dupree only on blocks of at least a given length, or a way of returning results from dupree that include the block lengths for each compared pair of blocks.
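
To make the problem and the two options concrete, here is a rough base-R sketch. It is not dupree's implementation: non_trivial_tokens(), min_block_size and the choice of which tokens count as "trivial" (brackets, commas, semicolons) are all assumptions made for illustration.

```r
# Hypothetical helper, not part of dupree: return the non-trivial tokens of a
# code block. "Trivial" is assumed here to mean brackets, commas and semicolons.
non_trivial_tokens <- function(code) {
  parsed <- parse(text = code, keep.source = TRUE)
  tokens <- utils::getParseData(parsed)
  tokens <- tokens[tokens$terminal, ]
  trivial <- c("'('", "')'", "'{'", "'}'", "'['", "']'", "','", "';'")
  tokens$text[!tokens$token %in% trivial]
}

# Three toy code blocks; the first two stand in for library() calls that
# appear in two different files.
blocks <- c(
  lib_1 = "library(dplyr)",
  lib_2 = "library(dplyr)",
  pipeline = "x <- mtcars %>% filter(cyl == 4) %>% select(mpg, wt)"
)

token_list <- lapply(blocks, non_trivial_tokens)
block_sizes <- lengths(token_list)

# The problem: both library() calls reduce to the same two tokens, so they
# match exactly even though the match is uninformative.
identical(token_list$lib_1, token_list$lib_2)

# Option 1: only compare blocks with at least `min_block_size` tokens
# (an illustrative threshold).
min_block_size <- 5
kept <- names(block_sizes)[block_sizes >= min_block_size]

# Option 2: compare every pair, but report the size of each block so that
# short, trivially matching pairs can be filtered out afterwards.
pairs <- as.data.frame(t(combn(names(blocks), 2)), stringsAsFactors = FALSE)
names(pairs) <- c("block_a", "block_b")
pairs$size_a <- unname(block_sizes[pairs$block_a])
pairs$size_b <- unname(block_sizes[pairs$block_b])
```

Option 1 keeps the output small but hides short blocks entirely. Option 2 leaves the decision to the user, at the cost of returning more rows.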