russHyde / dupree

{dupree} helps identify code blocks that have a high level of similarity in a set of R files
https://russhyde.github.io/dupree/

[WIP] add script(s) to analyse lots of development packages using dupree #54

Closed: russHyde closed this 4 years ago

russHyde commented 4 years ago

Plan:

russHyde commented 4 years ago

TODO: save the CRAN details as XML or YAML. Some of the text fields contain newlines, so when the table is written out as tab-separated values it can't be read back in.

russHyde commented 4 years ago

Chose instead to collapse whitespace: every run of whitespace characters (spaces, tabs, newlines, carriage returns) is replaced with a single space. The CRAN data-frame can now be saved to .tsv and reloaded without error.
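The whitespace-collapsing fix is simple enough to sketch. The PR's own scripts are in R. What follows is only a minimal Python illustration of the same idea, not the code used here. `normalise_whitespace`, `to_tsv` and `from_tsv` are hypothetical helper names, and the sample rows are made-up placeholders rather than real CRAN records. Once every field has its whitespace runs collapsed, no field can contain a tab or newline. Each record therefore occupies exactly one line with a fixed number of tab-separated fields.

```python
import re

# Matches any run of whitespace: spaces, tabs, newlines, carriage returns.
_WHITESPACE_RUN = re.compile(r"\s+")


def normalise_whitespace(value: str) -> str:
    """Collapse each whitespace run to a single space and trim both ends."""
    return _WHITESPACE_RUN.sub(" ", value).strip()


def to_tsv(rows: list[dict[str, str]], columns: list[str]) -> str:
    """Serialise rows as plain (unquoted) TSV after normalising every field."""
    lines = ["\t".join(columns)]
    for row in rows:
        lines.append("\t".join(normalise_whitespace(row[c]) for c in columns))
    return "\n".join(lines) + "\n"


def from_tsv(text: str) -> list[dict[str, str]]:
    """Parse plain TSV, failing loudly if a line has the wrong field count."""
    lines = text.rstrip("\n").split("\n")
    header = lines[0].split("\t")
    rows = []
    for line in lines[1:]:
        fields = line.split("\t")
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} fields, got {len(fields)}: {line!r}")
        rows.append(dict(zip(header, fields)))
    return rows


if __name__ == "__main__":
    # Placeholder records with embedded newlines, tabs and CRLF.
    records = [
        {"Package": "pkgA", "Description": "Identifies code blocks\nwith high\tsimilarity."},
        {"Package": "pkgB", "Description": "  Static code analysis\r\nfor R.  "},
    ]
    restored = from_tsv(to_tsv(records, ["Package", "Description"]))
    print(restored[1]["Description"])  # Static code analysis for R.
```

The trade-off is that the original line breaks in the descriptions are lost. That seems acceptable for an analysis table, and it avoids switching to a quoting-aware format such as XML or YAML.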

russHyde commented 4 years ago

Users can drop specific packages from the analysis by adding them to config[["drop"]]. I needed this because the repo for {logging} is in a format that crashed dupree.
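The drop list amounts to a filter applied before the analysis runs. As a rough Python sketch, assume the config is a dictionary holding a hypothetical `"packages"` list alongside the `"drop"` entry mentioned above. The R scripts' actual config layout is not shown in this thread. Any listed package is then removed while the original order is kept:

```python
def packages_to_analyse(config: dict) -> list[str]:
    """Return the configured packages minus any listed under "drop".

    Assumes a hypothetical "packages" key; only "drop" comes from the PR.
    Names in "drop" that are not in "packages" are silently ignored.
    """
    dropped = set(config.get("drop", []))
    return [pkg for pkg in config["packages"] if pkg not in dropped]


if __name__ == "__main__":
    # Placeholder package names; {logging} is the one excluded in this PR.
    config = {"packages": ["pkgA", "logging", "pkgB"], "drop": ["logging"]}
    print(packages_to_analyse(config))  # ['pkgA', 'pkgB']
```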

russHyde commented 4 years ago

Timings and code-block analyses were run for min_block_size values of 40 and 100. Timing used {bench}, with at least 5 repeats per run.

russHyde commented 4 years ago

This PR has been closed because the analysis it contained has moved to a new repo, "code_as_data". The analysis had grown too big to keep as part of dupree.