russHyde / dupree

{dupree} helps identify code blocks that have a high level of similarity in a set of R files
https://russhyde.github.io/dupree/

Error in rep.int(NA_character_, max(ends - 1)) : invalid 'times' value #21

Closed IndrajeetPatil closed 5 years ago

IndrajeetPatil commented 5 years ago

I am trying to use this package to check for duplicated code in ggstatsplot (https://github.com/IndrajeetPatil/ggstatsplot), but I keep getting the following error-

# in package directory?
getwd()
#> [1] "C:/Users/inp099/Documents/ggstatsplot"

# checking for duplicated code
dupree::dupree_package(".")
#> Error in rep.int(NA_character_, max(ends - 1)) : invalid 'times' value
#> In addition: Warning message:
#> In max(ends - 1) : no non-missing arguments to max; returning -Inf

Here is the traceback-

> traceback()
34: extract_r_source(source_file$filename, source_file$lines)
33: lintr::get_source_expressions(file)
32: get_source_expressions(.)
31: function_list[[i]](value)
30: freduce(value, `_function_list`)
29: `_fseq`(`_lhs`)
28: eval(quote(`_fseq`(`_lhs`)), env, env)
27: eval(quote(`_fseq`(`_lhs`)), env, env)
26: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
25: file %>% get_source_expressions() %>% get_localised_parsed_code_blocks() %>% 
        dplyr::filter_(~!token %in% "COMMENT")
24: .f(.x[[i]], ...)
23: purrr::map(., import_parsed_code_blocks_from_one_file)
22: function_list[[i]](value)
21: freduce(value, `_function_list`)
20: `_fseq`(`_lhs`)
19: eval(quote(`_fseq`(`_lhs`)), env, env)
18: eval(quote(`_fseq`(`_lhs`)), env, env)
17: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
16: files %>% purrr::map(import_parsed_code_blocks_from_one_file) %>% 
        dplyr::bind_rows()
15: import_parsed_code_blocks(.)
14: function_list[[i]](value)
13: freduce(value, `_function_list`)
12: `_fseq`(`_lhs`)
11: eval(quote(`_fseq`(`_lhs`)), env, env)
10: eval(quote(`_fseq`(`_lhs`)), env, env)
9: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
8: files %>% import_parsed_code_blocks() %>% tokenize_code_blocks() %>% 
       filter_(~block_size >= min_block_size)
7: preprocess_code_blocks(files, min_block_size)
6: eval(lhs, parent, parent)
5: eval(lhs, parent, parent)
4: preprocess_code_blocks(files, min_block_size) %>% find_best_matches()
3: dupree(keep_files, min_block_size)
2: dupree_dir(package, min_block_size, filter = paste0(package, 
       "/R/"))
1: dupree::dupree_package(".")

And session information-

sessioninfo::session_info()
#> - Session info ----------------------------------------------------------
#>  setting  value                                             
#>  version  R Under development (unstable) (2018-11-30 r75724)
#>  os       Windows 10 x64                                    
#>  system   x86_64, mingw32                                   
#>  ui       RTerm                                             
#>  language (EN)                                              
#>  collate  English_United States.1252                        
#>  ctype    English_United States.1252                        
#>  tz       America/New_York                                  
#>  date     2019-01-26                                        
#> 
#> - Packages --------------------------------------------------------------
#>  package     * version    date       lib source                    
#>  assertthat    0.2.0      2017-04-11 [1] CRAN (R 3.5.1)            
#>  cli           1.0.1.9000 2019-01-20 [1] Github (r-lib/cli@94e2fc5)
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.5.1)            
#>  digest        0.6.18     2018-10-10 [1] CRAN (R 3.5.1)            
#>  evaluate      0.12       2018-10-09 [1] CRAN (R 3.5.1)            
#>  highr         0.7        2018-06-09 [1] CRAN (R 3.5.1)            
#>  htmltools     0.3.6      2017-04-28 [1] CRAN (R 3.5.1)            
#>  knitr         1.21       2018-12-10 [1] CRAN (R 3.6.0)            
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.5.1)            
#>  Rcpp          1.0.0      2018-11-07 [1] CRAN (R 3.6.0)            
#>  rmarkdown     1.11       2018-12-08 [1] CRAN (R 3.6.0)            
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.6.0)            
#>  stringi       1.2.4      2018-07-20 [1] CRAN (R 3.6.0)            
#>  stringr       1.3.1      2018-05-10 [1] CRAN (R 3.5.1)            
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.5.1)            
#>  xfun          0.4        2018-10-23 [1] CRAN (R 3.6.0)            
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.5.1)            
#> 
#> [1] C:/Users/inp099/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-devel/library

Created on 2019-01-26 by the reprex package (v0.2.1)

russHyde commented 5 years ago

Hi. This is occurring for precisely the same reason that you got the bug in lintr: https://github.com/jimhester/lintr/issues/355

My package uses the code-parsing functions from `lintr`, which fail on files that have either i) no R code blocks (e.g., an empty file, an all-comments file, or an all-text R Markdown file); or ii) code blocks from a non-R language (e.g., Python blocks in R Markdown).
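The error message and warning reported above can be reproduced in isolation. The sketch below assumes that `ends` is an empty vector when a file contains no R code, which is consistent with the "no non-missing arguments to max" warning but has not been checked against the lintr source.

```r
# Assumption: for a file with no R code, `ends` has length zero.
ends <- integer(0)

# max() of an empty vector returns -Inf (with a warning, suppressed here).
times <- suppressWarnings(max(ends - 1))

# rep.int() rejects -Inf as a repeat count; capture the message.
result <- tryCatch(
  rep.int(NA_character_, times),
  error = function(e) conditionMessage(e)
)

print(times)
print(result)
```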

So if your environment has the CRAN version of lintr, you will get this bug. The options are either to install the GitHub version of lintr (I fixed the lintr bug in a couple of pull requests) or to restrict the analysis to files that are non-empty and contain no non-R code blocks.
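A rough sketch of the second option, for plain `.R` files only. `has_r_code()` is a hypothetical helper, not part of dupree or lintr; it keeps a file only if base R's `parse()` finds at least one expression in it. The commented-out calls assume that `remotes` is installed and that `dupree::dupree()` accepts a vector of file paths, as the traceback above suggests.

```r
# Hypothetical helper (not from dupree or lintr): TRUE when the file parses
# to at least one R expression. Empty files, comment-only files, and files
# that fail to parse all return FALSE.
has_r_code <- function(path) {
  exprs <- tryCatch(
    parse(file = path, keep.source = FALSE),
    error = function(e) NULL
  )
  length(exprs) > 0
}

# Collect the package's R files and drop the ones without any R code.
r_files <- list.files("R", pattern = "\\.[Rr]$", full.names = TRUE)
keep_files <- Filter(has_r_code, r_files)

# Option 1 (assumes the remotes package is available):
# remotes::install_github("jimhester/lintr")

# Option 2: run dupree on the filtered files only.
# dupree::dupree(keep_files)
```

This does not cover R Markdown files, where the non-R chunks would have to be detected separately.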

russHyde commented 5 years ago

Related to #4

IndrajeetPatil commented 5 years ago

Thanks, I downloaded the most recent version of lintr from GitHub just now. And now it works!

By the way, in a fresh R session, if dplyr is not explicitly loaded, this function gives a different error because it fails to find dplyr::n()-

> dupree::dupree_package(".")
Error in n() : could not find function "n"

Here is the traceback-

> traceback()
35: summarise_impl(.data, dots, environment())
34: summarise_.tbl_df(., enumerated_code = ~list(c(symbol_enum)), 
        block_size = "n()")
33: dplyr::summarise_(., enumerated_code = ~list(c(symbol_enum)), 
        block_size = "n()")
32: function_list[[k]](value)
31: withVisible(function_list[[k]](value))
30: freduce(value, `_function_list`)
29: `_fseq`(`_lhs`)
28: eval(quote(`_fseq`(`_lhs`)), env, env)
27: eval(quote(`_fseq`(`_lhs`)), env, env)
26: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
25: df %>% dplyr::group_by_(~file, ~block, ~start_line) %>% dplyr::summarise_(enumerated_code = ~list(c(symbol_enum)), 
        block_size = "n()")
24: summarise_enumerated_blocks(.)
23: function_list[[k]](value)
22: withVisible(function_list[[k]](value))
21: freduce(value, `_function_list`)
20: `_fseq`(`_lhs`)
19: eval(quote(`_fseq`(`_lhs`)), env, env)
18: eval(quote(`_fseq`(`_lhs`)), env, env)
17: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
16: block_df %>% remove_trivial_code_symbols() %>% enumerate_code_symbols() %>% 
        summarise_enumerated_blocks()
15: tokenize_code_blocks(.)
14: function_list[[i]](value)
13: freduce(value, `_function_list`)
12: `_fseq`(`_lhs`)
11: eval(quote(`_fseq`(`_lhs`)), env, env)
10: eval(quote(`_fseq`(`_lhs`)), env, env)
9: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
8: files %>% import_parsed_code_blocks() %>% tokenize_code_blocks() %>% 
       filter_(~block_size >= min_block_size)
7: preprocess_code_blocks(files, min_block_size)
6: eval(lhs, parent, parent)
5: eval(lhs, parent, parent)
4: preprocess_code_blocks(files, min_block_size) %>% find_best_matches()
3: dupree(keep_files, min_block_size)
2: dupree_dir(package, min_block_size, filter = paste0(package, 
       "/R/"))
1: dupree::dupree_package(".")
> 

I think this is undesirable behavior.

IndrajeetPatil commented 5 years ago

I think the latter issue should disappear once you import n from dplyr. It's currently used but not imported- https://github.com/russHyde/dupree/blob/b29ed71f69de48937cb4d88fd19921821e782c94/NAMESPACE#L6-L14
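As an illustration of the idea, and not necessarily how it ends up being fixed in dupree, the call can be namespace-qualified so that it no longer depends on dplyr being attached. `summarise_block_sizes()` is a hypothetical stand-in for the summarising step, and the sketch assumes dplyr 1.0.0 or later for the `.groups` argument.

```r
# Hypothetical stand-in for the summarising step in the traceback.
# dplyr::n() is referenced explicitly, so it is found even when dplyr
# has not been attached with library(dplyr).
summarise_block_sizes <- function(df) {
  grouped <- dplyr::group_by(df, file, block)
  dplyr::summarise(grouped, block_size = dplyr::n(), .groups = "drop")
}

# Toy input: two tokens in one block of a.R, one token in a block of b.R.
toy <- data.frame(
  file = c("a.R", "a.R", "b.R"),
  block = c(1, 1, 1)
)
print(summarise_block_sizes(toy))

# The alternative suggested above is a NAMESPACE directive:
#   importFrom(dplyr, n)
```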

russHyde commented 5 years ago

Thanks, will get this fixed

russHyde commented 5 years ago

Just fixed the dplyr::n non-import. Thanks for reporting.

russHyde commented 5 years ago

Fixed in #24.