ropensci / osfr

R interface to the Open Science Framework (OSF)
https://docs.ropensci.org/osfr

osf_ls_files() duplicates filename and ignores file #150

Closed doomlab closed 1 year ago

doomlab commented 1 year ago

I am trying to retrieve the file names in a node. Every time we change a file (upload or rename), a different file goes "missing" from the listing and one file name is repeated.

OSF_sub <- data.frame(
  name = list.files(path="osfstorage-archive", 
                      pattern=NULL, 
                      all.files=FALSE,
                      full.names=TRUE,
                      recursive = TRUE))

ethics_page <- osf_retrieve_node("https://osf.io/ycn7z/")
local <- osf_ls_files(ethics_page, path = "Local IRB", n_max = 1000)
local$type_IRB <- "Local IRB"
rely <- osf_ls_files(ethics_page, path = "Rely on HU", n_max = 1000)
rely$type_IRB <- "Rely on HU"
none <- osf_ls_files(ethics_page, path = "No Ethics", n_max = 1000)
none$type_IRB <- "No Ethics"

OSF_sub <- rbind(local, rely, none)

When I run the code above I get the proper number of files BUT a file name is repeated:

image

And then if you look at the node, there's only one copy:

image

And then one is missing:

image image

Any idea what is going on? Which file is missing often changes, but the repeated one is always the newest.
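For anyone wanting to check a listing for the same symptom, here is a minimal sketch that runs without network access. The `listing` data frame is a made-up stand-in for the result of osf_ls_files() (only the `name` and `id` columns are mimicked), and `expected` is a made-up stand-in for the file names in a local copy such as the osfstorage-archive folder; neither comes from the actual node.

```r
# Hypothetical stand-in for an osf_ls_files() result (assumed columns: name, id).
listing <- data.frame(
  name = c("a.pdf", "b.pdf", "b.pdf"),
  id   = c("id1", "id2", "id2"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the names in a local copy, e.g.
# basename(list.files("osfstorage-archive", recursive = TRUE)).
expected <- c("a.pdf", "b.pdf", "c.pdf")

# Names whose OSF id appears more than once in the listing.
repeated <- unique(listing$name[duplicated(listing$id)])

# Names present locally but absent from the listing.
missing_files <- setdiff(expected, listing$name)

print(repeated)
print(missing_files)
```

Comparing on `id` rather than `name` separates a true duplicate row from two different files that happen to share a name.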

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices
[4] utils     datasets  methods  
[7] base     

other attached packages:
[1] googlesheets4_1.0.0
[2] osfr_0.2.8         
[3] tidyr_1.2.0        
[4] dplyr_1.0.9        
[5] rio_0.5.29         

loaded via a namespace (and not attached):
 [1] zip_2.2.0        
 [2] Rcpp_1.0.9       
 [3] pillar_1.8.0     
 [4] compiler_4.2.1   
 [5] cellranger_1.1.0 
 [6] forcats_0.5.1    
 [7] tools_4.2.1      
 [8] digest_0.6.29    
 [9] googledrive_2.0.0
[10] gargle_1.2.0     
[11] memoise_2.0.1    
[12] jsonlite_1.8.0   
[13] evaluate_0.16    
[14] lifecycle_1.0.1  
[15] tibble_3.1.8     
[16] pkgconfig_2.0.3  
[17] rlang_1.0.5      
[18] openxlsx_4.2.5   
[19] DBI_1.1.3        
[20] cli_3.3.0        
[21] rstudioapi_0.13  
[22] crul_1.2.0       
[23] curl_4.3.2       
[24] yaml_2.3.5       
[25] haven_2.5.0      
[26] xfun_0.32        
[27] fastmap_1.1.0    
[28] httr_1.4.3       
[29] knitr_1.40       
[30] fs_1.5.2         
[31] generics_0.1.3   
[32] vctrs_0.4.1      
[33] hms_1.1.1        
[34] triebeard_0.3.0  
[35] tidyselect_1.1.2 
[36] httpcode_0.3.0   
[37] glue_1.6.2       
[38] data.table_1.14.2
[39] R6_2.5.1         
[40] fansi_1.0.3      
[41] readxl_1.4.0     
[42] foreign_0.8-82   
[43] rmarkdown_2.16   
[44] purrr_0.3.4      
[45] magrittr_2.0.3   
[46] urltools_1.7.3   
[47] ellipsis_0.3.2   
[48] htmltools_0.5.3  
[49] assertthat_0.2.1 
[50] utf8_1.2.2       
[51] stringi_1.7.8    
[52] cachem_1.0.6

aaronwolen commented 1 year ago

Thanks for the detailed issue, @doomlab! I was able to reproduce with a slightly smaller reprex:

library(osfr)
#> Automatically registered OSF personal access token

ethics_page <- osf_retrieve_node("https://osf.io/ycn7z/")
local <- osf_ls_files(ethics_page, path = "Local IRB", n_max = 1000)
local$type_IRB <- "Local IRB"  # column shown in the output below

dupe_ids <- local$id[duplicated(local$id)]
subset(local, id %in% dupe_ids)
#> # A tibble: 2 × 4
#>   name       id                       meta             type_IRB 
#> * <chr>      <chr>                    <list>           <chr>    
#> 1 421_Lu.pdf 6161fcd9fd5b2301429849b3 <named list [3]> Local IRB
#> 2 421_Lu.pdf 6161fcd9fd5b2301429849b3 <named list [3]> Local IRB

I'll take a look.

doomlab commented 1 year ago

Great, thank you!

aaronwolen commented 1 year ago

I dug into this a little more and found that the duplicate entries were coming directly from OSF, so I'm going to close this in favor of the new issue I posted here: https://github.com/CenterForOpenScience/osf.io/issues/10086.
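Until the upstream issue is resolved, one possible client-side stopgap is to drop rows that repeat an id. This is only a sketch under assumptions: `dedupe_by_id()` is a hypothetical helper, not part of osfr, and the data frame below is a made-up stand-in for an osf_ls_files() result. It removes the repeated row but cannot recover the file that is missing from the listing.

```r
# Hypothetical helper (not an osfr function): keep the first row for each id.
dedupe_by_id <- function(files) {
  files[!duplicated(files$id), , drop = FALSE]
}

# Made-up stand-in for an osf_ls_files() result with one repeated entry.
listing <- data.frame(
  name = c("a.pdf", "b.pdf", "b.pdf"),
  id   = c("id1", "id2", "id2"),
  stringsAsFactors = FALSE
)

clean <- dedupe_by_id(listing)
print(clean)
```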