pbiecek / archivist

A set of tools for datasets and plots archiving
http://pbiecek.github.io/archivist/

asearch for any local repository not just the default local repo #298

Closed reichek closed 6 years ago

reichek commented 8 years ago

Is it possible to enable asearch to search in any local repo specified as asearch(repo="PATH_TO_LOCAL_REPO") and not just in the default local repo?

MarcinKosinski commented 8 years ago

@reichek probably

sapply(character_vector_with_repository_dirs, function(one_repo){
  asearch(patterns, one_repo)
})

this will do this : )

reichek commented 8 years ago

Thanks for your prompt reply, but specifying the relative or absolute path of a local repo in asearch does not work for me.

>class(repoInputData)
[1] "character"

> showLocalRepo(repoDir=repoInputData)
                            md5hash                             name
1  8f784555be2d69d772a65734637a86ad                        list.data
2  18c1ed3744ca3a4e290621631400e90b 18c1ed3744ca3a4e290621631400e90b
3  8f784555be2d69d772a65734637a86ad                        list.data
4  ddfdb3eb402d0ebf3a631948a3310277 ddfdb3eb402d0ebf3a631948a3310277
5  ff575c261c949d073b2895b05d1097c3                        list.data
6  ddfdb3eb402d0ebf3a631948a3310277 ddfdb3eb402d0ebf3a631948a3310277

# Failure
>asearch(patterns = c("class:list"), repo=repoInputData)
Error: length(elements) >= 2 is not TRUE

# Failure
>asearch(patterns = c("class:list"), repo=normalizePath(repoInputData))
Error in downloadDB(remoteHook) : 
  Such a repo: [...] does not exist or there is no archivist-like Repository on this repo.

# Success
> setLocalRepo(repoDir=repoInputData)
> asearch(patterns = c("class:list"))
$`8f784555be2d69d772a65734637a86ad`
$`8f784555be2d69d772a65734637a86ad`$iris
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            5.1         3.5          1.4         0.2     setosa
2            4.9         3.0          1.4         0.2     setosa
3            4.7         3.2          1.3         0.2     setosa
4            4.6         3.1          1.5         0.2     setosa

# session info
> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] archivist_2.1 knitr_1.14    ggplot2_2.1.0 rmarkdown_1.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7      rstudioapi_0.6   magrittr_1.5     devtools_1.12.0 
 [5] munsell_0.4.3    colorspace_1.2-6 R6_2.1.3         stringr_1.1.0   
 [9] httr_1.2.1       plyr_1.8.4       tools_3.3.0      grid_3.3.0      
[13] gtable_0.2.0     DBI_0.5-1        withr_1.0.2      htmltools_0.3.5 
[17] yaml_2.1.13      digest_0.6.10    formatR_1.4      bitops_1.0-6    
[21] RCurl_1.95-4.8   memoise_1.0.0    evaluate_0.9     RSQLite_1.0.0   
[25] labeling_0.3     stringi_1.1.1    scales_0.4.0     lubridate_1.6.0 

MarcinKosinski commented 8 years ago

Ok now I get this, so we need to set repo in sapply for every element

sapply(character_vector_with_repository_dirs, function(one_repo){
  setLocalRepo(repoDir=one_repo)
  asearch(patterns)
})

Is that better?

MarcinKosinski commented 8 years ago

Remember that at the end, the last element of character_vector_with_repository_dirs is now the default local repository.
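One way to avoid leaving the default changed is sketched below. The helper name with_local_repo is hypothetical (it is not part of archivist), and the sketch assumes that aoptions("repoDir") returns the current default local repository. The temporary directories and the tag name:iris are used purely for illustration.

```r
library(archivist)

# Hypothetical helper, not part of archivist: evaluate `code` with `repo_dir`
# as the default local repo, then restore the previous default on exit.
with_local_repo <- function(repo_dir, code) {
  old <- aoptions("repoDir")  # assumed to return the current default (or NULL)
  on.exit(if (!is.null(old)) setLocalRepo(repoDir = old), add = TRUE)
  setLocalRepo(repoDir = repo_dir)
  force(code)  # `code` is evaluated lazily, i.e. after the default was switched
}

# Illustrative repositories in temporary directories.
input_repo  <- file.path(tempdir(), "with_repo_input")
output_repo <- file.path(tempdir(), "with_repo_output")
createLocalRepo(repoDir = input_repo, force = TRUE)
createLocalRepo(repoDir = output_repo, force = TRUE)
saveToLocalRepo(iris, repoDir = input_repo)

# The results repo is the session default; the search runs against the input repo.
setLocalRepo(repoDir = output_repo)
default_before <- aoptions("repoDir")
hits <- with_local_repo(input_repo, asearch(patterns = "name:iris"))
default_after <- aoptions("repoDir")
```

After the call, default_after should equal default_before, so later calls that rely on the default still write to the results repository.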

reichek commented 8 years ago

Sorry, I see I need to be more precise. In the same R Markdown script I would like to use one repo for the input data and another repo for all plots and results generated throughout the script. To avoid confusion between different default local repository settings, I would prefer to call archivist::asearch with a local repo specified directly (without resetting the default repo at each invocation of asearch). This is already possible for repos on GitHub: there you can search either in a specified GitHub repo or in the default GitHub repo.

MarcinKosinski commented 8 years ago

So using asearch with various local repositories can only be done via the setLocalRepo function, as stated in the Note of ?asearch:

Remember that if you want to use local repository you should set it to default.

There isn't any other way for local repositories right now. But maybe you could share your results in a remote repository? In that case you specify the user and the GitHub repository each time, so there is no room for confusion :)

reichek commented 8 years ago

So, you do not plan to update 'asearch', 'asession', and 'aread' such that local repositories can be specified directly? I was wondering if you could provide this feature for these functions?

Thanks for your help in advance.

MarcinKosinski commented 8 years ago

@pbiecek do you think we can work on this?

pbiecek commented 8 years ago

Please correct me if I've missed some detail, but asearch is just a wrapper over searchInLocalRepo with a shorter name and limited options.

And in the searchInLocalRepo function you can specify the repoDir for each call.

So maybe it's enough to use searchInLocalRepo instead of asearch?

MarcinKosinski commented 8 years ago

asearch is a wrapper around searchInLocalRepo and loadFromLocalRepo (so it also loads objects) but this is not the point.

The point is that asearch is the standard way to provide hooks under results in reports (after archive or after addHooksToPrint).

reichek commented 8 years ago

Thanks for your valuable comments! Since I am new to archivist I might not be aware of all features of your great package. The problem I have is that I would like to use a repo for input data (generated in script A) and a repo for results (generated in script B). This gives us the required freedom to ensure version control over input data, while different scripts use this input data to compute different analyses. Both data sets (input and output data) must be stored in local repositories. So, if all functions of archivist would enable the specification of the local repository which will be used, we won't have the problem to search, read or write into a wrong repo (just because it is set to the current default repo). Is this in general possible to implement or would it require substantial rewriting of archivist?

Thanks again for your help, Kristin

MarcinKosinski commented 8 years ago

So I would suggest using searchInLocalRepo + loadFromLocalRepo for local repositories right now. asearch is only a wrapper and is not as powerful as searchInLocalRepo + loadFromLocalRepo, which can take repoDir as a parameter.
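A minimal sketch of this suggestion, assuming two local repositories created in temporary directories (the paths, the variable names and the tag name:iris are illustrative, not taken from the thread). Each call names its repository through repoDir, so the default local repo is never touched.

```r
library(archivist)

# Illustrative repositories: one for input data, one for results.
input_repo   <- file.path(tempdir(), "repo_input_data")
results_repo <- file.path(tempdir(), "repo_results")
createLocalRepo(repoDir = input_repo, force = TRUE)
createLocalRepo(repoDir = results_repo, force = TRUE)

# "Script A": archive the input data in the input repo only.
saveToLocalRepo(iris, repoDir = input_repo)

# "Script B": search and load from the input repo by naming it explicitly.
input_hashes  <- searchInLocalRepo(pattern = "name:iris", repoDir = input_repo)
input_objects <- lapply(input_hashes, loadFromLocalRepo,
                        repoDir = input_repo, value = TRUE)
names(input_objects) <- input_hashes

# The same pattern searched in the results repo finds nothing,
# because nothing was archived there.
results_hashes <- searchInLocalRepo(pattern = "name:iris", repoDir = results_repo)
```

Unlike asearch, this returns the md5hashes first and loads the objects in a separate step, which also makes it easy to inspect what matched before loading anything.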

pbiecek commented 7 years ago

@MarcinKosinski Are you planning any implementation or documentation updates related to this issue, or shall we close it?

MarcinKosinski commented 7 years ago

This issue isn't resolved, so perhaps it shouldn't be closed. I never intended to provide functionality that would improve this approach. It's better to leave this open, as someone may someday provide a PR for this.

MarcinKosinski commented 6 years ago

So in the end this is not an issue, but a question with some suggestions from our side. I'd close this, as it's not a bug that can be fixed but specified behavior.

pbiecek commented 6 years ago

Actually, I think that the issue should be addressed somehow. The quick-and-dirty solution is to create functions asearchLocal() and areadLocal(), which will work like asearch() and aread() for local repos. These will basically be just wrappers over loadFromRemoteRepo and multiSearchInLocalRepo, but with nicer names and nicer defaults.

pbiecek commented 6 years ago

Candidate fix in aa49767: the new functions areadLocal and asearchLocal allow the local directory to be specified directly.
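A short usage sketch of the two new functions. It assumes the signatures asearchLocal(patterns, repo) and areadLocal(md5hash, repo) and an archivist version that includes this fix. The repository path and the tag name:iris are illustrative.

```r
library(archivist)

# Illustrative repository in a temporary directory.
input_repo <- file.path(tempdir(), "repo_for_local_wrappers")
createLocalRepo(repoDir = input_repo, force = TRUE)

# saveToLocalRepo returns the md5hash of the archived artifact.
iris_hash <- as.character(saveToLocalRepo(iris, repoDir = input_repo))

# Search by tag in the named local repo, without setting a default repo.
found <- asearchLocal(patterns = "name:iris", repo = input_repo)

# Read a single artifact by its md5hash from the same repo.
iris_copy <- areadLocal(md5hash = iris_hash, repo = input_repo)
```

With these wrappers the input-data repo and the results repo can be addressed side by side in one script, which is the use case described earlier in this thread.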