pbiecek / archivist

A set of tools for datasets and plots archiving
http://pbiecek.github.io/archivist/

archivedData md5hashes warnings #226

Closed MarcinKosinski closed 8 years ago

MarcinKosinski commented 8 years ago

In past versions of archivist we tried to inform the user when the same artifact was archived a second time. But it looks like I might have been wrong when I wrote this statement.

Look at the following example:

> exampleRepoDir <- tempfile()
> createLocalRepo(repoDir = exampleRepoDir)
Directory /tmp/Rtmp1JuNNQ/fileaed6edeef45 did not exist. Forced to create a new directory.
> data(iris)
> 
> wsave <- function(artifact, artifactName = deparse(substitute(artifact)), silent = TRUE,
+                                   rememberName = TRUE, archiveData = TRUE, session_info = TRUE) {
+   assign(x = artifactName, value = artifact)
+   saveToLocalRepo(artifact, repoDir=exampleRepoDir, 
+                                   artifactName =artifactName, silent = silent, rememberName = rememberName,
+                                   archiveData = archiveData, archiveSessionInfo = session_info)
+ }
> 
> wsave(iris, silent = FALSE, rememberName = FALSE, archiveData = FALSE)
[1] "ff575c261c949d073b2895b05d1097c3"
> showLocalRepo(method = "md5hashes", repoDir = exampleRepoDir)
                           md5hash                             name         createdDate
1 ff575c261c949d073b2895b05d1097c3 ff575c261c949d073b2895b05d1097c3 2016-02-11 00:07:59
2 e5afb291fef752f68056e9393e5935ac e5afb291fef752f68056e9393e5935ac 2016-02-11 00:07:59
> wsave(iris, silent = FALSE, rememberName = FALSE, archiveData = FALSE)
[1] "ff575c261c949d073b2895b05d1097c3"
Warning message:
In saveToLocalRepo(artifact, repoDir = exampleRepoDir, artifactName = artifactName,  :
  This artifact's data was already archived. Another archivisation executed with success.
> wsave(iris, silent = FALSE, rememberName = FALSE, archiveData = FALSE)
[1] "ff575c261c949d073b2895b05d1097c3"
Warning message:
In saveToLocalRepo(artifact, repoDir = exampleRepoDir, artifactName = artifactName,  :
  This artifact's data was already archived. Another archivisation executed with success.
> showLocalRepo(method = "md5hashes", repoDir = exampleRepoDir)
                           md5hash                             name         createdDate
1 ff575c261c949d073b2895b05d1097c3 ff575c261c949d073b2895b05d1097c3 2016-02-11 00:07:59
2 e5afb291fef752f68056e9393e5935ac e5afb291fef752f68056e9393e5935ac 2016-02-11 00:07:59
3 ff575c261c949d073b2895b05d1097c3 ff575c261c949d073b2895b05d1097c3 2016-02-11 00:07:59
4 e5afb291fef752f68056e9393e5935ac e5afb291fef752f68056e9393e5935ac 2016-02-11 00:07:59
5 ff575c261c949d073b2895b05d1097c3 ff575c261c949d073b2895b05d1097c3 2016-02-11 00:07:59
6 e5afb291fef752f68056e9393e5935ac e5afb291fef752f68056e9393e5935ac 2016-02-11 00:07:59
> showLocalRepo(method = "tags", repoDir = exampleRepoDir)
                           artifact                                           tag         createdDate
1  ff575c261c949d073b2895b05d1097c3                                    format:rda 2016-02-11 00:07:59
2  ff575c261c949d073b2895b05d1097c3                              class:data.frame 2016-02-11 00:07:59
3  ff575c261c949d073b2895b05d1097c3                          varname:Sepal.Length 2016-02-11 00:07:59
4  ff575c261c949d073b2895b05d1097c3                           varname:Sepal.Width 2016-02-11 00:07:59
5  ff575c261c949d073b2895b05d1097c3                          varname:Petal.Length 2016-02-11 00:07:59
6  ff575c261c949d073b2895b05d1097c3                           varname:Petal.Width 2016-02-11 00:07:59
7  ff575c261c949d073b2895b05d1097c3                               varname:Species 2016-02-11 00:07:59
8  ff575c261c949d073b2895b05d1097c3                      date:2016-02-11 00:07:59 2016-02-11 00:07:59
9  e5afb291fef752f68056e9393e5935ac                                    format:rda 2016-02-11 00:07:59
10 ff575c261c949d073b2895b05d1097c3 session_info:e5afb291fef752f68056e9393e5935ac 2016-02-11 00:07:59
11 ff575c261c949d073b2895b05d1097c3                                    format:txt 2016-02-11 00:07:59
12 ff575c261c949d073b2895b05d1097c3                                    format:rda 2016-02-11 00:07:59
13 ff575c261c949d073b2895b05d1097c3                              class:data.frame 2016-02-11 00:07:59
14 ff575c261c949d073b2895b05d1097c3                          varname:Sepal.Length 2016-02-11 00:07:59
15 ff575c261c949d073b2895b05d1097c3                           varname:Sepal.Width 2016-02-11 00:07:59
16 ff575c261c949d073b2895b05d1097c3                          varname:Petal.Length 2016-02-11 00:07:59
17 ff575c261c949d073b2895b05d1097c3                           varname:Petal.Width 2016-02-11 00:07:59
18 ff575c261c949d073b2895b05d1097c3                               varname:Species 2016-02-11 00:07:59
19 ff575c261c949d073b2895b05d1097c3                      date:2016-02-11 00:07:59 2016-02-11 00:07:59
20 e5afb291fef752f68056e9393e5935ac                                    format:rda 2016-02-11 00:07:59
21 ff575c261c949d073b2895b05d1097c3 session_info:e5afb291fef752f68056e9393e5935ac 2016-02-11 00:07:59
22 ff575c261c949d073b2895b05d1097c3                                    format:txt 2016-02-11 00:07:59
23 ff575c261c949d073b2895b05d1097c3                                    format:rda 2016-02-11 00:07:59
24 ff575c261c949d073b2895b05d1097c3                              class:data.frame 2016-02-11 00:07:59
25 ff575c261c949d073b2895b05d1097c3                          varname:Sepal.Length 2016-02-11 00:07:59
26 ff575c261c949d073b2895b05d1097c3                           varname:Sepal.Width 2016-02-11 00:07:59
27 ff575c261c949d073b2895b05d1097c3                          varname:Petal.Length 2016-02-11 00:07:59
28 ff575c261c949d073b2895b05d1097c3                           varname:Petal.Width 2016-02-11 00:07:59
29 ff575c261c949d073b2895b05d1097c3                               varname:Species 2016-02-11 00:07:59
30 ff575c261c949d073b2895b05d1097c3                      date:2016-02-11 00:07:59 2016-02-11 00:07:59
31 e5afb291fef752f68056e9393e5935ac                                    format:rda 2016-02-11 00:07:59
32 ff575c261c949d073b2895b05d1097c3 session_info:e5afb291fef752f68056e9393e5935ac 2016-02-11 00:07:59
33 ff575c261c949d073b2895b05d1097c3                                    format:txt 2016-02-11 00:07:59
> deleteLocalRepo(exampleRepoDir, TRUE)
> rm(exampleRepoDir)

I am not archiving data (for a data.frame it is not even possible), but the warning still occurs when I specify rememberName = FALSE.

I think this check is invalid https://github.com/pbiecek/archivist/blob/master/R/saveToRepo.R#L193-L205
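To put a number on the duplication in the second md5hashes listing above, here is a small base R sketch. It only re-counts the hashes printed in that listing and does not call archivist; the variable names are made up for illustration.

```r
# Hashes copied from the second showLocalRepo(method = "md5hashes") listing.
artifact_hash <- "ff575c261c949d073b2895b05d1097c3"  # iris
session_hash  <- "e5afb291fef752f68056e9393e5935ac"  # session info (see the session_info: tag)
md5hashes <- rep(c(artifact_hash, session_hash), times = 3)

# Rows per distinct hash, and the number of rows that repeat an earlier row.
rows_per_hash   <- table(md5hashes)
n_repeated      <- sum(duplicated(md5hashes))
distinct_hashes <- unique(md5hashes)
```

Three calls to wsave leave six rows for two distinct hashes, so both the artifact and its session info are written again on every call.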

pbiecek commented 8 years ago

Does the warning occur because the session info is duplicated (it is being added every time)?

We may force silent=TRUE here:

si <- devtools::session_info()
md5hashDF <- saveToLocalRepo(si, archiveData = FALSE, repoDir = repoDir,
                             rememberName = FALSE, archiveTags = FALSE,
                             force = TRUE, archiveSessionInfo = FALSE)
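As a caller-side alternative (not the change proposed in this comment), the specific warning could be muffled with base R condition handling. This is only a sketch: noisy_save and quiet_save are hypothetical helpers, and noisy_save merely imitates the warning text shown in the transcript instead of calling archivist.

```r
# Hypothetical stand-in for a save call that warns on a repeated artifact.
noisy_save <- function() {
  warning("This artifact's data was already archived. Another archivisation executed with success.")
  "ff575c261c949d073b2895b05d1097c3"
}

# Muffle only the "already archived" warning; any other warning still surfaces.
quiet_save <- function(expr) {
  withCallingHandlers(
    expr,  # the promise is forced here, with the handler already in place
    warning = function(w) {
      if (grepl("already archived", conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

md5 <- quiet_save(noisy_save())
```

Matching on the message text is brittle, so fixing the check inside the package remains the cleaner option.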

MarcinKosinski commented 8 years ago

I got a little bit confused. rememberName should not be changed by the user. When we use saveToRepo in a nested environment, we give up the artifact's name. rememberName controls (only inside the function) whether we found the artifact's name in the parent environment and will archive it under that name, or we did not find its name in the upper/parent(1) environment and will save it without a name (with a name that corresponds to the md5hash).

I have proposed an upgrade to this grotesque solution in #227. Now we'll be able to archive artifacts with their names in nested calls, and we can remove the rememberName parameter along with many lines of unneeded, poor code.
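To illustrate why the name gets lost in nested calls, here is a minimal base R sketch. It is not the code from #227; inner_save, wrapper_lost and wrapper_kept are hypothetical stand-ins that only report the name they would archive under.

```r
# Hypothetical stand-in for a save function; it only returns the name it sees.
inner_save <- function(artifact, artifactName = deparse(substitute(artifact))) {
  artifactName
}

# Wrapper that forwards the object only: the inner call sees the wrapper's
# formal argument name, not the caller's variable name.
wrapper_lost <- function(artifact) {
  inner_save(artifact)
}

# Wrapper that captures the name at its own level and passes it down.
wrapper_kept <- function(artifact, artifactName = deparse(substitute(artifact))) {
  inner_save(artifact, artifactName = artifactName)
}

name_lost <- wrapper_lost(iris)  # "artifact"
name_kept <- wrapper_kept(iris)  # "iris"
```

Capturing the name once at the outermost call and passing it down explicitly is one way to keep it without a flag such as rememberName.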

MarcinKosinski commented 8 years ago

I thought archiving the session info was the reason too. But the solution I proposed might fix the whole mess.

MarcinKosinski commented 8 years ago

The problem probably occurred when we set archiveSessionInfo = TRUE by default and it was used inside extractData with its default value; that's why we encountered this warning.

I have removed the rememberName parameter and the code of saveToRepo is a little clearer, so we'll be more robust to such ridiculous warnings. The changes can be merged here: https://github.com/pbiecek/archivist/pull/230

Then we can close this issue.