pbiecek / archivist

A set of tools for datasets and plots archiving
http://pbiecek.github.io/archivist/

Getting different md5 hash keys for the same artifact under unit-test settings (even though the archive is empty in each attempt). #334

Open harell opened 5 years ago

harell commented 5 years ago

When I add a unit test to the package that checks an artifact's md5 hash, the unit test fails.

Given an artifact, and a path pointing at where to create the archive

When I initialize the archive (deleting any former archive and creating a new one)
AND I use saveToLocalRepo to store in archive
AND I store the md5 hash key given by saveToLocalRepo

Then the md5 hash key should be identical to digest::digest(artifact, "md5")

The following script tests the above.

It works well when running the code line-by-line, but fails when it's included as part of the test suite (under ~/tests/testthat).

test_that("showLocalRepo generates identical hash keys", {
    ###########
    ## Setup ##
    ###########
    repo_dir <- tempdir()
    mdl_1 <- lm(mpg ~ ., mtcars[,c(1,2)])

    #####################
    ## Helper Function ##
    #####################
    init_archive <- function(path){
        set.seed(1235)
        unlink(path, recursive = TRUE, force = TRUE)
        dir.create(path, showWarnings = FALSE, recursive = TRUE)
        archivist::createLocalRepo(path, force = TRUE, default = FALSE)
    }

    #######################################
    ## Create Object via saveToLocalRepo ##
    #######################################
    init_archive(repo_dir)
    md5hash_archivist_1 <- archivist::saveToLocalRepo(artifact = mdl_1,
                                                      repoDir = repo_dir,
                                                      value = FALSE,
                                                      force = TRUE)
    init_archive(repo_dir)
    md5hash_archivist_2 <- archivist::saveToLocalRepo(artifact = mdl_1,
                                                      repoDir = repo_dir,
                                                      value = FALSE,
                                                      force = TRUE)

    ###########
    ## Tests ##
    ###########
    expect_equal(md5hash_archivist_1, md5hash_archivist_2, check.attributes = FALSE, use.names = FALSE)
    expect_equal(md5hash_archivist_1, digest::digest(mdl_1, "md5"), check.attributes = FALSE, use.names = FALSE)
    expect_equal(md5hash_archivist_2, digest::digest(mdl_1, "md5"), check.attributes = FALSE, use.names = FALSE)
})

The errors show that the function returns different hash key values:

[image: test failure output]

Any idea what causes the unit test to fail?

pbiecek commented 5 years ago

Thanks, will look into this.

harell commented 5 years ago

Just got a hint about why it happens in this thread. It's potentially solvable by wrapping the artifact with a function before hashing. See the note here.
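
For anyone landing here, a minimal sketch of one plausible cause; it is not confirmed by the maintainer, and it may not be what the linked note describes. An lm object keeps a reference to the environment where its formula was created, and digest::digest hashes the serialized object, environment contents included. At top level that environment is the global environment, which is serialized as a plain reference. Inside test_that it is the test's own environment, whose contents change as the test assigns new variables. The helper strip_env below is hypothetical and is not part of archivist or digest.

```r
library(digest)

# Two otherwise identical environments; env_b holds one unrelated variable,
# much like a test environment that accumulates objects as the test runs.
env_a <- new.env(parent = globalenv())
env_b <- new.env(parent = globalenv())
assign("unrelated", 1, envir = env_b)

# Fit the same model in each environment. The formula (and therefore the
# model's terms) remembers the environment it was created in.
fit_a <- eval(quote(lm(mpg ~ cyl, data = mtcars)), env_a)
fit_b <- eval(quote(lm(mpg ~ cyl, data = mtcars)), env_b)

# The fitted coefficients agree, yet the hashes of the two objects do not.
same_coefficients <- isTRUE(all.equal(coef(fit_a), coef(fit_b)))
hash_a <- digest(fit_a, algo = "md5")
hash_b <- digest(fit_b, algo = "md5")
hashes_differ <- !identical(hash_a, hash_b)

# Hypothetical helper (an assumption, not an archivist or digest function):
# point both stored environments at globalenv() so the serialized bytes no
# longer depend on where the model was fitted.
strip_env <- function(model) {
  attr(model$terms, ".Environment") <- globalenv()
  attr(attr(model$model, "terms"), ".Environment") <- globalenv()
  model
}

stable_a <- digest(strip_env(fit_a), algo = "md5")
stable_b <- digest(strip_env(fit_b), algo = "md5")
hashes_match <- identical(stable_a, stable_b)
```

If this is indeed the cause, passing strip_env(mdl_1) to both saveToLocalRepo and digest::digest in the test above might make the three hashes agree. That is untested against archivist, and resetting the environment can change how the model resolves variables later (for example in update()).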