florisvdh closed this issue 5 years ago
I can reproduce this. It seems that git2r::hashfile() yields different output under Linux and Windows:
filename <- tempfile("os-bug")
writeLines(
c("x\ty", "1\t1", "2\t2", "3\t3", "4\t4", "5\t5", "6\t6", "7\t7",
"8\t8", "9\t9", "10\t10", "11\t11", "12\t12", "13\t13", "14\t14",
"15\t15", "16\t16", "17\t17", "18\t18", "19\t19", "20\t20", "21\t21",
"22\t22", "23\t23", "24\t24", "25\t25", "26\t26"),
filename
)
git2r::hashfile(filename)
Output (the hash differs between the two platforms):
1de50dce6d5139f98a8e69d4d45d26ae7d32c64f
3e6fbe383532f4312bd0f5c9f30976f64d00e9cc
According to @stewid, the difference in hash is due to the difference in line endings on Linux and Windows (ropensci/git2r#397).
Below is a reprex using write.table() on Linux.
library(git2r)
x <- seq_len(26)
y <- letters
df <- data.frame(x, y, stringsAsFactors = FALSE)
filename <- tempfile("os-bug")
# unix style line endings
write.table(
x = df, file = filename, append = FALSE, quote = FALSE,
sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE,
col.names = TRUE, fileEncoding = "UTF-8"
)
hashfile(filename) # "50aabdcd96bd742fdcc41edcc6b3efdf8e63f498"
# windows style line endings
write.table(
x = df, file = filename, append = FALSE, quote = FALSE,
sep = "\t", eol = "\r\n", na = "NA", dec = ".", row.names = FALSE,
col.names = TRUE, fileEncoding = "UTF-8"
)
hashfile(filename) # "1783ed10fa5035a3963abf4202f42fe6ca88f046"
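The byte-level difference behind the two hashes can be inspected directly. The following is a minimal sketch in base R, not taken from the thread: `write_with_eol()` is a hypothetical helper, and it assumes that writing through a connection opened in binary mode (`"wb"`) keeps R from translating `"\n"` on Windows, so the comparison behaves the same on both platforms.

```r
# Same table as in the reprex above.
df <- data.frame(x = 1:26, y = letters, stringsAsFactors = FALSE)

# Hypothetical helper: write `df` with an explicit line ending.
write_with_eol <- function(df, eol) {
  path <- tempfile("eol-demo")
  # Binary mode ("wb") avoids any end-of-line translation by the OS.
  con <- file(path, open = "wb")
  on.exit(close(con))
  write.table(
    df, file = con, quote = FALSE, sep = "\t", eol = eol,
    row.names = FALSE, col.names = TRUE
  )
  path
}

lf_file <- write_with_eol(df, "\n")
crlf_file <- write_with_eol(df, "\r\n")

# Read both files back as raw bytes.
lf_bytes <- readBin(lf_file, what = "raw", n = file.size(lf_file))
crlf_bytes <- readBin(crlf_file, what = "raw", n = file.size(crlf_file))

# The CRLF file carries one extra carriage return (0x0d) per line
# (header plus 26 rows).
length(crlf_bytes) - length(lf_bytes)
sum(crlf_bytes == as.raw(0x0d))

# Dropping the carriage returns recovers the LF file byte for byte.
identical(crlf_bytes[crlf_bytes != as.raw(0x0d)], lf_bytes)
```

Since hashfile() hashes the file as stored on disk, those extra bytes are enough to change the result even though the tabular data are identical.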
@florisvdh and @w-jan, can you check whether PR #53 solves this issue? Use remotes::install_github("ropensci/git2rdata@datahash") to install it.
I haven't checked Windows yet, but on Linux I now get a different hash than before. Is this expected?
library(git2rdata)
x <- seq_len(26)
y <- letters
df <- data.frame(x, y)
write_vc(df, "df_vc", sorting = c("x"), strict = FALSE)
# b2658819ed189ec4496b4b25c55404f7d0918b6a 3514e919bcca45b232268c650a04db36a18aa6b5
#                              "df_vc.tsv"                              "df_vc.yml"
Yes, this is expected. The hashes are now calculated from the content rather than from the file.
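To illustrate the idea of hashing content rather than the file, here is a rough sketch. It is not the implementation from PR #53: `content_hash()` and `write_lines_binary()` are hypothetical helpers written for this example, and the sketch assumes that git2r::hash() is applied to an in-memory string while readLines() strips whichever line ending the file uses.

```r
library(git2r)

# Hypothetical helper, not part of git2r or git2rdata.
content_hash <- function(path) {
  # readLines() accepts LF, CRLF and CR as end-of-line markers,
  # so the returned strings no longer carry the line ending.
  lines <- readLines(path, warn = FALSE)
  # Hash the normalised text in memory instead of the file on disk.
  hash(paste(lines, collapse = "\n"))
}

# Hypothetical helper: write text with an explicit line ending,
# using binary mode so the OS does not translate "\n".
write_lines_binary <- function(text, sep) {
  path <- tempfile("content-hash")
  con <- file(path, open = "wb")
  on.exit(close(con))
  writeLines(text, con, sep = sep)
  path
}

text <- c("x\ty", paste(1:26, letters, sep = "\t"))
lf_file <- write_lines_binary(text, "\n")
crlf_file <- write_lines_binary(text, "\r\n")

# File-based hashes depend on the line endings ...
hashfile(lf_file) == hashfile(crlf_file)
# ... while the content-based hashes should agree.
content_hash(lf_file) == content_hash(crlf_file)
```

A hash computed this way naturally differs from the earlier hashfile() value for the same data, because the input to the hash function is no longer the file as stored on disk.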
I checked in Windows and the same datahash is produced. Good work! I think it's OK to close the issue.
See some further comments in PR #53.
This issue uses the reprex from issue #47.
Although I don't get those errors, my output on Linux is always as follows:
Which is a different data_hash (stored in the yml file) than the Windows-generated one.
Session Info
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 18.1

Matrix products: default
BLAS:   /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=nl_BE.UTF-8       LC_NUMERIC=C               LC_TIME=nl_BE.UTF-8
 [4] LC_COLLATE=nl_BE.UTF-8     LC_MONETARY=nl_BE.UTF-8    LC_MESSAGES=nl_BE.UTF-8
 [7] LC_PAPER=nl_BE.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=nl_BE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] git2rdata_0.1

loaded via a namespace (and not attached):
[1] drat_0.1.5       compiler_3.6.1   assertthat_0.2.1 tools_3.6.1      yaml_2.2.0
[6] git2r_0.26.1     packrat_0.5.0    fortunes_1.5-4