jeroen / openssl

OpenSSL bindings for R

Recommend file(..., raw = TRUE) for checksums #85

Open nielsaka opened 3 years ago

nielsaka commented 3 years ago

When computing file checksums with sha1() or similar, I would recommend setting raw = TRUE on the file() connection. Maybe that could be added to the documentation?

Use case: comparing files on different machines. If the file is an RDS file (or binary or compressed?) and raw = FALSE (the default), file() appears to detect the compression and read the decompressed content (see the documentation excerpt below), so the hash no longer matches the bytes on disk. Using raw = TRUE is also quicker.

Example:

> system("sha1sum data/article_all.Rds")
8192a2610e8e67e559ba80760f198bf810096f7a  data/article_all.Rds
> openssl::sha1(file("data/article_all.Rds"))
sha1 9c:11:ac:17:5a:86:7c:67:a4:77:ad:87:35:67:62:09:64:1e:88:36 
> openssl::sha1(file("data/article_all.Rds", raw = TRUE))
sha1 81:92:a2:61:0e:8e:67:e5:59:ba:80:76:0f:19:8b:f8:10:09:6f:7a 

From the documentation of file():

raw: logical. If true, a ‘raw’ interface is used which will be more suitable for arguments which are not regular files, e.g. character devices. This suppresses the check for a compressed file when opening for text-mode reading, and asserts that the ‘file’ may not be seekable.
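
A self-contained way to reproduce this without my data file might look like the sketch below. It assumes the openssl package is installed; the temporary file and the use of the built-in mtcars dataset are only for illustration. It hashes the bytes exactly as stored on disk (read with readBin()) and compares that reference against both kinds of connection.

```r
library(openssl)

# Write a small RDS file; saveRDS() gzip-compresses by default.
path <- tempfile(fileext = ".rds")
saveRDS(mtcars, path)

# Reference: hash the bytes exactly as they are stored on disk.
bytes_on_disk <- readBin(path, what = "raw", n = file.size(path))
hash_disk <- sha1(bytes_on_disk)

# Default connection: the compressed-file check is active.
hash_default <- sha1(file(path))

# Raw connection: the compressed-file check is suppressed.
hash_raw <- sha1(file(path, raw = TRUE))

# Compare both connection hashes against the on-disk reference.
print(identical(as.character(hash_raw), as.character(hash_disk)))
print(identical(as.character(hash_default), as.character(hash_disk)))
```

Going by the output in my example above, I would expect only the raw = TRUE hash to match the on-disk reference, which is also what sha1sum reports.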