Closed wlandau closed 4 years ago
Yes, the plan is for future versions to always be able to read files written by earlier versions. But the hash of the same data written to file by `qsave` will not remain constant in future versions.
This is something I don't have full control over either, since I mainly rely on zstd as the compression algorithm (zstd is constantly being updated and improved).
However, there's also an internal hash written at the end of the file (the last 4 bytes, used to check for data corruption) that should be more stable. It could still change, but probably not very often.
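For anyone who wants to look at that trailer, here is a minimal base R sketch that reads the last 4 bytes of a file. The helper name `read_trailing_bytes` is hypothetical and not part of `qs`, and treating those bytes as the internal hash rests only on the description above. The demo uses a stand-in file of known bytes rather than a real `qs` file.

```r
# Hypothetical helper (not part of qs): return the last `n` bytes of a file
# as a raw vector. It reads the whole file into memory, which is fine for a
# sketch but wasteful for very large files.
read_trailing_bytes <- function(path, n = 4L) {
  size <- file.size(path)
  stopifnot(!is.na(size), size >= n)
  bytes <- readBin(path, what = "raw", n = size)
  tail(bytes, n)
}

# Stand-in file containing the bytes 01..0a; a real use would point at a
# file written by qsave() instead.
tmp <- tempfile()
writeBin(as.raw(1:10), tmp)
trailer <- read_trailing_bytes(tmp)
print(trailer)
```

Comparing the trailers of two files written from the same data by different installations would be one way to see whether only the compressed body differs.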
Thanks for clarifying. I think we can deal with differing hashes.
> This is something I don't have full control over either, since I mainly rely on zstd as the compression algorithm. (zstd is constantly being updated and improved).
Does `preset = "high"` use zstd?
Also, is the zstd source fully included in `qs` itself, or are there parts that `qs` only links to? If a user only installs CRAN releases of `qs`, is there any reason that the hashes of files would change faster than the CRAN release schedule?
Yes, that preset uses zstd. The zstd source is fully included, but the configure script checks for system installation and uses that for dynamic linkage if it exists. So the same CRAN version may have different file checksums depending on the version of zstd.
> the configure script checks for system installation and uses that for dynamic linkage if it exists.
Would you be willing to reconsider? If `qs` were to always use its own zstd, I think behavior would be more consistent and reproducible, and it would make it easier for folks like me to get our hands on a reasonably up-to-date zstd implementation. At my work, I share a large cluster with hundreds of other statisticians, and it is much easier to update R packages (thanks to `renv`) than it is to update our system's toolchain. To give you an idea, my work's installation of R is compiled with gcc 4.8.5 (June 23, 2015).
You can force zstd/lz4 to compile during installation with configure args:
devtools::install_cran("qs", configure.args = "--with-zstd-force-compile --with-lz4-force-compile", force = TRUE)
However, the default needs to be the system library because of CRAN requirements from Prof. Ripley.
That’s fair, thanks for explaining.
Did Prof. Ripley elaborate?
I am developing a package called `targets`, a Make-like pipeline tool for R. Like Make, `targets` tries to skip targets that are already up to date. Unlike Make, `targets` uses the hashes of data files to make decisions about which targets to run. Because of its incredible efficiency, `qs` is now the default data storage format. However, this comes with backward-compatibility risks, so I would like to touch base about your development plans. Will future versions of `qread()` be able to read files saved with earlier versions of `qs` (beginning with 0.23.2)? Will the hash of data serialized with `qsave(preset = "high")` change over time? The latter case is easier to cope with, especially due to https://github.com/wlandau/targets/issues/142#issuecomment-684821566.
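To make the concern concrete, here is a rough base R sketch of a hash-based up-to-date check. It is not how `targets` is actually implemented: the helpers `file_hash` and `is_up_to_date` are hypothetical, and MD5 via `tools::md5sum()` is used only because it ships with R.

```r
# Hypothetical helpers, for illustration only (not the targets internals).

# Hash the bytes of a file on disk.
file_hash <- function(path) {
  unname(tools::md5sum(path))
}

# A stored file counts as up to date only if it still exists and its
# current hash matches the hash recorded when it was last built.
is_up_to_date <- function(path, recorded_hash) {
  file.exists(path) && identical(file_hash(path), recorded_hash)
}

tmp <- tempfile()
writeBin(as.raw(1:10), tmp)
recorded <- file_hash(tmp)

# Same bytes on disk: the step could be skipped.
unchanged <- is_up_to_date(tmp, recorded)

# Different bytes on disk: the step would have to rerun.
writeBin(as.raw(10:1), tmp)
changed <- is_up_to_date(tmp, recorded)
```

Under a scheme like this, any change in the serialized bytes invalidates the stored hash, even if the underlying R object is identical. That is why the stability of `qsave()` output across versions matters here.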