qsbase / qs

Quick serialization of R objects

Question about the back compatibility policy #41

Closed: wlandau closed this issue 4 years ago

wlandau commented 4 years ago

I am developing a package called targets, a Make-like pipeline tool for R. Like Make, targets tries to skip targets that are already up to date. Unlike Make, targets uses the hashes of data files to make decisions about which targets to run. Because of its incredible efficiency, qs is now the default data storage format. However, this comes with back compatibility risks, so I would like to touch base about your development plans. Will future versions of qread() be able to read files saved with earlier versions of qs (beginning with 0.23.2)? Will the hash of data serialized with qsave(preset = "high") change over time? The latter case is easier to cope with, especially due to https://github.com/wlandau/targets/issues/142#issuecomment-684821566.
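To make the concern concrete, here is a minimal sketch of hash-based skipping. It assumes an MD5 of the stored file's bytes via base R's `tools::md5sum()`. The helper `needs_rebuild()` is hypothetical and is not how targets itself is implemented. It shows why any byte-level change in a saved file, even one caused only by a newer compressor, would mark a target as outdated.

```r
# Hypothetical helper (not part of targets or qs): decide whether a stored
# file is out of date by comparing its current hash to a previously recorded one.
needs_rebuild <- function(path, recorded_hash) {
  if (!file.exists(path)) return(TRUE)      # nothing stored yet, so build it
  current <- unname(tools::md5sum(path))    # MD5 of the raw bytes on disk
  !identical(current, recorded_hash)        # any byte difference triggers a rerun
}

# Usage: record a hash, then check the same unchanged file again.
f <- tempfile()
writeBin(as.raw(1:10), f)
h <- unname(tools::md5sum(f))
needs_rebuild(f, h)  # FALSE
```

Because the check sees only bytes, the same R object re-saved with different compressed output would still count as "changed".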

traversc commented 4 years ago

Yes, the plan is for future versions to always be able to read files written by earlier versions. But the hash of the same data written to file by qsave will not remain constant across future versions.

This is something I don't have full control over either, since I mainly rely on zstd as the compression algorithm. (zstd is constantly being updated and improved).

However, there's also an internal hash written at the end of the file (the last 4 bytes, used to check for data corruption) that should be more stable. It could still change, but that probably won't happen very often.
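Based only on the description above, one could pull out those trailing 4 bytes as a lighter-weight fingerprint. Here is a minimal sketch. The helper `trailing_bytes()` is hypothetical. It does not decode the value, because the thread states neither the hash algorithm nor the byte order, and it reads the whole file for simplicity.

```r
# Hypothetical helper (not part of qs): return the last n bytes of a file
# as a raw vector. For a qs file, the thread says the last 4 bytes hold an
# internal checksum; this sketch treats them as opaque bytes.
trailing_bytes <- function(path, n = 4L) {
  size <- file.size(path)
  stopifnot(!is.na(size), size >= n)
  bytes <- readBin(path, what = "raw", n = size)  # read the whole file as raw bytes
  bytes[(size - n + 1L):size]                     # keep only the final n bytes
}
```

Comparing `trailing_bytes()` of two files is cheap, but as noted above, this value is only expected to be more stable, not guaranteed constant.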

wlandau commented 4 years ago

Thanks for clarifying. I think we can deal with differing hashes.

> This is something I don't have full control over either, since I mainly rely on zstd as the compression algorithm. (zstd is constantly being updated and improved).

Does preset = "high" use ZSTD?

Also, is the ZSTD source fully included in qs itself, or are there parts that qs only links to? If a user only installs CRAN releases of qs, is there any reason the hashes of files would change faster than the CRAN release schedule?

traversc commented 4 years ago

Yes, that preset uses zstd. The zstd source is fully included, but the configure script checks for system installation and uses that for dynamic linkage if it exists. So the same CRAN version may have different file checksums depending on the version of zstd.

wlandau commented 4 years ago

> the configure script checks for system installation and uses that for dynamic linkage if it exists.

Would you be willing to reconsider? If qs always used its own bundled zstd, I think behavior would be more consistent and reproducible, and it would be easier for folks like me to get a reasonably up-to-date zstd implementation. At work, I share a large cluster with hundreds of other statisticians, and it is much easier to update R packages (thanks to renv) than to update our system's toolchain. To give you an idea, our R installation is compiled with gcc 4.8.5 (released June 23, 2015).

traversc commented 4 years ago

You can force zstd/lz4 to compile during installation with configure args:

devtools::install_cran("qs", configure.args = "--with-zstd-force-compile --with-lz4-force-compile", force = TRUE)

However, the default needs to be the system library because of CRAN requirements from Prof. Ripley.

wlandau commented 4 years ago

That’s fair, thanks for explaining.

Did Prof. Ripley elaborate?