polarathene opened this issue 2 months ago
@polarathene, thanks for the detailed, considered request.

I'll definitely look into your suggestions to improve the publishing/distribution of qsv. I agree that `tar.gz` is a more common format in *nix environments; however, there are several extenuating circumstances that need to be considered:

- qsv uses the `self_update` crate for its self-update feature. The naming convention of the zip archives was primarily dictated by `self_update`'s naming requirements, so this is indeed a blocker per your update.
- `qsvlite --update` only extracts the `qsvlite` variant and not the whole archive. With `tar.gz`, wouldn't it necessarily need to untar and decompress ALL the variants before getting the desired variant?
- this "selective decompression" may very well have been an "incidental" feature of the `self_update` crate for zip, as it decompresses the binary by name - allowing me to pack all the variants in one big archive per platform.
- `self-update` automatically removes the zip archive after it runs.
- qsv's release tempo is quite high. That's why self-update is essential, as it really simplifies the process of getting the latest version once you go through the initial installation of the prebuilt binaries.
- we `zipsign` the archives for authenticity, and `self_update` supports zipsigned archives.

Just the same, as `self_update` and `zipsign` do support the `tar.gz` format, I'll investigate your request in more detail and look into your other recommendations as well.
FYI, we primarily target the latest Ubuntu LTS on the x86_64 platform, which is glibc-based, as that is our standard deployment platform for our CKAN PaaS service. I don't get to exercise the musl build as much, so we depend on community feedback to improve it.
Thanks for sharing those insights, very informative!
> qsv uses the `self_update` crate for its self-update feature. The naming convention of the zip archives was primarily dictated by `self_update`'s naming requirements, so this is indeed a blocker per your update.

You could probably work around that with an alternative host for releases? I know some other projects like Caddy have open-source plans with Cloudsmith (linked to a related Github Action) for publishing releases either as packages or raw files/archives.
EDIT: Ah, it seems like `self_update` has a limited set of "backends", so they'd need to add Cloudsmith, I guess, if you were to consider that.

The intent was that Cloudsmith would be the place your versioned archives are stored for the self-update functionality, and GH releases would be more akin to other projects' GH releases.
> With `tar.gz`, wouldn't it necessarily need to untar and decompress ALL the variants, before getting the desired variant?

I've not inspected to check if it's decompressing the entire archive, but I really don't see that being a pragmatic concern for most?

With `.tar.gz` I can still extract just the file of interest, as the `tar` example in my original request shows. The benefit is that I don't need to write the archive to disk and remove it afterwards; I can just pipe it, as the memory required is minimal enough that writing a temporary copy to disk seems redundant.
- this "selective decompression" may very well have been an "incidental" feature of the
self_update
crate for zip, as it decompresses the binary by name - allowing me to pack all the variants in one big archive per platform.
I think that's more convenient on your end than for the users though? Most only need a single variant, but instead they have to download the whole archive to get it.

There are no bandwidth fees with Github on your end, so I can understand why that's fine. For users it also minimizes searching through the list of links to find the one they're interested in, although with automation or self-update that's often a one-time-only benefit.
> `self-update` automatically removes the zip archive after it runs.

I assume only when it's pulling an update, not the original archive? I didn't check that, but since I renamed the archive to make it simpler to extract `qsv` via the CLI, it wouldn't know what to remove anyway 🤷‍♂️
> qsv's release tempo is quite high. That's why self-update is essential, as it really simplifies the process of getting the latest version once you go through the initial installation of the prebuilt binaries.

I understand the value of self-updating for some users, but it's not something I want myself; it rarely is on linux (we have package managers for this purpose). The last thing I would want is to run some project that had some `qsv` commands with a binary that self-updated itself with a breaking change I wasn't aware of (assuming this project is 6-12 months or so old), and then the functionality that should have worked is broken, requiring time to be diverted (like a forced windows update).

I am aware of some distros having packages (often community-maintained) that may pull a pre-built release from official channels rather than doing a local build on the client, as opposed to more official per-distro package repos where they're built from source. Likewise in my case with Docker, adding `qsv` into a project's image from a GH release is nicer than the 10GB of memory + time to build `qsv` in CI. These are scenarios where self-update functionality is not expected.
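The kind of image step I mean looks roughly like this (the asset name, the in-archive member name, and the `.tar.gz` asset itself are assumptions for illustration):

```bash
# e.g. inside a Dockerfile RUN step: stream the prebuilt binary straight into the image,
# no temporary archive left behind
curl -fsSL "https://github.com/jqnatividad/qsv/releases/latest/download/qsv-x86_64-unknown-linux-musl.tar.gz" \
  | tar -xzf - -C /usr/local/bin qsv
```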
> we `zipsign` the archives for authenticity, and `self_update` supports zipsigned archives.

You can accomplish the same with other formats, but I understand the choice with zip and the self-updating features. My intention isn't to burden you with further maintenance, just to raise awareness of where some minor friction is for a different type of user.

EDIT: FWIW `self_update` has zipsign support (and archive support) for `tar.gz`.

`cargo binstall` does have its own support for verifying signed assets. That said, it should work well once a dependency resolves the zip archive compatibility issue (caused by multiple files in the archive, I think?), which AFAIK has been done but not released for over 6 months; this would have been a non-issue with `tar.gz`.
> FYI, we primarily target the latest Ubuntu LTS on the x86_64 platform, which is glibc-based, as that is our standard deployment platform for our CKAN PaaS service. I don't get to exercise the musl build as much, so we depend on community feedback to improve it.

Static musl builds work on glibc systems (you usually can't have a properly static glibc build). For `qsv`, though, there may be a difference in performance; it'd need to be benched.
My comment that you responded to, though, was about leveraging Zig to compile both glibc and musl builds from the same build host, with the added benefit of not requiring any musl-related deps:
Your glibc min version will be from the build host: presently `ubuntu-latest` resolves to Ubuntu 22.04; once that switches over to 24.04, the glibc minimum version support will be implicitly raised, and anyone on older glibc will likely open issues about errors running `qsv`.
With Zig (via `cargo zigbuild`) you can instead specify the minimum version of glibc you want to support, so you would not be affected by that. Typically, what most projects do without Zig is to build on older distro releases, but that sometimes comes with drawbacks from the older software (some projects would use distro releases from 5 or more years ago to get broader glibc compatibility).
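Roughly, that looks like this (the targets and the glibc floor are illustrative, and Zig itself needs to be installed separately):

```bash
# one-time setup: cargo-zigbuild uses the Zig toolchain as the C compiler/linker
cargo install cargo-zigbuild
rustup target add x86_64-unknown-linux-gnu x86_64-unknown-linux-musl

# glibc build pinned to a chosen minimum glibc (2.31 here), regardless of the build host's glibc
cargo zigbuild --release --target x86_64-unknown-linux-gnu.2.31

# musl build from the same host, no musl-tools/musl-gcc needed
cargo zigbuild --release --target x86_64-unknown-linux-musl
```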
So there's potential with Zig to simplify your workflows / CI.
Thanks @polarathene for your detailed feedback.
I'll take your recommendations under further consideration as we refine the publishing workflow.
And thanks for pointing out that I should explicitly use `ubuntu-22.04` instead of `ubuntu-latest`.
As for qsv, my number one goal for the project is to be the fastest csv data-wrangling kit - thus the aggressive MSRV policy, taking advantage of the latest language features, the latest dependencies, etc.
One big-ticket item with a big performance payoff that I haven't taken on yet is profile-guided optimization (https://github.com/jqnatividad/qsv/issues/1448). Given the number of binaries/platforms we support, I can only do that for select platforms (starting with qsv pro).

I'll leave it to package maintainers, should they choose to distribute qsv, to fine-tune it to their requirements.
As for self-update, it's gated behind the `self-update` feature, so you can easily build qsv without it in your Docker images. As a further safety precaution, the actual self-update only works with the prebuilt binaries. If you compile from source, even with the `self-update` feature enabled, it will only alert you to new releases and will not actually apply self-updates.
And even if you choose to use the pre-builts in your Docker image, you can set the `QSV_NO_UPDATE` environment variable so it won't even check GH for new releases.
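A minimal sketch for the Docker/CI case (assuming any non-empty value is enough to disable the check):

```bash
# suppress the GitHub release check when running the prebuilt binaries
export QSV_NO_UPDATE=1   # assumption: the variable just needs to be set
qsv --version
```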
Finally, self-update is not automatic. You have to explicitly opt in to update.
Is your feature request related to a problem?
On linux the `.zip` archive format is not as desirable for publishing releases (GH release assets) as `.tar.gz` is.
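Roughly, the difference in handling looks like this (asset names and the in-archive member name are illustrative, and the `.tar.gz` asset is hypothetical - it's what this request is asking for):

```bash
# .zip: must be written to disk first, then extracted, then cleaned up
curl -fsSLO "https://github.com/jqnatividad/qsv/releases/download/X.Y.Z/qsv-X.Y.Z-x86_64-unknown-linux-musl.zip"
unzip -o qsv-X.Y.Z-x86_64-unknown-linux-musl.zip qsv
rm qsv-X.Y.Z-x86_64-unknown-linux-musl.zip

# .tar.gz: can be streamed straight through tar, extracting only the member of interest
curl -fsSL "https://github.com/jqnatividad/qsv/releases/latest/download/qsv-x86_64-unknown-linux-musl.tar.gz" \
  | tar -xzf - qsv
```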
Additional justification:
- It is more common for minimal environments to have `tar` and related codec support (`gzip`/`xz`) than it is for them to have zip support. This is a minor issue, but not too uncommon with container environments for example (and perhaps some CI?).
- `unzip` is often the package/command used to handle the zip format on linux, but it does not support a stream from stdin; the zip file must be downloaded to disk first, then extracted, followed by optionally removing the redundant archive.
- There is `funzip`, which can handle stdin, but it is not compatible with zip archives containing multiple files like the QSV releases have.

Describe the solution you'd like
Publishing assets for linux with `.tar.gz`, as most projects do on GH releases for this platform.

Also consider:

- Omitting the version from the asset filenames to allow the `latest` release URL to be used (see the sketch below). More details here.
- `lto = "thin"`, as the published assets are built with features dropped due to CI memory limits (`lto = true` requiring over 10GB of memory to build from source): https://github.com/jqnatividad/qsv/issues/1102#issuecomment-2354491076
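To illustrate the `latest` URL point, GitHub's stable `releases/latest/download/<asset>` path only works when the asset filename stays the same across releases (the asset name below is illustrative):

```bash
# always fetches the newest release's asset of this exact name;
# embedding the version in the filename breaks this stable URL
curl -fsSLO "https://github.com/jqnatividad/qsv/releases/latest/download/qsv-x86_64-unknown-linux-musl.zip"
```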
Describe alternatives you've considered

No major issues; for manual downloads I can grab the latest URL from the release process with a few more clicks, then add a package in environments that need it. It would be nicer to not need to think about it though :)
I had also tried to leverage Cargo Binstall but got an error due to the `.zip` format not being as well supported. `cargo binstall qsv` would automate pulling the latest release from your GH releases, which works just as well for me if that functioned properly (note that you can also further improve your support for that utility by adding some metadata to your `Cargo.toml`).

At one point I did try building from source (this was problematic due to build requirements being ridiculous to try a CLI program out; over 10GB of memory was used).
Additional context
At this point changing the asset name (either by extension or complementary version omission) is technically a "breaking" change that'd affect any automated processes (once they version bump at least). Technically, since QSV is still on `0.x.y`, breaking changes are permitted whenever `x` increases, so that is your call.

GoReleaser could help with the archive-by-platform difference if you were to pursue this change.
UPDATE: I see that your CI is reliant upon the release tag in the asset name, so that may be a blocker for dropping the version from the name:
https://github.com/jqnatividad/qsv/blob/2652d76504a70a9856e7010bd482ce73cbac9dba/.github/workflows/publish-linux-qsvpy-glibc-231-musl-123.yml#L166-L172
BTW, you could probably simplify the build process a bit (especially for the lower glibc requirement and musl build support) by using Zig (see `cargo-zigbuild`).