unit tests can't be run from the crate downloaded from crates.io

alexanderkjall commented 2 years ago

This is more of a 'for your information' rather than a bug report about something that is wrong in the project. But I thought it wouldn't hurt to inform upstream about it :)

The unit tests depend on a file named /benches/binary-8-msb.lzw that isn't included in the crate uploaded to crates.io.

test output:

error: couldn't read /tmp/r/weezl-0.1.5/benches/binary-8-msb.lzw: No such file or directory (os error 2)
    --> src/decode.rs:1240:37
     |
1240 |           const FILE: &'static [u8] = include_bytes!(concat!(
     |  _____________________________________^
1241 | |             env!("CARGO_MANIFEST_DIR"),
1242 | |             "/benches/binary-8-msb.lzw"
1243 | |         ));
     | |__________^
     |
     = note: this error originates in the macro `include_bytes` (in Nightly builds, run with -Z macro-backtrace for more info)

error: could not compile `weezl` due to previous error
warning: build failed, waiting for other jobs to finish...
error: build failed

This means that we have to disable to tests when packaging this crate for debian. Would it be possible to include the /benches/binary-8-msb.lzw file in the next release?

HeroicKatora commented 2 years ago

We didn't plan to include binary files in crate release to keep them light for the typical consumption—compilation as a dependency. Unfortunately, this conflicts with the dev-integration as a testable crate based on the archives alone. This is the experience across all image crates, and, admitedly, part of the motivation to keep this choice even for rather small test units was to make this problem more apparent.

Based on the png problems, I did some prototyping on a dev-dependency that would provide the proper semantics–as far as I'm concerned. Can you provide some feedback for this approach? I have little experience with Debian packaging, and didn't get enough.

Crate: https://github.com/HeroicKatora/xtest-data
Prototype with png: https://github.com/image-rs/image-png/pull/304

I'm glad to provide any additional guidance/dialogue (here, Matrix, Discord or e-mail me).

alexanderkjall commented 2 years ago

I like the problem statement of xtest-data, but the debian build system doesn't allow network access during builds.

The base premise is that debian should be buildable from the debian sources, in order to fulfill a number of guarantees

that the source code that generated the binary is actually the open source code in the package and no non-open source code is introduced.
to have traceability and repeatability of how a binary was generated.
to protect the build systems from attacks.

It's maybe possible to solve this in some other way with some trick with an optional dev-dependency that contains the test files, but I don't really know if that is realistic.

I might be able to patch the binary files back in as debian specific patches, it would maybe be a bit more fragile and more work when the package needs to be updated.

HeroicKatora commented 2 years ago

[…] but the debian build system doesn't allow network access during builds.

Is it possible to have a shallow git checkout of the repo data? Because the repository URL is merely a first guess, you can customize it to a local repository with CARGO_XTEST_DATA_REPOSITORY_ORIGIN.

fintelia commented 2 years ago

I think the dev-dependency approach could work. Have a second lzw-testcases crate as a subdirectory of this repository that is included either via a path dependency (for git checkouts) and also via crates.io. I'd expect that Debian already has the infrastructure to handle this sort of setup?

HeroicKatora commented 2 years ago

My problem with lzw-testcases as a crate is that even uploading to crates.io is rather expensive (to their infrastructure) and feels wrong. Not to mention that they have arbitrary limits that will not scale to the test set we want for image crates:

crates.io currently has a 10MB size limit on the .crate file. (https://doc.rust-lang.org/cargo/reference/publishing.html#packaging-a-crate)

In any case I don't see a major difference packing a commit and packing the archive, however the former can re-use git's architecture for the checksumming that I would prefer to have for superior reproducibility.

alexanderkjall commented 2 years ago

The main reason that we base the packaging on the artifacts in crates.io is that those are the published versions, and that gives us more of an guarantee that the code that is packaged is the same code that other people are also using.

Other rust tooling, like the security advisory database (https://rustsec.org/) is also basing it's work on the crates that are distributed from crates.io. and it's nice to be able to reuse that work.

It's possible to package rust code in debian without using all our rust packaging infrastructure, and then it would be possible to package it based on a git repo i guess. But that would leave the package as it's own special snowflake that needs special handling for everything and personally I'm not really willing to increase the workload for the team just to get a couple of more tests working.

I think a better solution might be to add those files as patches to the original package in debian.

HeroicKatora commented 2 years ago

@alexanderkjall As a specific suggestion, let's say we export test data as a git-pack-objects archive. Would such a second file, auxiliary to the .crate, be acceptable for the Debian package sources?

It's trivial to automatically upload it to Github, as well as associate it with specific version tags in CI. A small bit of work would be to integrate it with the infrastructure of xtest-data. That is, automatically create the minimal pack-archive, and allow importing from a local pack archive file instead cloning from a git remote/shallow clone. But then you just have one more source file/archive that one can handle fully independent from .crate data.

alexanderkjall commented 2 years ago

We don't really have any support tooling around adding that kind of extra files, other than the patch system that adds patches, so I feel that it might be a bit premature for me to have any opinion on what system would be the easiest one :)

Sorry for not having a clearer answer, but i could come back to this topic once we have some more support infrastructure in place.

HeroicKatora commented 2 years ago

I've prototyped that system. Basically, just supplement the usual git fetch by a precalculated object-pack. Then tests can import that object pack from disk. This mode no longer requires explicit confirmation because it does not access the network in the first place (it's an alternative). See https://github.com/HeroicKatora/xtest-data/pull/2

HeroicKatora commented 2 years ago

If you want to test integration in your tooling, grab:

https://crates.io/crates/xtest-data/1.0.0-beta.2
https://github.com/HeroicKatora/xtest-data/releases/download/v1.0.0-beta.2/xtest-data.tar.gz
tar xzf $src/xtest-data-1.0.0-beta.2.crate, cd xtest-data-1.0.0-beta.2, tar xzf $src/xtest-data.tar.gz
TMPDIR=/tmp CARGO_XTEST_DATA_PACK_OBJECTS="$(pwd)/target/xtest-data" cargo test

image-rs / weezl

unit tests can't be run from the crate downloaded from crates.io #29