pypa / cibuildwheel

🎡 Build Python wheels for all the platforms with minimal configuration.
https://cibuildwheel.pypa.io

Is there a way to extract non-wheel outputs from a cibuildwheel Docker container? #1030

Open jbarlow83 opened 2 years ago

jbarlow83 commented 2 years ago

Description

As far as I can tell, cibuildwheel is only prepared to copy wheels back to the host and has no room for exporting side channel shenanigans like I'm considering. Is that true, or is there a supported way to copy non-wheel objects to the host?

Context: Exploring options to generate manylinux-aarch64 wheels for pikepdf using cibw, running a QEMU-emulated Docker container for aarch64. The main problem with this case, as observed in other open issue tickets, is that emulation is really, really slow.

Cross-compiling seems to be a dead end and can be quite tricky with third-party libraries whose configure/make scripts don't always contain plans for cross-compiling.

I think a fairly straightforward (for some generous definition of "fairly" and "straightforward") option would be to wire up ccache to speed up builds. The problem is that the cache needs to be copied from the Docker container back to the host (which would presumably use its CI runner to save and restore the cache). An alternative would be to copy precompiled binaries in, with a way to check if the cache is stale.

Copying data from the host into the container isn't that much of an issue - the /project folder is passed to the container and could contain a cache folder.

joerick commented 2 years ago

Yeah, this seems reasonable. We've previously discussed (in https://github.com/pypa/cibuildwheel/issues/363) some sort of specific extraction of files from the build as well, for example, coverage reports. But the API design in that issue is about copying out artifacts from each build into a separate folder. For a cache, you probably don't want the files to move around, and perhaps you also want to share them between every build (and even build architecture?).

So perhaps something closer to a mount is suitable? E.g. CIBW_DOCKER_CACHE_DIR={project}/cache would copy ./cache in to /project/cache at the start of the build, then copy it out and replace ./cache when the container finishes.
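To make the copy-in/copy-out idea concrete, here is a minimal Python sketch of those semantics. It is not cibuildwheel code: CIBW_DOCKER_CACHE_DIR is only a proposal in this thread, `mounted_cache` and `demo` are hypothetical names, and the "container" is simulated with a temporary directory rather than Docker.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def mounted_cache(host_cache: Path, container_root: Path):
    """Copy host_cache into a simulated container, then copy it back out."""
    # Hypothetical layout: ./cache on the host maps to /project/cache inside.
    container_cache = container_root / "project" / "cache"
    host_cache.mkdir(parents=True, exist_ok=True)
    # Copy in at the start of the build.
    shutil.copytree(host_cache, container_cache, dirs_exist_ok=True)
    try:
        yield container_cache
    finally:
        # Copy out at the end: replace the host cache with the container's copy.
        shutil.rmtree(host_cache)
        shutil.copytree(container_cache, host_cache)


def demo() -> list[str]:
    """Run one simulated build and return the file names left in the host cache."""
    with tempfile.TemporaryDirectory() as tmp:
        host_cache = Path(tmp) / "host" / "cache"
        container_root = Path(tmp) / "container"
        host_cache.mkdir(parents=True)
        (host_cache / "old.o").write_text("cached object from a previous run")
        with mounted_cache(host_cache, container_root) as cache:
            # Stand-in for the build writing a new cache entry.
            (cache / "new.o").write_text("object compiled during this build")
        return sorted(p.name for p in host_cache.iterdir())
```

The copy-out sits in a `finally` block so that a failed build still returns whatever it managed to cache; whether that is desirable is a design choice the proposal leaves open.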

I suppose the question is whether such an API would actually be sufficient for the use cases in #363 as well. In which case a more general name for the option might be appropriate. In any case, I'd be curious to hear your opinion on this @jbarlow83, does this sound like it would fit your use case? It's been a long while since I've used ccache!

thomaslima commented 2 years ago

I have the same question. I also want to use ccache. Right now I'm passing a CCACHE_DIR environment variable to the container. It's unclear in the documentation whether the /project is read-only from the perspective of the host. I'm trying with /host/path/to/ccachedir (following https://cibuildwheel.readthedocs.io/en/stable/faq/#linux-builds-on-docker) now to see if it works.
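For anyone trying the same thing, a rough Python sketch of that setup follows. It assumes the behaviour described in the linked FAQ (the host filesystem is visible under /host inside the container) and uses cibuildwheel's CIBW_ENVIRONMENT_LINUX option to set CCACHE_DIR; the helper name `cibw_env_for_host_ccache` and the example path are made up for illustration.

```python
from pathlib import PurePosixPath


def cibw_env_for_host_ccache(host_cache_dir: str, base_env: dict[str, str]) -> dict[str, str]:
    """Return a copy of base_env that points ccache at a directory on the host."""
    host_path = PurePosixPath(host_cache_dir)
    if not host_path.is_absolute():
        raise ValueError("host_cache_dir must be an absolute path")
    # Assumption from the FAQ: the host's / is mounted at /host in the container.
    container_path = "/host" + str(host_path)
    env = dict(base_env)
    # CIBW_ENVIRONMENT_LINUX sets variables inside the Linux build container.
    env["CIBW_ENVIRONMENT_LINUX"] = f"CCACHE_DIR={container_path}"
    return env


# Usage (not executed here): pass the result to the cibuildwheel process, e.g.
#   subprocess.run(["cibuildwheel", "--platform", "linux"],
#                  env=cibw_env_for_host_ccache("/tmp/ccache", dict(os.environ)),
#                  check=True)
```

ccache still has to be installed and put in front of the compiler inside the container; this only covers where the cache directory lives.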

Mause commented 1 year ago

@thomaslima did you end up having any luck with that approach?

thomaslima commented 1 year ago

Hi @Mause, thanks for reaching out. Indeed this approach suggested by @joerick is working for me.

Here's the code I used for step 3 (before_all step): https://github.com/KLayout/klayout/blob/1f2e8b40125518bc50235802753096348998b409/ci-scripts/docker/docker_prepare.sh#L34-L40

Here's the code I used for step 5: https://github.com/KLayout/klayout/blob/1f2e8b40125518bc50235802753096348998b409/.github/workflows/build.yml#L49-L59

Steps 2 and 6 were done with hendrikmuhs/ccache-action@v1.2.

Hope this helps. If this issue is worked on, I would like it to implement steps 3 and 5. Would be nice to pass a list of directories that can be serialized and deserialized to and from the docker container.

bmerry commented 1 year ago

While slightly off-topic for the original bug, there seems to be some interest from the participants specifically in caching compilation results, so I'll mention a slightly prettier alternative to @thomaslima's approach: instead of using ccache, use sccache with its GHA integration. That communicates directly with GitHub's cache storage without using an on-disk cache, which simplifies things a lot.

  1. Use the sccache action and enable GHA in sccache: here
  2. Pass the necessary environment variables into the container: here
  3. Install sccache inside the container: here
  4. Print sccache stats to verify (the action in step 1 also prints a report, but it doesn't seem to capture stats from inside the container): here
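Step 2 above can be sketched roughly as follows, in Python for illustration. CIBW_ENVIRONMENT_PASS_LINUX is cibuildwheel's option for forwarding host variables into the Linux container. The list of variable names is an assumption about what sccache's GitHub Actions backend reads (and, as far as I understand, what the action in step 1 exports); check the sccache documentation for the version you install. The helper name is hypothetical.

```python
# Assumption: variables read by sccache's GitHub Actions cache backend.
SCCACHE_VARS = ("SCCACHE_GHA_ENABLED", "ACTIONS_CACHE_URL", "ACTIONS_RUNTIME_TOKEN")


def cibw_env_with_sccache(base_env: dict[str, str]) -> dict[str, str]:
    """Return a copy of base_env that forwards the sccache variables to the container."""
    env = dict(base_env)
    # Enable the GHA backend in sccache.
    env["SCCACHE_GHA_ENABLED"] = "true"
    # Space-separated list of host variable names to pass through.
    env["CIBW_ENVIRONMENT_PASS_LINUX"] = " ".join(SCCACHE_VARS)
    return env
```

In a workflow file the same thing is normally written as plain `env:` entries on the cibuildwheel step; the function form is only there to keep the example self-contained.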

mwestphal commented 1 year ago

@bmerry not all links work in the above comment. Could you fix that?

BTW I don't understand how the cache ends up in the container in your example?

bmerry commented 1 year ago

> @bmerry not all links work in the above comment. Could you fix that?

Thanks for pointing it out - I've edited the comment and updated the links to point to the latest version.

> BTW I don't understand how the cache ends up in the container in your example?

The cache contents aren't in the container; they're in the cloud.

mwestphal commented 1 year ago

> The cache contents aren't in the container; they're in the cloud.

Yes, but what mechanism is responsible for downloading them locally? AFAICS I need to use a cache action somewhere to do that.

bmerry commented 1 year ago

> > The cache contents aren't in the container; they're in the cloud.
>
> Yes, but what mechanism is responsible for downloading them locally? AFAICS I need to use a cache action somewhere to do that.

sccache is the mechanism. It has GitHub Actions integration.

mwestphal commented 1 year ago

You mean the sccache action? OK, it's clearer, thanks.