Open owtaylor opened 3 months ago
is this somehow specific to using prefetch? I haven't had any of these concurrency problems in my multi-arch pipelines, but I usually do not have prefetch turned there.
is this somehow specific to using prefetch? I haven't had any of these concurrency problems in my multi-arch pipelines, but I usually do not have prefetch turned there.
My guess is that your multi-arch pipelines use separate workspaces. I only hit this because I wasn't following a well tested example. The downside of using separate workspaces (other than complexity) is prefetch specific, and especially when the prefetch is large.
Maybe the fix is to move to using Trusted Artifacts?
Maybe the fix is to move to using Trusted Artifacts?
From discussion with @arewm and @brianwcook yesterday, it sounded like they thought keeping the non-oci-ta variants around was going to be necessary for projects with "very large" artifacts.
Certainly for the flatpak-runtime / flatpak-sdk images I'm working on, I have some doubts about the performance and storage usage of putting the prefetched RPMs (> 10GB for all architectures + source) into a registry artifact, though I haven't tried it yet.
Ah, I am pretty much exclusively using trusted artifacts so that makes sense.
Maybe the fix is to move to using Trusted Artifacts?
From discussion with @arewm and @brianwcook yesterday, it sounded like they thought keeping the non-oci-ta variants around was going to be necessary for projects with "very large" artifacts.
Certainly for the flatpak-runtime / flatpak-sdk images I'm working on, I have some doubts about the performance and storage usage of putting the prefetched RPMs (> 10GB for all architectures + source) into a registry artifact, though I haven't tried it yet.
so 10GB / 4 Arch = 2.5GB per arch ... I think there is a reasonable chance you will have a better experience using trusteed artifacts instead of PVC workspaces.
@arewm will the Matrix feature be able to reduce the complexity of using multiple prefetch tasks & workspaces?
so 10GB / 4 Arch = 2.5GB per arch ... I think there is a reasonable chance you will have a better experience using trusteed artifacts instead of PVC workspaces.
If we can't split the prefetch[^1] by arch and source then a single PVC workspace is clearly way better than downloading 10GB 4 times to use 2.5G of it. If we can split it, then both the Trusted Artifact and PVC versions benefit from that - the PVC version will still avoid pushing and pulling 10GB to the registry and leaving it there indefinitely.
Anyways, I filed an issue here to publicly record the issue and a possible fix. Could also do something like:
diff --git a/task/buildah-remote/0.2/buildah-remote.yaml b/task/buildah-remote/0.2/buildah-remote.yaml
index 2c5ae78..7c793c5 100644
--- a/task/buildah-remote/0.2/buildah-remote.yaml
+++ b/task/buildah-remote/0.2/buildah-remote.yaml
@@ -447,6 +447,10 @@ spec:
-v "$BUILD_DIR/tekton-results/:/tekton/results:Z" \
-v $BUILD_DIR/scripts:/script:Z \
--user=0 --rm "$BUILDER_IMAGE" /script/script-build.sh
+ if [ -d rhtap-final-image ] ; then
+ echo "rhtap-final-image already exists, multiple buildah-remote tasks must use separate workspaces"
+ exit 1
+ fi
rsync -ra "$SSH_HOST:$BUILD_DIR/workspaces/source/" "$(workspaces.source.path)/"
rsync -ra "$SSH_HOST:$BUILD_DIR/volumes/shared/" /shared/
rsync -ra "$SSH_HOST:$BUILD_DIR/tekton-results/" "/tekton/results/"
[^1]: Splitting a RPM prefetch should be doable without any cachi2 changes, though clumsy, by using separate rpms.lock.yaml
in separate subdirectories of the source. Haven't tested that yet.
Matrix builds are defined by setting parameters for task runs so I don't think that it would support associating specific workspaces for each of the runs. The challenge here is that the workspace exists outside of just the taskrun as you need to specify each of the shared workspaces for the entire pipeline run.
It seems like it could be reasonable to use a matrix for prefetching as well as builds using TA. If the OCI trusted artifact is pushed with some architecture specification in the prefetch then it should be possible for the consumer to identify the proper arch match for its needs.
I have heard that it was an intentional decision to have a single prefetch task in the pipeline and not farm it out to multiple arch-specific ones. I am not familiar with the specifics of that decision, however.
@brunoapimentel, are you aware or do you know who would be able to shed light on the current implementation?
I don't know why it is the way it is either, but I am really curious to see how much stuff breaks if we did do per-arch prefetch so I added it to a fork of cachi2
https://github.com/brianwcook/cachi2/commit/707a7d053ac6c28627fd55e0c475058572802ec6
I don't know why it is the way it is either, but I am really curious to see how much stuff breaks if we did do per-arch prefetch so I added it to a fork of cachi2
Probably just source containers (they won't break, but they'll potentially be missing sources)
And for everything except RPMs, running prefetch more than once is a waste
I don't know why it is the way it is either, but I am really curious to see how much stuff breaks if we did do per-arch prefetch so I added it to a fork of cachi2
https://github.com/brianwcook/cachi2/commit/707a7d053ac6c28627fd55e0c475058572802ec6
Cool. I think it might make more sense to do it as a command line option, so the input definition can be shared among tasks. When I prototyped implementing split prefetch using separate rpms.lock.in in subdirs, it was quite awkward to use a different input per prefetch - I could no longer just define the prefetch input for the entire pipeline - there's no way I know to say "Use params.prefetch-input subsituting @ARCH@ with $arch".
Probably just source containers (they won't break, but they'll potentially be missing sources)
I think you'd do the source prefetch separately - either by having cachi2 merge all arches[].source
or by changing the rpms.lock.yaml
format so that source packages were listed separately from the architectures.
And for everything except RPMs, running prefetch more than once is a waste
There are probably sticky cases where you want to download a lot of RPMs, and a lot of something else, but the worst case for RPM prefetch is something like a base image where you have gigabytes of RPMs and nothing else. I'll collect stats for size and performance when I finish Flatpak runtime and SDK and post that somewhere - may give some idea of when split prefetch might be useful, if at all.
I did add it as a command line option (cachi2 taks a json payload on the cli).
I'm not sure if it makes more sense to have 4 prefetch tasks, or to have 1 prefetch task that produces a cache which is well separated so that it can be transferred separately (maybe this wouldn't break source containers, but it is definitely much more complicated to implement).
--
BRIAN COOK (he/him/his)
Sr. Principal Product Manager
Konflux & RHT Product Build Systems
On Mon, Aug 12, 2024 at 9:22 AM Owen Taylor @.***> wrote:
I don't know why it is the way it is either, but I am really curious to see how much stuff breaks if we did do per-arch prefetch so I added it to a fork of cachi2
@.*** https://github.com/brianwcook/cachi2/commit/707a7d053ac6c28627fd55e0c475058572802ec6
Cool. I think it might make more sense to do it as a command line option, so the input definition can be shared among tasks. When I prototyped implementing split prefetch using separate rpms.lock.in in subdirs, it was quite awkward to use a different input per prefetch - I could no longer just define the prefetch input for the entire pipeline - there's no way I know to say "Use params.prefetch-input subsituting @arch https://github.com/arch@ with $arch".
Probably just source containers (they won't break, but they'll potentially be missing sources)
I think you'd do the source prefetch separately - either by having cachi2 merge all arches[].source or by changing the rpms.lock.yaml format so that source packages were listed separately from the architectures.
And for everything except RPMs, running prefetch more than once is a waste
There are probably sticky cases where you want to download a lot of RPMs, and a lot of something else, but the worst case for RPM prefetch is something like a base image where you have gigabytes of RPMs and nothing else. I'll collect stats for size and performance when I finish Flatpak runtime and SDK and post that somewhere - may give some idea of when split prefetch might be useful, if at all.
— Reply to this email directly, view it on GitHub https://github.com/konflux-ci/build-definitions/issues/1275#issuecomment-2283980010, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA4DQJ3PT6V3EHQ47JUL37DZRCZI5AVCNFSM6AAAAABMIT5UTSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOBTHE4DAMBRGA . You are receiving this because you were mentioned.Message ID: @.***>
Here's some timings for the two Flatpak images we produce the runtime (common dependencies) and SDK (extended runtime with compilers, headers, etc.)
Size (MiB) | Prefetch (s) | Create/Upload (s) | Restore (s) | rsync (s) | |
---|---|---|---|---|---|
flatpak-runtime - amd64 | 295 | 40 | 35 | 9 | 4 |
flatpak-runtime - arm64 | 271 | 40 | 30 | 7 | 2 |
flatpak-runtime - ppc64le | 281 | 44 | 31 | 7 | 6 |
flatpak-runtime - s390x | 285 | 33 | 24 | 9 | 6 |
flatpak-runtime - source | 2719 | 130 | 257 | 42 | --- |
Total: | 3851 | 287 | 377 | 74 | 18 |
flatpak-sdk - amd64 | 572 | 545 | 61 | 18 | 4 |
flatpak-sdk - arm64 | 536 | 528 | 50 | 20 | 3 |
flatpak-sdk - ppc64le | 546 | 528 | 52 | 25 | 11 |
flatpak-sdk - s390x | 544 | 530 | 48 | 13 | 11 |
flatpak-sdk - source | 3552 | 888 | 336 | 159 | --- |
Total: | 5750 | 3019 | 547 | 235 | 29 |
The timings are a bit all over the place - especially for the prefetch step. (The network connection between the Konflux cluster and the internal repositories prefetch is pulling from seems to be the bottleneck.) But it gives a a general idea.
My observations:
Cool. I think it might make more sense to do it as a command line option, so the input definition can be shared among tasks. When I prototyped implementing split prefetch using separate rpms.lock.in in subdirs, it was quite awkward to use a different input per prefetch - I could no longer just define the prefetch input for the entire pipeline - there's no way I know to say "Use params.prefetch-input subsituting https://github.com/arch@ with $arch". I did add it as a command line option (cachi2 taks a json payload on the cli).
Well, OK :-) , but what I mean is a command line option that's separate from the input specification, since the input specification is something that the component should be able to control without modifying individual pipeline taskruns.
I'm not sure if it makes more sense to have 4 prefetch tasks, or to have 1 prefetch task that produces a cache which is well separated so that it can be transferred separately (maybe this wouldn't break source containers, but it is definitely much more complicated to implement).
There are two goals that are at odds with each other:
I haven't looked into how the directory is structured when prefetching content, but a potential high-level overview of the required changes is:
@mmorhun @zregvart , fyi as well to bring your attention to this discussion.
Reducing the amount of content that is pulled by the build container if prefetches are arch-specific (i.e. multiple artifacts)
- Ensuring that all prefetched content is present for generating the source container (i.e. a single artifact)
Not sure why we could not generate a prefetch trusted artifact per-platform? A single source and platform specific trusted artifact can be used in per-platform build step, and restoring the single source and multiple per-platform trusted artifacts when building the source container. This all relies on destinations for per-platform prefetch trusted artifacts not overwriting each other, i.e. being created and restored to platform-specific directories.
Reducing the amount of content that is pulled by the build container if prefetches are arch-specific (i.e. multiple artifacts)
- Ensuring that all prefetched content is present for generating the source container (i.e. a single artifact)
Not sure why we could not generate a prefetch trusted artifact per-platform? A single source and platform specific trusted artifact can be used in per-platform build step, and restoring the single source and multiple per-platform trusted artifacts when building the source container. This all relies on destinations for per-platform prefetch trusted artifacts not overwriting each other, i.e. being created and restored to platform-specific directories.
That does sound like a better idea. But if it also needs to work for the non-TA pipeline, it complicates things.
Most multiarch pipelines I've seen use one workspace per arch and run one prefetch per arch (I don't fully understand why, but I think it was due to the volumes being ReadWriteOnce - not possible to run builds in parallel with a single workspace). Which would mean the source-container task needs to mount a variable number of workspaces.
In practice, the way to achieve this would be to declare a hardcoded number of workspaces in the source task (let's say 10) and Konflux users would have to pass their per-arch workspaces explicitly, something like this
- name: workspace-0
workspace: workspace-x86_64
- name: workspace-1
workspace: workspace-aarch64
- name: workspace-2
workspace: workspace-ppc64le
- name: workspace-3
workspace: workspace-s390x
Side note: the only reason why this is currently not necessary is that it's almost impossible to make the prefetch task behave differently per arch, except for Owen's workaround:
Splitting a RPM prefetch should be doable without any cachi2 changes, though clumsy, by using separate rpms.lock.yaml in separate subdirectories of the source.
Most of the multi-arch pipelines have a single PVC per workspace to prevent the current issue ... preventing the unnecessary copying/syncing of data to remote workers for the other architectures.
In my proposed pipeline for multi-arch with matrix (and OCI trusted artifacts), none of these workspaces exist: https://github.com/konflux-ci/build-definitions/pull/1236.
While the requested feature to split the prefetch can be done when the data is stored in PVC, there is a more pressing need for it just in the OCI trusted artifacts because of the additional data transfer to/from the registry (this cost is not realized with the PVC except for when we rsync data to the remote host).
I kind of lost track of what we're discussing in this issue
I also lost track of the issue we're discussing here and the discussion itself seems to be quiet for a long time. Is it possible that the issue is not relevant anymore with moving to Trusted artifacts pipeline? I'll close it in a week if there won't be any response.
I think that the original issue that was described (i.e. not having uniqueness in the images when synced back to a shared PVC) might have been incidentally/unintentionally resolved with https://github.com/konflux-ci/build-definitions/pull/1216.
We now push and pull the image which automatically has the platform appended to it.
The resulting conversation is unresolved ... i.e. how should we properly handle/design prefetching large payloads for multiple architectures.
Should we have a separate Feature / design to solve the large payloads issue? It's confusing from the issue title.
With the buildah-remote task, there are problems if several are run at the same time with the same workspace. For example, I saw:
$(workspaces.source.path)/rhtap-final-image
is rsynced back from the arch1 remoterhtap-final-image
is rsynced to the arch2 remoterhtap-final-image
rhtap-final-image
is rsynced back from the arch2 remote - with both architecturesbuildah pull oci:rhtap-final-image
fails because the directory now has two images and buildah doesn't know which to useBut there are all sorts of other failures and corruptions that can occur. The workaround people have used for this is to use separate workspaces and duplicate the prefetch-dependencies steps. That works, with the downsides:
As far as I can tell, this could be fixed pretty simply by making the buildah/buildah-remote tasks align more closely with the buildah-remote-oci-tasks - use an emptydir workdir for temporary files rather than doing that in the workspace.