Open smoser opened 1 year ago
Just for ease of viewing, I'll describe what 'recreate.sh' does.
It loops over a build of the following stacker.yaml,
defining NUMBER each time so that 'rootfs' is forced to be rebuilt.
It may be relevant that 'docker://busybox:latest' is initially a tar layer that gets converted to squashfs by stacker.
stacker.yaml:
minbase:
  build_only: true
  from:
    type: docker
    url: docker://busybox:latest
  run: |
    echo hello > /minbase.txt
rootfs:
  from:
    type: built
    tag: minbase
  run: |
    n=${{NUMBER}}
    [ -e /minbase.txt ] && echo "run $n good" ||
      { echo "run $n bad"; exit 1; }
And then:
n=0
while [ $n -lt 50 ] && n=$((n+1)); do
    stacker build --substitute=NUMBER=$n || exit
done
@hallyn, do you think this is fixed by #454? If so, can we validate that and close?
It doesn't fix it. It fails after a random number of iterations; my last attempt hit:
+ echo 'run 23 bad'
run 23 bad
https://pastebin.com/PM24dtr4 shows the backtrace.
Perhaps this failure is due to using fuse for atomfs without the mount-is-ready notification channel.
It's not. It is golang map (dict) iteration ordering.
stacker version
v1.0.0-rc4-8e267fc
Describe the bug
This issue was first described in #431. We made a valid fix there, but it did not fix the issue here.
When using 'build_only: true' for under-layers, stacker can fail to set up a valid container. The fact that the original docker layer was a 'tar' layer is also likely related.
The following comment string at the beginning of lxcRootfsString in pkg/overlay/metadata.go is not correct for all use cases:
lxcRootfsString will traverse the ovl.Manifests dictionary and pick the first manifest it finds. In the case where stacker is only building squashfs, a stacker file like the one below will fail if the dictionary traversal does not select 'squashfs+true' first.
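The ordering hazard described above can be sketched in Go. This is only an illustration under stated assumptions, not stacker's actual code or its fix: the `manifests` type, the `firstFound` and `pickPreferred` helpers, and the layer names are all hypothetical stand-ins. It relies on one known fact, that Go does not guarantee map iteration order, so "the first manifest found" can differ from run to run.

```go
package main

import (
	"fmt"
	"sort"
)

// manifests is a hypothetical stand-in for ovl.Manifests:
// a layer-type key (e.g. "squashfs+true") mapped to a list of layers.
type manifests map[string][]string

// firstFound mimics "pick the first manifest found". Go map iteration
// order is unspecified, so the key returned may vary between calls.
func firstFound(m manifests) (string, []string) {
	for k, v := range m {
		return k, v
	}
	return "", nil
}

// pickPreferred returns the entry for the wanted key when it exists.
// Otherwise it falls back to the smallest key in sorted order, so the
// result never depends on map iteration order.
func pickPreferred(m manifests, want string) (string, []string) {
	if v, ok := m[want]; ok {
		return want, v
	}
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	if len(keys) == 0 {
		return "", nil
	}
	sort.Strings(keys)
	return keys[0], m[keys[0]]
}

func main() {
	// Illustrative data only: the tar entry has one layer fewer,
	// mirroring the mismatch described in this issue.
	m := manifests{
		"tar+false":     {"busybox"},
		"squashfs+true": {"busybox", "minbase"},
	}

	// Count which key comes out first across many traversals.
	seen := map[string]int{}
	for i := 0; i < 1000; i++ {
		k, _ := firstFound(m)
		seen[k]++
	}
	fmt.Println("first-found counts:", seen)

	k, layers := pickPreferred(m, "squashfs+true")
	fmt.Println("preferred:", k, "with", len(layers), "layers")
}
```

Selecting by the layer type actually being built, rather than by traversal order, only avoids the symptom in this sketch; it does not explain why the 'tar+false' entry ends up with fewer layers.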
The problem can be seen when reading the serialized roots/minbase/overlay_metadata.json: the 'tar+false' entry is missing a layer (it has only 1, where the 'squashfs+true' entry has 2). The file below is trimmed.
To reproduce
The attached recreate.sh will reproduce the bug.
It reads the following environment variables:
Changing the value of BUILD_ONLY to 'false' or LAYER_TYPES to 'squashfs,tar' (or 'tar,squashfs') will cause the issue to not reproduce.
The problem only occurs with stacker files that have 'build_only: true' and are built with '--layer-type=squashfs'.
Additional context
My bootkit project builds artifacts using stacker. It organizes these artifacts into a few layers that are to be published. It heavily uses 'build_only: true' and uses 'stacker publish' to publish the layers.
Due to this bug, bootkit CI builds see transient failures.
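My options to avoid the bug are:
a. build both layer types ('squashfs,tar') and then publish only squashfs (stacker publish --layer-type=squashfs).
b. not use 'build_only: true' and then publish only selected layers (stacker publish --layer=x --layer=y...).
Both of these options will incur a lot of extra CPU and I/O, and the second one requires maintaining a list of what to publish in some place other than stacker.yaml.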