Open smcv opened 2 years ago
That's interesting. So, the call to ostree_sysroot_write_deployments succeeds, but it's either not cleaning up the old deployments or g_file_query_exists is lying. Or maybe deploymentPath isn't what's expected?
Pursuing the "g_file_query_exists is lying" angle, it appears that GIO uses either statx or lstat, preferring statx if it was available at build time. Perhaps 2021.5 is when statx started being used in GIO and it's flaky on the s390x builder? A way to cross-check is to use g_file_test, which uses access to test existence. You could try adding this to the test:
diff --git a/tests/test-sysroot.js b/tests/test-sysroot.js
index d4f67ef4..d9a78dc3 100755
--- a/tests/test-sysroot.js
+++ b/tests/test-sysroot.js
@@ -93,6 +93,8 @@ sysroot.write_deployments([], null);
 print("OK empty deployments");
+print("Deployment path: " + deploymentPath.get_path());
+assertEquals(GLib.file_test(deploymentPath.get_path(), GLib.FileTest.EXISTS), false);
 assertEquals(deploymentPath.query_exists(null), false);
 //// Ok, redeploy, then add a new revision upstream and pull it
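The same cross-check can also be run outside the gjs test. The following is a minimal sketch in plain C, assuming Linux with a libc that exposes a statx wrapper (glibc 2.28 or later); it is not GIO's or ostree's implementation, and the names exist_report, classify_result, check_exists and report_consistent are hypothetical helpers invented for this example. It asks about one path via access, lstat and, where available, statx, and reports whether the answers agree.

```c
#define _GNU_SOURCE /* must come before any include; exposes statx() on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>    /* AT_FDCWD, AT_SYMLINK_NOFOLLOW */
#include <sys/stat.h> /* lstat(), and statx() where the libc provides it */
#include <unistd.h>   /* access() */

/* Hypothetical helper type, not part of ostree or GLib.
 * Each field is 1 (path exists), 0 (path does not exist),
 * or -1 (call failed for another reason, or was not attempted). */
struct exist_report
{
  int via_access;
  int via_lstat;
  int via_statx;
};

/* Map a syscall return code plus errno onto 1 / 0 / -1.
 * Must be called directly on the syscall result so errno is still fresh. */
static int
classify_result (int rc)
{
  if (rc == 0)
    return 1;
  return (errno == ENOENT || errno == ENOTDIR) ? 0 : -1;
}

/* Query the same path through every available interface. */
static struct exist_report
check_exists (const char *path)
{
  struct exist_report r = { -1, -1, -1 };
  struct stat st;

  assert (path != NULL);

  /* access() is what g_file_test (..., G_FILE_TEST_EXISTS) relies on. */
  r.via_access = classify_result (access (path, F_OK));
  r.via_lstat = classify_result (lstat (path, &st));

#if defined(__linux__) && defined(STATX_TYPE)
  {
    struct statx stx;
    /* AT_SYMLINK_NOFOLLOW keeps this comparable with lstat(). */
    r.via_statx = classify_result (statx (AT_FDCWD, path, AT_SYMLINK_NOFOLLOW,
                                          STATX_TYPE, &stx));
  }
#endif

  return r;
}

/* Return 1 if every interface that produced an answer agrees, else 0.
 * A statx result of -1 (unavailable, blocked, or failed) is ignored.
 * Note: a dangling symlink is reported as inconsistent, because access()
 * follows symlinks while lstat() and the statx() call above do not. */
static int
report_consistent (const struct exist_report *r)
{
  if (r->via_access != r->via_lstat)
    return 0;
  if (r->via_statx != -1 && r->via_statx != r->via_lstat)
    return 0;
  return 1;
}
```

Calling check_exists on the deployment path right after the deployments are written, and printing the three fields, would show whether the interfaces disagree about the directory (which would point at the stat path) or all agree that it still exists (which would point at cleanup instead).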
And here's a hack to get a little more info about cleaning up deployments:
diff --git a/src/libostree/ostree-sysroot-cleanup.c b/src/libostree/ostree-sysroot-cleanup.c
index 3471cac7..9ca7fcc6 100644
--- a/src/libostree/ostree-sysroot-cleanup.c
+++ b/src/libostree/ostree-sysroot-cleanup.c
@@ -325,8 +325,12 @@ cleanup_old_deployments (OstreeSysroot *self,
       g_autofree char *deployment_path = ostree_sysroot_get_deployment_dirpath (self, deployment);
 
       if (g_hash_table_lookup (active_deployment_dirs, deployment_path))
-        continue;
+        {
+          g_print ("Skipping cleanup of active deployment %s\n", deployment_path);
+          continue;
+        }
 
+      g_print ("Cleaning up deployment %s\n", deployment_path);
       if (!_ostree_sysroot_rmrf_deployment (self, deployment, cancellable, error))
         return FALSE;
     }
Debian is currently rebuilding half the archive to recover from a binutils regression, so I am probably not going to be able to test this until the autobuilders recover, sorry.
Because this is intermittent, I can't know that a successful build is really a success, and because the autobuilders are production infrastructure, I can't just keep hitting rebuild. I'll try doing manual builds on an s390x "porter box" when I get a chance, but there's no guarantee that that will match the autobuilder's behaviour.
> Perhaps 2021.5 is when statx started being used in GIO and it's flaky on the s390x builder?
Use of statx seems to have been new in 2.66.x, and we had several consecutive successful builds of ostree on s390x after 2.66.x was introduced, so I think it's probably not that... but because it's intermittent, I can't be sure.
> That's interesting. So, the call to ostree_sysroot_write_deployments succeeds, but it's either not cleaning up the old deployments or g_file_query_exists is lying. Or maybe deploymentPath isn't what's expected?
In some older builds, like 2021.1-1, we seem to have had other tests failing when they asserted that a directory should not exist, but it did - and those assertions were in shell scripts using test -d, so probably not statx? (But I don't know, maybe bash genuinely does use statx for builtins.)
Oh, I didn't mean to try to debug it right now. I can imagine that s390x debugging is nowhere near the top of your queue. Just that if you do get around to it, it would be helpful to try to narrow down the issue.
Hi all, I've tried make && make check many times on Fedora 35:
Linux 5.15.17-200.fc35.s390x
and wasn't able to reproduce the issue:
PASS: tests/test-sysroot.js 1 test-sysroot
I was unable to reproduce this on the Debian-developer-accessible s390x that is meant to be the closest thing there is to being able to access an autobuilder interactively (build + tests succeeded in 2/2 attempts), but 2022.7 failed in this way on 3/3 attempts on Debian's official s390x autobuilders, so there might be something about Debian's official autobuilder infrastructure that makes this test more likely to fail.
My ability to debug that is extremely limited, because only sysadmins have any sort of interactive access to the autobuilder machines, so this is unlikely to go further without someone else picking this up.
The unit test tests/test-sysroot.js seems to be intermittently failing on Debian's s390x port since October (2021.5). It doesn't always fail, and after failing, it consistently succeeds when the build is retried.
This is happening in a transient chroot environment on autobuilders that are not accessible to ordinary Debian developers, so I am unable to get any information about the failed builds beyond what's in the logs.
The failing assertion is this one:
I have never had any success with taking s390x-specific issues to Debian's s390x architecture porting team (which might in fact not contain any people), but I hear several ostree developers now work for an IBM subsidiary, so perhaps someone there is better-placed than me to know about s390x-specific issues or see whether this is reproducible in a development environment?
We've seen this with gjs 1.68.4 and 1.70.0. Full logs for some recent versions: 2022.1, 2021.6