hpc / charliecloud

Now hosted on GitLab.
https://gitlab.com/charliecloud/main
Apache License 2.0
313 stars 61 forks source link

ch_fuse: switch SquashFUSE to libfuse3 types #1859

Closed olifre closed 6 months ago

olifre commented 6 months ago

The forget operation in libfuse3 takes uint64_t as third parameter, while SquashFUSE defaults to unsigned long as used in libfuse2. This causes a mess on arches with different size of these types, so explicitly switch to the libfuse3 variant.

closes #1858

@wiene (or someone with armhf / armel hardware): Please let me know if that indeed fixes the problem :crossed_fingers: .

wiene commented 6 months ago

Thanks @olifre for providing this PR. I tried it but unfortunately it did not fix the FTBFS issue on arm{el,hf}. Here are the new build logs:

olifre commented 6 months ago

Thanks for testing, @wiene !

I get the reason why it did not fix the problems on Debian — the necessary counterpart in Squashfuse is only present starting with 0.5.1 (see https://github.com/vasi/squashfuse/commit/cb148fc1477ed676049b7891ebb6efc90b2c00ec ), while Debian unstable is still using 0.5.0.

However, it took me a while to understand how Squashfuse itself can compile on armel / armhf (or anything 32 bit, basically) on Debian.

Debian unstable on 32 bit should run into the very same problem right here: https://salsa.debian.org/sgmoore/squashfuse/-/blob/debian/0.5.0-2/ll_main.c#L164 The left hand side of that assignment comes from: https://sources.debian.org/src/fuse3/3.14.0-5/include/fuse_lowlevel.h/#L281 i.e. with uint64_t type as third argument, the right hand side from here: https://salsa.debian.org/sgmoore/squashfuse/-/blob/debian/0.5.0-2/ll.h#L111-112 i.e. unsigned long as third argument.

Well, it turns out when checking build logs for Squashfuse on Debian armel, for example: https://buildd.debian.org/status/fetch.php?pkg=squashfuse&arch=armel&ver=0.5.0-2%2Bb1&stamp=1707534165&raw=0 that they contain the following:

ll_main.c: In function ‘main’:
ll_main.c:164:41: warning: assignment to ‘void (*)(struct fuse_req *, fuse_ino_t,  uint64_t)’ {aka ‘void (*)(struct fuse_req *, long long unsigned int,  long long unsigned int)’} from incompatible pointer type ‘void (*)(struct fuse_req *, fuse_ino_t,  long unsigned int)’ {aka ‘void (*)(struct fuse_req *, long long unsigned int,  long unsigned int)’} [-Wincompatible-pointer-types]
  164 |         sqfs_ll_ops.forget              = sqfs_ll_op_forget;
      |                                         ^

So indeed they see the warning, but just ignore it / don't compile with -Werror.

So maybe the best way to proceed would be to raise a bug report with squashfuse on Debian unstable, recommend them to use -Werror in packaging if possible to catch such issues, and propose to upgrade to a version > 0.5.0 to fix this warning?

wiene commented 6 months ago

Thanks again for your careful analysis, @olifre. I followed your advice and filed a bug against libsquashfuse-dev: https://bugs.debian.org/1067010

reidpr commented 6 months ago

Charliecloud’s configure does have --enable-buggy-build which turns off -Werror. If this warning does not reflect a true functionality problem, that is a (hopefully temporary) option.

olifre commented 6 months ago

Actually, I'm not fully sure this is harmless — if the forget operation is called with a third argument with wrong size, it may be interpreted as a wrong value, leading to a false "forget" operation of the associated inode.

That is probably really exotic, I assume one would need to do make nlookup exceed unsigned long on 32 bit e.g. by opening a file 2^32 times (if I understand the mechanics correctly), and then trigger forget e.g. by pressurizing memory. If my understanding is correct, this could lead to the FUSE layer forgetting about an inode even though it is still referenced. However, my understanding of FUSE is quite basic.

If this is true, though, it's a harmful bug in Squashfuse, and we should push the Debian maintainer to fix it in testing by upgrading (and ideally also enable -Werror in packaging of Squashfuse, which I would consider good practice for binary distros). Since @wiene has opened the bug against Squashfuse with Severity: serious I expect the maintainer will take action soon.

wiene commented 6 months ago

After applying the patch from this PR and fixing

Charliecloud builds successfully on arm{el,hf}:

Thus this PR is ready to be merged from my point of view.

olifre commented 6 months ago

Thus this PR is ready to be merged from my point of view.

Many thanks, @wiene! I've marked the PR as "ready for review" right now.

reidpr commented 6 months ago

Thank you @olifre and @wiene!!

The test failure looks spurious so I am going to just merge.