pgcentralfoundation / pgrx

Build Postgres Extensions with Rust!
Other
3.58k stars 243 forks source link

Build failed on ppc64el #1897

Open tucnak opened 4 days ago

tucnak commented 4 days ago

I know ppc64el is probably not natively supported by this library, however I'd tried building it regardless as both Rust as well as Postgres support it pretty well, however I'd ran into the following issue when building one of the extensions I'm interested in. Even though admittedly it seems cee-scape is to blame, I wonder if it's possible to build without it.

Is there any reason in particular why inline assembly is required here?

Related #1144

error: cannot find macro `maybesig_setjmp_asm` in this scope
   --> /home/badt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cee-scape-0.2.0/src/asm_based.rs:164:9
    |
164 |         maybesig_setjmp_asm!(setjmp, jbuf_ptr, unused_savemask, closure_env_ptr, c2r, ret);
    |         ^^^^^^^^^^^^^^^^^^^

error: cannot find macro `maybesig_setjmp_asm` in this scope
   --> /home/badt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cee-scape-0.2.0/src/asm_based.rs:215:9
    |
215 |         maybesig_setjmp_asm!(sigsetjmp, jbuf_ptr, savemask, closure_env_ptr, c2r, ret);
    |         ^^^^^^^^^^^^^^^^^^^

error[E0425]: cannot find value `JMP_BUF_SIZE` in this scope
 --> /home/badt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cee-scape-0.2.0/src/macos_compat.rs:8:33
  |
8 | const SIG_JMP_BUF_SIZE: usize = JMP_BUF_SIZE + 1;
  |                                 ^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find value `JMP_BUF_SIZE` in this scope
  --> /home/badt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cee-scape-0.2.0/src/macos_compat.rs:14:17
   |
8  | const SIG_JMP_BUF_SIZE: usize = JMP_BUF_SIZE + 1;
   | ------------------------------------------------- similarly named constant `SIG_JMP_BUF_SIZE` defined here
...
14 |     _buf: [u32; JMP_BUF_SIZE],
   |                 ^^^^^^^^^^^^ help: a constant with a similar name exists: `SIG_JMP_BUF_SIZE`

For more information about this error, try `rustc --explain E0425`.
error: could not compile `cee-scape` (lib) due to 4 previous errors
warning: build failed, waiting for other jobs to finish...
eeeebbbbrrrr commented 4 days ago

Can you provide us with a PR to fix this? AFAIK, none of the core team has access to this kind of hardware. Ideally, PRing ceescape would be the answer we'd want to see.

tucnak commented 4 days ago

I have so far managed to mitigate this by reverting the commit in question, however when it comes to ppc64el assembly, I simply don't know it. To add insult to injury, I'm not really fluent in Rust. I could patch the commit further by adding a special case for ppc64el, whereas if it's building—just don't do the things you're doing. I looked through the commit, it seems additive in nature, so simply not doing these things should fine?

I'm not sure if that's desirable.

To add insult to injury, Github doesn't even support self-hosted CI runners on POWER so even though I'd happily provide a VM for this purpose, you wouldn't have a way to integrate it with Actions. In other words, I think this is a bit of fringe case all things considered (even though there's a bunch of IBM POWER systems in the wild.)

Is portability otherwise important to you?

tucnak commented 4 days ago

I've since ran into a different issue, pertaining to pgrx-pg-sys bindgen, which boils down to keyword error: unknown target triple 'unknown' and I'm not sure what's it about exactly? The target triplet? It's unclear to me how powerpc64le-unknown-linux-gnu is different from aarch64-unknown-linux-gnu or any other target for that matter in this sense, I believe there's a possibility this may be a bug in its own right?

--- stderr
  build_paths=BuildPaths { manifest_dir: "/home/badt/src/pgrx/pgrx-pg-sys", out_dir: "/home/badt/src/pgvecto.rs/target/release/build/pgrx-pg-sys-042e7b531a5c19eb/out", src_dir: "/home/badt/src/pgrx/pgrx-pg-sys/src/include", shim_src: "/home/badt/src/pgrx/pgrx-pg-sys/pgrx-cshim.c", shim_dst: "/home/badt/src/pgvecto.rs/target/release/build/pgrx-pg-sys-042e7b531a5c19eb/out/pgrx-cshim.c" }
  Generating bindings for pg17
  pg_config --configure CLANG = None
  Bindgen found clang version 20.0.0git (https://github.com/llvm/llvm-project.git 2a2c35a9a652ba8562884ec76008979c761df207)
  found libclang at /usr/local/llvm/lib/libclang.so.20.0.0git
  Found include dirs ["/usr/local/llvm/lib/clang/20/include"]
  error: unknown target triple 'unknown'
  thread '<unnamed>' panicked at /home/badt/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bindgen-0.70.1/ir/context.rs:561:15:
  libclang error; possible causes include:
  - Invalid flag syntax
  - Unrecognized flags
  - Invalid flag arguments
  - File I/O errors
  - Host vs. target architecture mismatch
  If you encounter an error missing from this list, please file an issue or a PR!
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
  thread 'main' panicked at /home/badt/src/pgrx/pgrx-bindgen/src/build.rs:247:41:
  thread panicked while generating bindings: Any { .. }

Upd: I have tracked the issue down to the fact that the latest clang builds PowerPC target with triple set to "unknown", which is super weird. I had to use clang 16 instead for the time being...

eeeebbbbrrrr commented 4 days ago

Is portability otherwise important to you?

Well. It's clearly important to you, which makes it important to us. But we don't have very stringent practices around ensuring portability because, well, as you pointed out, it's not exactly easy to test every platform everywhere.

If you'd want to put up a PR to "do it the old way" for your platform, we'll review it and merge it.

I've since ran into a different issue, pertaining to pgrx-pg-sys bindgen

I don't know what this is, but it's likely our fault. We've had some changes to make our bindgen process more robust (no really, I mean it!) and maybe it still needs a little work. @usamoi might have insight on this particular problem.

usamoi commented 4 days ago

The error cannot be reproduced on my machine.

These are my steps:

Add use_c_to_interface_with_setjmp feature to cee-scape and use a patch of cee-scape.

cee-scape = { version = "0.2", features = ["use_c_to_interface_with_setjmp"] }
[patch.crates-io]
cee-scape = { path = "/usamoi/repos/cee-scape" }
#[cfg(target_arch = "aarch64")]
const JMP_BUF_SIZE: usize = 48;
#[cfg(target_arch = "x86_64")]
const JMP_BUF_SIZE: usize = 37;
#[cfg(target_arch = "powerpc64")]
const JMP_BUF_SIZE: usize = 100; // I don't know the value, but I only care if it compiles

Cross compiling with cargo-zigbuild.

PGRX_BINDGEN_NO_DETECT_INCLUDES=1 CC=clang cargo zigbuild --lib --features "pg16" --target powerpc64le-unknown-linux-gnu

To find out why, you might need to give more details.

workingjubilee commented 4 days ago

@tucnak The reason that we need cee-scape is that Rust has formally rejected the notion that any function can return twice. Even LLVM has no special knowledge of the name of "sigsetjmp" or "setjmp", only that some functions can return twice.

Thus, when you call sigsetjmp or setjmp from Rust, Rust doesn't believe that you called a function that can return twice. The Rust compiler refuses to annotate it with the required LLVMIR annotation. Thus LLVM sees a call to sigsetjmp that it believes cannot return twice... it's just an ordinary function that someone named "sigsetjmp" because they have a sense of humor. What LLVM does next is between itself and whatever gods that compilers believe in, and we have no more voice in such a matter.

cee-scape exposes two cheats around this:

workingjubilee commented 4 days ago

Using cee-scape's C-compiler-based interface should be fine for any target that does not have assembly support in cee-scape. Someone might have to PR a new entry for JMP_BUF_SIZE, as usamoi diffed there, but that should be easy. It makes cross-compiling a pain, but I think that's okay. That's the smallest PR to make.

tucnak commented 4 days ago

@eeeebbbbrrrr It means a lot, thank you! I'll happily do a PR once I can figure out a reasonable way to address this.

@workingjubilee Thank you for explanation, it's quite illuminating. The fact that Rust refuses to recognise the reality of setjmp is weird indeed. I have followed @usamoi instructions and used a simple program to figure out the appropriate value for JMP_BUF_SIZE as follows, hopefully it's correct:

#include <stdio.h>
#include <setjmp.h>
int main() {
    printf("%zu\n", sizeof(jmp_buf));
    return 0;
}

In this case, turns out the value is 656. I downloaded cee-scape and vendored it in successfully, however there's one more snag: because I'm not building pgrx in vacuum, and in fact building pgvecto.rs which is the target extension, there's no clean way to set use_c_to_interface_with_setjmp feature!

I mean, consider for a moment that the extension itself uses a patched version of pgrx using patch.crates.io syntax. Basically, even if I were able to have my JMP_BUF_SIZE patch accepted in cee-scape when time has passed for it to be picked up by both pgrx & the target extension, none of it would build regardless! As the things currently stand, the use_c_to_interface_with_setjmp modification needs to happen in pgrx tree... So at the end of the day, I can't do this in a Docker container, CI, or anything. On subsequent updates, I would need to track local copies of three separate libraries, and patch them manually, to get a working build. Are you guys sure there isn't a better way to do this?

Perhaps there's a way to set the use_c_to_interface_with_setjmp feature automatically for all POWER builds so that it propagates automatically?

workingjubilee commented 4 days ago

@workingjubilee Thank you for explanation, it's quite illuminating. The fact that Rust refuses to recognise the reality of setjmp is weird indeed.

It would require us to support any function engaging in this behavior, which vastly complicates Rust's operational semantic model. Y'see, setjmp and longjmp are vastly more powerful than what they are actually used for (a form of unwinding, AKA delimited continuation). For us, unwinding is semantically fine, but implementing it via this returns-twice model is not. Imposing a goofy hack on people who need to use sigjmp is, honestly, better for everyone involved.

Yes, we can add target-specific configuration to Cargo.toml: https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html?highlight=cfg#platform-specific-dependencies

usamoi commented 4 days ago

@tucnak

As the things currently stand, the use_c_to_interface_with_setjmp modification needs to happen in pgrx tree...

You could add

[dependencies]
cee-scape = { version = "0.2.0", features = ["use_c_to_interface_with_setjmp"] }

# add these two lines to `$HOME/.cargo/config.toml` works too.
[patch.crates-io]
cee-scape = { path = "/usamoi/repos/cee-scape" }

to Cargo.toml of your extension being built. It works too.