Maturity of PETSc & libCEED Rust bindings

eliasboegel commented 4 months ago

Hi Jeremy,

I wasn't sure where to put this question since I did not want to clog up the issues on petsc-rs and libCEED and decided to place it here - I hope you don't mind.

I'm looking to create a small research code that solves a large hyperbolic system with DGSEM, and I would like to do it matrix-free and on-device as much as possible. I quite like the idea of using libCEED and either PETSc (PLEX + FE) or MFEM. I have a special requirement here in that different elements may have a different number of equations to solve, which is then handled by the numerical flux. I saw that some Rust bindings are available for both libCEED and PETSc, and I got curious since I greatly favour Rust tooling over C(++) tooling. From your answers in https://github.com/CEED/libCEED/issues/1621 I got the impression that purely from the side of the libCEED bindings, there would not be issues in getting kernels to run on accelerators provided I write them in C. What is more opaque to me is whether the combination of PETSc and libCEED bindings would be mature enough for a research code, and if not, what effort would be involved in contributing to both bindings to make it happen. I find it especially hard to condense what exactly is possible to do at the moment, and what isn't.

Would you have any insight or opinions on this?

jeremylt commented 4 months ago

Looping @jedbrown in here as we've been talking in person.

I think building an example that uses the libCEED and PETSc Rust wrappers together, perhaps replicating the CEED BP examples (mass and Poisson problems), would really show what is missing before those wrappers are ready for writing research code. I think there are also some decisions to be made about shifting the PETSc wrapper to better match idiomatic Rust and avoid the rather complicated C ownership patterns.

eliasboegel commented 4 months ago

When it comes to making contributions to the PETSc bindings, how much effort would you estimate needs to go into supporting features used in a "typical serious implementation" (recent PETSc version, accelerators, complete DM/KSP/SNES support, TS,...), based on the current state? While I would call myself a Rust beginner, I would be interested in contributing if I go this route. The amount of work required on the bindings will determine feasibility for me.

jedbrown commented 4 months ago

You may have encountered https://gitlab.com/petsc/petsc-rs, which has some examples. I'd like to make a release soon on crates.io, but there have been some questions about how much to try to exert Rust semantics (strong aliasing guarantees, lifetimes) versus how much to mirror PETSc semantics (weak/no aliasing prevention, reference counting) and I'd like to get that settled. Increasing interface coverage is relatively simple in most cases.

eliasboegel commented 4 months ago

Thanks Jed & Jeremy, I've of course seen the petsc-rs repository and the examples. What I'm most looking to find though are examples where petsc-rs and libCEED bindings are used together, and as far as I know, this doesn't exist except in meles-rs, and here it is kept very simple (just BPs on CPU). Really my interest is: What would an application code using the bindings of both PETSc and libCEED look like?

On the Rust semantics of petsc-rs: I assume this is not only a question of what users prefer, but also of how hard it is to implement bindings that conform more strongly to Rust semantics? Is this something you would like user opinions on? Leaving aside the larger effort involved, I personally would find an interface that is as "Rust idiomatic" as possible preferable. I could imagine it also helps with gaining more users. I also recall finding an idea by Jeremy on changing the interface to better reflect Rust expectations. The example was to make DM a trait, and each DM an implementation, such that e.g. plex functions can only be called on a plex. I like that style of interface a lot.
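To make the "DM as a trait" idea concrete, here is a minimal sketch of what I have in mind. All names (`Dm`, `Plex`, `Da`, `cell_range`, `describe`) are hypothetical and made up for illustration; none of this is petsc-rs API, and the structs hold plain numbers rather than PETSc handles.

```rust
/// Operations shared by every DM (hypothetical trait, not petsc-rs API).
trait Dm {
    fn dimension(&self) -> usize;
    fn global_size(&self) -> usize;
}

/// Stand-in for an unstructured mesh (the role DMPlex plays in PETSc).
struct Plex {
    dim: usize,
    num_cells: usize,
    dofs_per_cell: usize,
}

/// Stand-in for a structured grid (the role DMDA plays in PETSc).
struct Da {
    extents: Vec<usize>,
    dofs_per_point: usize,
}

impl Dm for Plex {
    fn dimension(&self) -> usize {
        self.dim
    }
    fn global_size(&self) -> usize {
        self.num_cells * self.dofs_per_cell
    }
}

impl Dm for Da {
    fn dimension(&self) -> usize {
        self.extents.len()
    }
    fn global_size(&self) -> usize {
        self.extents.iter().product::<usize>() * self.dofs_per_point
    }
}

impl Plex {
    /// Plex-only operation. It is an inherent method of `Plex`, so calling
    /// it on a `Da` is rejected at compile time rather than at run time.
    fn cell_range(&self) -> std::ops::Range<usize> {
        0..self.num_cells
    }
}

/// Generic code is written once against the trait and accepts any DM.
fn describe<D: Dm>(dm: &D) -> String {
    format!("{}D DM with {} global dofs", dm.dimension(), dm.global_size())
}
```

With this layout, `describe(&plex)` and `describe(&da)` both compile, while `da.cell_range()` does not, which is the property I find attractive.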

On your note of increasing interface coverage being simple in most cases, does that include accelerators - say running KSP/SNES on device and keeping DM data on device, or are there bigger roadblocks for this? This is probably my biggest concern on the PETSc binding side at the moment.

jedbrown commented 4 months ago

Perhaps the root issue for Rusty vs PETSc/C semantics is that:

1. Callbacks are pervasive in this problem domain, so we need to operate on objects that have been reconstructed in the callback context.
2. PETSc/C uses intrusive reference counting so we can't have something like an Rc<Mat>. Not doing this would cause different problems.
3. PETSc/C does not distinguish at a type level between mutable and immutable or even owned/borrowed objects (though Get/Restore is a sort of structured borrow). The ergonomics to do so in C would be pretty painful.
4. At an abstract level, it's unclear what ownership and mutability should look like for a nested solver. (E.g., Mat can be used by multiple nested solvers, a PC can be used by multiple KSP inside and outside of SNES.) Note that even methods like solve() mutate the receiver (which typically means &mut self). If you're working in pure Rust and not trying to devise rules to outsmart this web, you would either use interior mutability in the sense of Rc<RefCell<Mat>> or an ECS. I don't know a way to pair an ECS with FFI/PETSc and the ergonomics of Rc<RefCell<Mat>> are poor.
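As a rough illustration of the Rc<RefCell<Mat>> route, here is a pure-Rust sketch, assuming toy stand-ins for the solver objects. `Mat`, `Pc`, `Ksp`, and `build_solver` are hypothetical types written for this example (a diagonal operator, a Jacobi preconditioner, and a preconditioned Richardson iteration); they are not PETSc or petsc-rs types.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Toy diagonal operator (hypothetical, not a PETSc Mat).
struct Mat {
    diag: Vec<f64>,
    apply_count: usize,
}

impl Mat {
    /// Applying the operator updates internal state, so it takes `&mut self`,
    /// in the same way that solve() mutates its receiver.
    fn apply(&mut self, x: &[f64]) -> Vec<f64> {
        self.apply_count += 1;
        self.diag.iter().zip(x).map(|(d, xi)| d * xi).collect()
    }
}

/// Jacobi preconditioner that shares the operator with the outer solver.
struct Pc {
    mat: Rc<RefCell<Mat>>,
}

impl Pc {
    fn apply(&self, r: &[f64]) -> Vec<f64> {
        let mat = self.mat.borrow(); // shared borrow, checked at run time
        r.iter().zip(&mat.diag).map(|(ri, d)| ri / d).collect()
    }
}

/// Outer solver holding the same operator plus the preconditioner.
struct Ksp {
    mat: Rc<RefCell<Mat>>,
    pc: Pc,
}

impl Ksp {
    /// Preconditioned Richardson iteration: x += P^{-1} (b - A x).
    fn solve(&mut self, b: &[f64], iters: usize) -> Vec<f64> {
        let mut x = vec![0.0; b.len()];
        for _ in 0..iters {
            // The mutable borrow must end before the PC takes its shared
            // borrow, otherwise the RefCell panics at run time.
            let ax = self.mat.borrow_mut().apply(&x);
            let r: Vec<f64> = b.iter().zip(&ax).map(|(bi, ai)| bi - ai).collect();
            let z = self.pc.apply(&r);
            for (xi, zi) in x.iter_mut().zip(&z) {
                *xi += zi;
            }
        }
        x
    }
}

/// Wire one operator into both the solver and its preconditioner.
fn build_solver(diag: Vec<f64>) -> (Rc<RefCell<Mat>>, Ksp) {
    let mat = Rc::new(RefCell::new(Mat { diag, apply_count: 0 }));
    let pc = Pc { mat: Rc::clone(&mat) };
    let ksp = Ksp { mat: Rc::clone(&mat), pc };
    (mat, ksp)
}
```

The sketch works, but every access goes through borrow() or borrow_mut(), and an overlapping borrow is only caught at run time. That is the ergonomic cost referred to in the last item above.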

I think 4 is an interesting topic more generally than how to expose PETSc. The problem of type-level encoding of relations for preconditioned iterative solvers is deceptively hard: https://hachyderm.io/@mhoemmen@c.im/111648436529317638

Cc: @tisaac

eliasboegel commented 4 months ago

Thanks for this, it is indeed more difficult than I expected. @mhoemmen's comment on the similarity to GUI structure problems is interesting. I wonder if something can be learned from the relatively mature bindings to GTK4 (C), or similar GUI Rust bindings that deal with complex hierarchies and FFI. I'm not really familiar with any of them, but they must have similar problems to solve, and GUI toolkit folks seem to care about interface ergonomics a lot. Is there a channel through which any of these questions/points are (actively) being discussed?

Going back to the original question of the issue, are there currently any major downsides to using PETSc and libCEED together from Rust (compared to using both directly without bindings from a C application code) that can't be made functional with smaller contributions to both bindings? So far, I gathered:

1. libCEED QFs must be written in C to run on device.
2. PETSc bindings are not ready to run on device. I'm unsure about the amount of work needed to make the bindings work.

Do any others come to mind?

jedbrown commented 4 months ago

1. This is something I anticipate moving along in the fall, doing kernel fusion at the LLVM level instead of at source level.
2. The matrix and vector operations (thus preconditioned iterative solvers, etc.) can run on device today (with no code modifications, only run-time options). If you want your own GPU kernels (not managed by libCEED) to operate on PETSc data, you would need to choose a Rust GPU crate. It should not be hard to give access to a device vector inside a cudarc CudaView (at least if we can get cudarc to expose a from_raw_parts), but even that isn't an environment in which you write kernels in Rust, and HIP/SYCL (for AMD and Intel GPUs respectively) is even less supported.
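To sketch the from_raw_parts idea without committing to any GPU crate, here is a host-memory analogue using only the standard library. `ForeignVec` and `ArrayView` are hypothetical names invented for this example, and the buffer lives in ordinary host memory; it is not cudarc or petsc-rs API. It only shows how a raw pointer and length can be wrapped in a view whose lifetime is tied to the owning object, with the "restore" step run on drop.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for a vector owned on the far side of FFI:
/// all we hold is a raw pointer, a length, and a restore counter.
struct ForeignVec {
    ptr: *mut f64,
    len: usize,
    restores: usize,
}

impl ForeignVec {
    fn new(len: usize) -> Self {
        // Leak a boxed slice to mimic memory we do not manage as a Rust value.
        let ptr = Box::leak(vec![0.0; len].into_boxed_slice()).as_mut_ptr();
        ForeignVec { ptr, len, restores: 0 }
    }

    /// "Get": hand out a view. Taking `&mut self` means the borrow checker
    /// rejects a second live view of the same vector.
    fn get_array(&mut self) -> ArrayView<'_> {
        ArrayView { owner: self }
    }
}

impl Drop for ForeignVec {
    fn drop(&mut self) {
        // SAFETY: ptr and len come from the boxed slice leaked in new()
        // and are never modified afterwards.
        unsafe {
            drop(Box::from_raw(std::ptr::slice_from_raw_parts_mut(self.ptr, self.len)));
        }
    }
}

/// View over the foreign buffer; cannot outlive the vector it came from.
struct ArrayView<'a> {
    owner: &'a mut ForeignVec,
}

impl Deref for ArrayView<'_> {
    type Target = [f64];
    fn deref(&self) -> &[f64] {
        // SAFETY: the view holds an exclusive borrow of the owner, so the
        // buffer is valid and not aliased while this slice is alive.
        unsafe { std::slice::from_raw_parts(self.owner.ptr, self.owner.len) }
    }
}

impl DerefMut for ArrayView<'_> {
    fn deref_mut(&mut self) -> &mut [f64] {
        // SAFETY: same argument as in deref().
        unsafe { std::slice::from_raw_parts_mut(self.owner.ptr, self.owner.len) }
    }
}

impl Drop for ArrayView<'_> {
    /// "Restore": runs automatically when the view goes out of scope.
    fn drop(&mut self) {
        self.owner.restores += 1;
    }
}
```

A device-side version would need the GPU crate to provide the equivalent of the two slice constructors used here, which is the missing piece mentioned above.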

eliasboegel commented 4 months ago

Thanks - that's great. I was under the impression that no part of PETSc can currently run on device through petsc-rs since accelerators are still crossed out in the readme. At the moment I only need custom element-wise vector operations and my understanding is that I can actually formulate this as a QFunction, and do it on device that way.

Awesome to hear about the LLVM IR kernel fusion too. Together with libCEED's WASM target, it could also make a WebGPU backend possible, since LLVM has a SPIR-V target and the WebGPU implementations in Chromium and Firefox can currently ingest SPIR-V.

Thanks for the help!

jedbrown commented 4 months ago

Ah, yeah, I see more things to revise in that readme.

For WGPU, see Embark's rust-gpu project. It's more focused on graphics (as seems to be the case across the WGPU ecosystem), but would be interesting to explore for numerical computing.

eliasboegel commented 4 months ago

rust-gpu is very interesting, but I don't think it would be suitable for a new libCEED backend since it's both quite inactive and applies only to Rust. Unfortunately the WebGPU spec also doesn't specify SPIR-V, only WGSL, and the tooling for WGSL is currently lacking a lot. SPIR-V is only interesting because the two main implementations support it outside of the spec.

jedbrown commented 4 months ago

Thanks, I'm less familiar with that space, but it would be interesting (especially for teaching) if we could target WASM and WGPU to run simulations on the client at acceptable speed.

mhoemmen commented 4 months ago

@jedbrown awww you quoted me <3 <3 <3
