Open JordanLloydHall opened 4 days ago
Thank you for your interest in my crate!
The print example is currently the only one that is fully executable.
I initially developed rust-cuda for necsim-rust, a neutral ecology model. It shows this crate in full use:
https://github.com/juntyr/necsim-rust/blob/main/rustcoalescence/algorithms/cuda/gpu-kernel/src/lib.rs shows a full kernel. The crate is single-source, i.e. it is used both on the GPU and on the host, but it still only contains the kernel for code clarity (this is roughly equivalent to https://github.com/juntyr/rust-cuda/blob/2a124b6f569eccecde633def0ea2b880c3d32fd6/examples/print/src/lib.rs#L22-L32).
https://github.com/juntyr/necsim-rust/blob/main/rustcoalescence/algorithms/cuda/cpu-kernel/src/link.rs contains the explicit linking step for the pseudo-generic kernels that rust-cuda supports, i.e. the kernels can be fully generic, but you currently need to manually instantiate all monomorphised variants of the kernel you want to use (this is roughly equivalent to https://github.com/juntyr/rust-cuda/blob/2a124b6f569eccecde633def0ea2b880c3d32fd6/examples/print/src/main.rs#L6-L8).
https://github.com/juntyr/necsim-rust/blob/c8c3023a114aeacb97e130a60eff779f9c7cb539/rustcoalescence/algorithms/cuda/src/parallelisation/monolithic.rs#L252-L268 is where I actually launch the kernel from the host and do all of the memory transfers around it (this is roughly equivalent to https://github.com/juntyr/rust-cuda/blob/2a124b6f569eccecde633def0ea2b880c3d32fd6/examples/print/src/main.rs#L40-L48).
If your kernel linking is even slightly complex, I'd recommend also using the three-crate split (single-source kernel, linking, host) to improve compile times, since otherwise any change in the host code recompiles all kernel variants as well.
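To give a rough idea of what "manually instantiating all monomorphised variants" means, here is a minimal plain-Rust sketch. It does not use rust-cuda's actual API and runs entirely on the CPU; the trait `Scale`, the function `scale_kernel`, and the macro `instantiate_kernel!` are hypothetical names made up for illustration. The point is only that a generic function has no concrete entry point until each type you need is listed explicitly.

```rust
// Plain-Rust illustration only: none of these names come from rust-cuda.

/// Hypothetical trait for element types that the generic "kernel" can scale.
trait Scale: Copy {
    fn scale(self, factor: u32) -> Self;
}

impl Scale for u32 {
    fn scale(self, factor: u32) -> Self {
        self * factor
    }
}

impl Scale for f32 {
    fn scale(self, factor: u32) -> Self {
        self * factor as f32
    }
}

/// The generic "kernel": multiplies every element in place.
fn scale_kernel<T: Scale>(data: &mut [T], factor: u32) {
    for x in data.iter_mut() {
        *x = x.scale(factor);
    }
}

/// Hypothetical helper macro: emits one concrete, named entry point
/// per listed type, i.e. one monomorphised variant each.
macro_rules! instantiate_kernel {
    ($($name:ident => $ty:ty),* $(,)?) => {
        $(
            pub fn $name(data: &mut [$ty], factor: u32) {
                scale_kernel::<$ty>(data, factor)
            }
        )*
    };
}

// Every variant that should exist has to be listed here by hand.
instantiate_kernel! {
    scale_kernel_u32 => u32,
    scale_kernel_f32 => f32,
}

fn main() {
    let mut ints = [1_u32, 2, 3];
    scale_kernel_u32(&mut ints, 2);
    println!("{:?}", ints); // prints [2, 4, 6]

    let mut floats = [0.5_f32, 1.5];
    scale_kernel_f32(&mut floats, 2);
    println!("{:?}", floats); // prints [1.0, 3.0]
}
```

In this sketch, every listed variant is recompiled whenever the file containing the list changes, which is the same reason keeping the instantiations in their own crate can help compile times.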
To make sure everything runs, you need to have the required CUDA libraries and the "llvm-bitcode-linker" and "llvm-tools" Rust components installed, and you need to put the https://github.com/juntyr/rust-cuda/blob/main/examples/print/.cargo/config.toml file in the crate that contains your kernel.
I hope this helps a bit :) I could also have a look at your code to help you integrate rust-cuda if you'd like.
I really love the work you're doing here, and want to use it in my own personal project! However, I'm still getting to grips with how to use this crate, and I'm struggling to get single-source to do anything once it's built. Would it be possible to have more examples to work off of, or have this example come with a bin target?
Many thanks for your work! And I would be happy to take a look at creating more examples once I've wrapped my head around it!