Add usage for single-source example

Thank you for your interest in my crate!

The print example is currently the only one that is fully executable.

I initially developed rust-cuda for necsim-rust, a neutral ecology model, which shows this crate in full use:

https://github.com/juntyr/necsim-rust/blob/main/rustcoalescence/algorithms/cuda/gpu-kernel/src/lib.rs shows a full kernel. The crate is single-source, i.e. it is used on both the GPU and the host, but it still contains only the kernel for code clarity (this is roughly equivalent to https://github.com/juntyr/rust-cuda/blob/2a124b6f569eccecde633def0ea2b880c3d32fd6/examples/print/src/lib.rs#L22-L32).
https://github.com/juntyr/necsim-rust/blob/main/rustcoalescence/algorithms/cuda/cpu-kernel/src/link.rs contains the explicit linking step for the pseudo-generic kernels that rust-cuda supports, i.e. the kernels can be fully generic, but you currently need to manually instantiate every monomorphised variant of the kernel that you want to use (this is roughly equivalent to https://github.com/juntyr/rust-cuda/blob/2a124b6f569eccecde633def0ea2b880c3d32fd6/examples/print/src/main.rs#L6-L8).
https://github.com/juntyr/necsim-rust/blob/c8c3023a114aeacb97e130a60eff779f9c7cb539/rustcoalescence/algorithms/cuda/src/parallelisation/monolithic.rs#L252-L268 is where I actually launch the kernel from the host and do all of the memory transfers around it (this is roughly equivalent to https://github.com/juntyr/rust-cuda/blob/2a124b6f569eccecde633def0ea2b880c3d32fd6/examples/print/src/main.rs#L40-L48).
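
To illustrate what "fully generic, but manually instantiated" means, here is a minimal plain-Rust sketch of the idea. It runs entirely on the host and does not use the rust-cuda API at all; every name in it (`Op`, `Scale`, `Offset`, `kernel`, `link_kernels!`, `launch`) is hypothetical and only stands in for the real linking macros shown in the links above.

```rust
// Illustrative only: plain host-side Rust, NOT the rust-cuda API.
// All names below are hypothetical.

/// An operation that the generic "kernel" is parameterised over.
trait Op {
    const NAME: &'static str;
    fn apply(x: f32) -> f32;
}

struct Scale;
struct Offset;

impl Op for Scale {
    const NAME: &'static str = "scale";
    fn apply(x: f32) -> f32 {
        x * 2.0
    }
}

impl Op for Offset {
    const NAME: &'static str = "offset";
    fn apply(x: f32) -> f32 {
        x + 1.0
    }
}

/// The "kernel" itself is fully generic over the operation.
fn kernel<O: Op>(data: &mut [f32]) {
    for x in data.iter_mut() {
        *x = O::apply(*x);
    }
}

type KernelFn = fn(&mut [f32]);

/// The "linking" step: every monomorphised variant that should be
/// available must be listed here explicitly.
macro_rules! link_kernels {
    ($($op:ty),* $(,)?) => {
        fn linked_kernels() -> Vec<(&'static str, KernelFn)> {
            vec![$((<$op as Op>::NAME, kernel::<$op> as KernelFn)),*]
        }
    };
}

link_kernels!(Scale, Offset);

/// Runs the variant registered under `name`; returns false if that
/// variant was never instantiated in the linking step.
fn launch(name: &str, data: &mut [f32]) -> bool {
    match linked_kernels().into_iter().find(|(n, _)| *n == name) {
        Some((_, k)) => {
            k(data);
            true
        }
        None => false,
    }
}

fn main() {
    let mut data = [1.0_f32, 2.0, 3.0];
    assert!(launch("scale", &mut data));
    assert_eq!(data, [2.0, 4.0, 6.0]);
    // A variant that was not listed in link_kernels! is not available.
    assert!(!launch("negate", &mut data));
}
```

The point of the sketch is only the shape: the kernel stays generic, while a separate, explicit list decides which concrete variants exist.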

If your kernel linking is even slightly complex, I'd recommend also using the three-crate split (single-source kernel, linking, host) to improve compile times, since otherwise any change in the host code recompiles all kernel variants as well.

To make sure everything runs, you need the required CUDA libraries and the "llvm-bitcode-linker" and "llvm-tools" Rust components installed, and you need to put the https://github.com/juntyr/rust-cuda/blob/main/examples/print/.cargo/config.toml file in the crate that contains your kernel.
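
One possible way to keep the two Rust components installed reproducibly is a rust-toolchain.toml next to your Cargo.toml. This is only a sketch that assumes a nightly toolchain; the bare "nightly" channel is a placeholder, so pin it to whatever the rust-cuda repository currently uses.

```toml
# rust-toolchain.toml (sketch; the channel is an assumption, not taken from rust-cuda)
[toolchain]
channel = "nightly"
components = ["llvm-bitcode-linker", "llvm-tools"]
```

With this file in place, rustup installs the listed components automatically the first time you build in that directory.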

I hope this helps a bit :) I could also have a look at your code to help you integrate rust-cuda if you'd like.
