Open xieby1 opened 3 months ago
https://news.ycombinator.com/item?id=40970560
ckitching 1 day ago
Hi! Spectral engineer here!
SCALE does not use any part of ZLUDA. We have modified the clang frontend to convert inline PTX asm blocks to LLVM IR.
To put it in a less compiler-engineer-ey way: for any given block of PTX, there exists a hypothetical sequence of C++/CUDA code you could have written to achieve the same effect, but on AMD (perhaps using funky __builtin_... functions if the code includes shuffles/ballots/other-weird-gpu-stuff). Our compiler effectively converts the PTX into that hypothetical C++.
Regarding memory consistency etc.: NVIDIA document the "CUDA memory consistency model" extremely thoroughly, and likewise, the consistency guarantees for PTX. It is therefore sufficient to ensure that we use operations at least as synchronising as those called for in the documented semantics of the language (be it CUDA or PTX, for each operation).
Differing consistency between architectures is the AMDGPU backend's problem.
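Note to self: the "at least as synchronising" rule can be pictured as a check on a partial order of memory orderings. The sketch below is my own illustration, not SCALE code. It uses the standard C++ `std::memory_order` values as a stand-in for the CUDA/PTX orderings, ignores `memory_order_consume`, and does not model scopes (CTA, GPU, system), which the real models also have. The helper names `has_acquire`, `has_release` and `at_least_as_strong` are hypothetical.

```cpp
#include <atomic>

// True if the ordering includes acquire semantics.
constexpr bool has_acquire(std::memory_order o) {
    return o == std::memory_order_acquire ||
           o == std::memory_order_acq_rel ||
           o == std::memory_order_seq_cst;
}

// True if the ordering includes release semantics.
constexpr bool has_release(std::memory_order o) {
    return o == std::memory_order_release ||
           o == std::memory_order_acq_rel ||
           o == std::memory_order_seq_cst;
}

// True if `chosen` may legally replace `required`, i.e. it synchronises
// at least as much. Acquire and release are incomparable, so neither can
// stand in for the other; seq_cst can only be replaced by seq_cst.
constexpr bool at_least_as_strong(std::memory_order chosen,
                                  std::memory_order required) {
    return required == std::memory_order_seq_cst
               ? chosen == std::memory_order_seq_cst
               : (!has_acquire(required) || has_acquire(chosen)) &&
                 (!has_release(required) || has_release(chosen));
}
```

Under this picture, a translator that is unsure can always pick a stronger ordering: it costs performance but keeps the documented guarantees.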
https://docs.scale-lang.com/examples/ptx/
TODO: How does ptx translation work?
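A partial answer, as I understand the quoted comment: each PTX instruction has a plain C++ equivalent, and the compiler emits whatever IR that C++ would have produced. The sketch below is my own host-side illustration of that idea for four simple integer instructions. It is not SCALE's implementation, and the `ptx_*` function names are hypothetical. Warp-level operations (shuffles, ballots) would need target-specific builtins and are not shown.

```cpp
#include <cstdint>

// PTX: add.u32 d, a, b;   Unsigned add, wraps modulo 2^32.
inline std::uint32_t ptx_add_u32(std::uint32_t a, std::uint32_t b) {
    return a + b;
}

// PTX: mad.lo.u32 d, a, b, c;   Low 32 bits of a*b, plus c.
inline std::uint32_t ptx_mad_lo_u32(std::uint32_t a, std::uint32_t b,
                                    std::uint32_t c) {
    return a * b + c;
}

// PTX: shl.b32 d, a, b;   Shift amounts above 32 are clamped to 32,
// so the result is 0. A bare `a << b` would be undefined in C++ here.
inline std::uint32_t ptx_shl_b32(std::uint32_t a, std::uint32_t b) {
    return b >= 32 ? 0u : a << b;
}

// PTX: lop3.b32 d, a, b, c, immLut;   Arbitrary 3-input bitwise function.
// For each bit position, the three input bits form an index (a is the
// high bit, c the low bit) into the 8-entry truth table immLut.
inline std::uint32_t ptx_lop3_b32(std::uint32_t a, std::uint32_t b,
                                  std::uint32_t c, std::uint8_t lut) {
    std::uint32_t d = 0;
    for (int i = 0; i < 32; ++i) {
        unsigned idx = (((a >> i) & 1u) << 2) |
                       (((b >> i) & 1u) << 1) |
                       ((c >> i) & 1u);
        d |= ((lut >> idx) & 1u) << i;
    }
    return d;
}
```

For example, `ptx_lop3_b32(a, b, c, 0x96)` computes `a ^ b ^ c`, since 0x96 is the truth table of a three-way XOR. The `shl.b32` case shows why the translation has to follow the documented PTX semantics rather than map each instruction to the nearest C++ operator.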