Build an open-source GPU, targeting ASIC tape-out, for machine learning ("ML"). The hope is to get it working with the PyTorch deep learning framework.
Create an open-source GPU for machine learning.
I don't intend to tape this out myself, but I intend to do what I can to verify that a tape-out would work: that timing is met, and so on.
Intend to implement a HIP API that is compatible with the PyTorch machine learning framework. Open to providing other APIs, such as SYCL or NVIDIA® CUDA™.
Internal GPU Core ISA loosely compliant with RISC-V ISA. Where RISC-V conflicts with designing for a GPU setting, we break with RISC-V.
Intend to keep the cores very focused on ML. For example, brain floating point ("BF16") throughout, to keep core die area low. This should keep the per-core cost low. Similarly, intend to implement only the few float operations critical to ML, such as exp, log, tanh, and sqrt.
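To illustrate why BF16 is cheap in hardware, here is a minimal C++ sketch of converting between IEEE 754 float32 and BF16. BF16 keeps the sign bit, all 8 exponent bits, and the top 7 mantissa bits of a float32, so it is simply the upper 16 bits of the float32 encoding. The function names `float_to_bf16` and `bf16_to_float` are hypothetical, chosen for this example, and the round-to-nearest-even step is an assumption; this is not this project's implementation, and the hardware may round differently.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not taken from this project.
// Converts a float32 to BF16 by keeping the upper 16 bits of its encoding.
// Assumption: round-to-nearest-even; real hardware may truncate instead.
inline std::uint16_t float_to_bf16(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));  // reinterpret the float as raw bits

    if (std::isnan(value)) {
        // Keep the sign and exponent, and force a mantissa bit so the result stays NaN.
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }

    // Add 0x7FFF, plus 1 if the lowest kept bit is set, so that ties round to even.
    const std::uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
    bits += rounding_bias;
    return static_cast<std::uint16_t>(bits >> 16);
}

// Hypothetical helper, not taken from this project.
// Widens BF16 back to float32 by placing it in the upper 16 bits; this is exact.
inline float bf16_to_float(std::uint16_t half) {
    const std::uint32_t bits = static_cast<std::uint32_t>(half) << 16;
    float value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```

For example, `float_to_bf16(1.0f)` gives `0x3F80`, and `bf16_to_float(0x3F80)` gives back `1.0f`. Because BF16 shares the float32 exponent range, converting in either direction is little more than a 16-bit shift.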
Big Picture:
GPU Die Architecture:
Single Core:
Single-source compilation and runtime
Single-source C++:
Compile the GPU and runtime:
Compile the single-source C++, and run:
What direction are we thinking of going in? What works already? See:
Our assembly language implementation and progress. Design of GPU memory, registers, and so on. See:
If we want to tape-out, we need solid verification. Read more at:
We want the GPU to run quickly and to use minimal die area. Read how we measure timings and area at: