pulp-platform / snitch_cluster

An energy-efficient RISC-V floating-point compute cluster.
https://pulp-platform.github.io/snitch_cluster/
Apache License 2.0

DNN kernels to support GPT decoder models and additional utilities #87

Closed: viv-eth closed this 9 months ago

viv-eth commented 10 months ago

The SW libraries have been restructured into separate folders to ease data generation and verification (Occamy only).

This PR implements the following kernels:

  1. Multi-cluster GEMM with tiling + verification
  2. Matrix concatenation + verification
  3. FlashAttention-2
  4. Fused linear and concatenation layer with logarithmic reduction + verification
  5. i-GELU activation function + verification
  6. LayerNorm
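
As an illustration of item 1, here is a minimal host-side sketch of a tiled GEMM checked against a naive reference, in the spirit of the "+ verification" step. It is not the code from this PR: the function names (`gemm_reference`, `gemm_tiled`) and the tile parameters are hypothetical, and the mapping of tiles to clusters, DMA transfers, and FPU details are deliberately left out.

```python
# Hypothetical golden-model sketch, NOT the kernel from this PR.
# Matrices are plain lists of rows; names and tile sizes are assumptions.

def gemm_reference(a, b):
    """Naive C = A * B, used as the verification reference."""
    m, k, n = len(a), len(b), len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


def gemm_tiled(a, b, tile_m, tile_n, tile_k):
    """C = A * B computed tile by tile.

    Each (i0, j0) output tile could be assigned to a different cluster;
    here the tiles are simply processed one after another.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile_m):            # tile rows of C
        for j0 in range(0, n, tile_n):        # tile columns of C
            for p0 in range(0, k, tile_k):    # tiles along the reduction axis
                for i in range(i0, min(i0 + tile_m, m)):
                    for j in range(j0, min(j0 + tile_n, n)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile_k, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[i * 5 + j for j in range(5)] for i in range(4)]   # 4 x 5
    b = [[i - j for j in range(3)] for i in range(5)]       # 5 x 3
    # Tile sizes that do not divide the dimensions exercise the edge tiles.
    assert gemm_tiled(a, b, 2, 2, 3) == gemm_reference(a, b)
    print("tiled GEMM matches reference")
```

Integer inputs keep the comparison exact; with floating-point data the check would need a tolerance, because tiling changes the order of accumulation.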

The following utilities have been added:

  1. Safe float/integer casts to ensure consistency
  2. Convenient 2D DMA transfer functions
  3. Global reduction function that performs a binary-tree reduction across multiple clusters
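
To make the binary-tree (logarithmic) reduction in item 3 concrete, the following sketch simulates it on the host. It assumes each cluster holds one partial result, modeled as a list entry indexed by cluster ID. The name `tree_reduce` is hypothetical, and the inter-cluster transfers and barriers a real implementation would need are only indicated in comments.

```python
# Hypothetical simulation of a binary-tree reduction across clusters.
# List index = cluster ID; this is NOT the function added in this PR.

def tree_reduce(partials):
    """Sum per-cluster partial results in ceil(log2(n)) steps.

    Returns (total, steps). After the last step, cluster 0 holds the total.
    """
    values = list(partials)
    n = len(values)
    if n == 0:
        raise ValueError("need at least one cluster")
    stride = 1
    steps = 0
    while stride < n:
        # In each step, cluster `dst` accumulates the partial result of
        # cluster `dst + stride`. On hardware, all pairs in a step would
        # run in parallel, followed by a barrier before the next step.
        for dst in range(0, n, 2 * stride):
            src = dst + stride
            if src < n:  # handles cluster counts that are not powers of two
                values[dst] += values[src]
        stride *= 2
        steps += 1
    return values[0], steps


if __name__ == "__main__":
    total, steps = tree_reduce([1, 2, 3, 4, 5, 6, 7, 8])
    print(total, steps)  # 36 3
```

With 8 clusters the reduction finishes in 3 steps instead of the 7 sequential additions a linear reduction would need.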

⚠️ Currently, dummy functions are used for the exponential because of a HW bug in the synchronization between the FPU and the integer core. This will be reverted as soon as the feature is implemented.

colluca commented 10 months ago

If we cherry-pick 18554fc57d8f4ecd424b222c4fcb94a784a6cbca and 420dfb3cc2f2478599d89f8c590a686e9564fc0f, we can also add LayerNorm to the CI while we wait for the release of a toolchain that supports -mno-fdiv.

fischeti commented 10 months ago

Is this now a draft or not?

colluca commented 10 months ago

> ⚠️ Currently dummy functions are used for the exponential due to a HW bug in the FPU and integer core synchronization. This will be reverted as soon as the feature is implemented.

What do you mean by "dummy" functions? I did implement correct versions in the math library. Did you cherry-pick those commits?