DNN kernels to support GPT decoder models and additional utilities

viv-eth commented 10 months ago

The SW libraries have been restructured into separate folders to ease data generation and verification (Occamy only).

This PR implements the following kernels:

Multi-cluster GEMM with tiling + verification
Matrix concatenation + verification
FlashAttention-2
Fused linear and concatenation layer with logarithmic reduction + verification
i-GELU activation function + verification
LayerNorm

The following utilities have been added:

Safe float/integer casts to ensure consistency
Convenient 2D DMA transfer functions
Global reduction function for binary-tree reduction across multiple clusters
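The idea behind the binary-tree (logarithmic) reduction can be illustrated with a single-core simulation. This is a hypothetical sketch, not the PR's API: the function `tree_reduce` and the assumption that each cluster's partial result is a contiguous vector of `len` floats in one shared buffer are illustrative only. On the real hardware, each level would involve an inter-cluster data transfer followed by a global barrier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: simulates a binary-tree reduction of per-cluster
 * partial results. partial[i * len .. i * len + len - 1] holds the
 * contribution of cluster i; afterwards cluster 0's slice holds the total.
 * Returns the number of tree levels performed. */
unsigned tree_reduce(float *partial, size_t n_clusters, size_t len) {
    assert(n_clusters > 0);
    unsigned levels = 0;
    /* The distance between paired clusters doubles at every level */
    for (size_t stride = 1; stride < n_clusters; stride *= 2) {
        /* Receivers are clusters whose index is a multiple of 2 * stride;
         * a receiver without a partner at this level is skipped */
        for (size_t dst = 0; dst + stride < n_clusters; dst += 2 * stride) {
            const float *src = partial + (dst + stride) * len;
            float *acc = partial + dst * len;
            for (size_t k = 0; k < len; k++) acc[k] += src[k];
        }
        levels++;
    }
    return levels;
}
```

With this pairing scheme, `n` clusters are reduced in ceil(log2 n) levels rather than `n - 1` sequential steps; for example, 8 clusters need 3 levels, and 5 clusters also need 3.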

⚠️ Currently, dummy functions are used for the exponential because of a HW bug in the synchronization between the FPU and the integer core. This will be reverted as soon as the feature is implemented.

colluca commented 10 months ago

If we cherry-pick 18554fc57d8f4ecd424b222c4fcb94a784a6cbca and 420dfb3cc2f2478599d89f8c590a686e9564fc0f, we can also add LayerNorm to the CI while we wait for the release of a toolchain that supports -mno-fdiv.

fischeti commented 10 months ago

Is this now a draft or not?

colluca commented 10 months ago

> ⚠️Currently dummy functions are used for the exponential due to a HW bug in the FPU and integer core synchronization. This will be reverted as soon as the feature is implemented.

What do you mean by "dummy" functions? I implemented correct versions in the math library; did you cherry-pick those commits?

pulp-platform / snitch_cluster

DNN kernels to support GPT decoder models and additional utilities #87