triton-lang / triton

Development repository for the Triton language and compiler
https://triton-lang.org/
MIT License

[Question] Swizzling the shared memory #2675

Open Paran0idy opened 7 months ago

Paran0idy commented 7 months ago

Hi guys.

In SharedEncodingAttr, I see the arguments vec, perPhase, and maxPhase, but I don't understand how they implement shared memory swizzling. What is the principle behind them? The documentation says:

An encoding for tensors whose elements may be simultaneously accessed by
different cuda threads in the programs, via shared memory. In other words,
for all indices i \in R^d, \mathcal{L}(i) = {0, 1, ..., 32*num_warps - 1}.

In order to avoid shared memory bank conflicts, elements may be swizzled
in memory. For example, a swizzled row-major layout could store its data
as follows:

A_{0, 0}  A_{0, 1}  A_{0, 2}  A_{0, 3} ...   [phase 0] \ per_phase = 2
A_{1, 0}  A_{1, 1}  A_{1, 2}  A_{1, 3} ...   [phase 0] /
groups of vec=2 elements
are stored contiguously
_ _ _ _ /\_ _ _ _
A_{2, 2}  A_{2, 3}  A_{2, 0}  A_{2, 1} ...   [phase 1] \ per phase = 2
A_{3, 2}  A_{3, 3}  A_{3, 0}  A_{3, 1} ...   [phase 1] /

Could you give me an introduction or point me to the relevant implementation? Thanks!

jon-chuang commented 7 months ago

Hmm this seems useful: https://www.jokeren.tech/slides/Triton_bsc.pdf

jon-chuang commented 7 months ago

From https://www.jokeren.tech/slides/triton_next.pdf

image

@Jokeren has a lot of great material, maybe he would be willing to link it into the repo? :)

jon-chuang commented 7 months ago

But I still don't really understand what is going on; more explanation would be appreciated.

See also: https://github.com/openai/triton/issues/2102 for a previous question.

zhanglx13 commented 7 months ago

https://github.com/openai/triton/discussions/2026
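
For reference, the diagram in the docstring quoted in the question can be read as an XOR swizzle on groups of vec elements. The following is only a minimal sketch of that reading, not Triton's actual lowering code: the helper names swizzled_col and stored_row are hypothetical, and the use of max_phase as a modulo bound on the phase is an assumption based on the parameter name.

```python
def swizzled_col(row, col, vec, per_phase, max_phase):
    """Return the column at which logical element (row, col) is stored.

    Hypothetical helper for illustration; not part of the Triton API.
    """
    # Every `per_phase` consecutive rows share a phase; the phase wraps
    # around after `max_phase` distinct values (assumed modulo behaviour).
    phase = (row // per_phase) % max_phase
    # Elements move in contiguous groups of `vec`; the group index is
    # XOR-ed with the phase, and the offset inside the group is kept.
    group = col // vec
    return (group ^ phase) * vec + col % vec


def stored_row(row, n_cols, vec, per_phase, max_phase):
    """List the logical columns of `row` in the order they sit in memory."""
    out = [None] * n_cols
    for col in range(n_cols):
        out[swizzled_col(row, col, vec, per_phase, max_phase)] = col
    return out


if __name__ == "__main__":
    # Same parameters as the docstring diagram: vec=2, per_phase=2.
    for row in range(4):
        print(row, stored_row(row, 4, vec=2, per_phase=2, max_phase=2))
```

With vec=2 and per_phase=2, rows 0 and 1 are in phase 0 and stay in order, while rows 2 and 3 are in phase 1 and have their two vec-sized groups exchanged, which matches the A_{2, 2} A_{2, 3} A_{2, 0} A_{2, 1} row in the diagram. Because rows in different phases place the same logical column at different physical offsets, threads reading down a column are less likely to hit the same bank.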

Paran0idy commented 7 months ago

Thanks, the example in discussion #2026 is clear and useful to me.

Paran0idy commented 7 months ago


Thanks, the materials from @Jokeren's slides are useful.