philipturner / metal-benchmarks

Apple GPU microarchitecture
MIT License
454 stars 16 forks

[Question]: What do these BW values mean? #3

Open FdyCN opened 9 months ago

FdyCN commented 9 months ago

I really appreciate the work you have done, it's awesome! But I am confused about the meaning of some data in the table below: (image)

  1. Does "Shared BW/Cycle" mean the bandwidth of shared memory <---> register transfers?
  2. What is the difference between on-core data and on-GPU data? What do they stand for?
  3. What does "SLC" mean?

Thank you again.

philipturner commented 9 months ago

Shared BW/cycle is aggregate bandwidth from threadgroup memory. It's the number of bytes that can be shuffled around per core-cycle. On-core is a vendor-agnostic word for "L1 cache", on-GPU is a vendor-agnostic word for "L2 cache". SLC is the system-level cache or "L3 cache". On vendors like AMD, it is the Infinity Cache.

Some of these distinctions may have gone away with Apple 9, which entirely redesigned the memory subsystem. It will be interesting to update these benchmarks for the A17 Pro next summer.

FdyCN commented 9 months ago

> Shared BW/cycle is aggregate bandwidth from threadgroup memory. It's the number of bytes that can be shuffled around per core-cycle. On-core is a vendor-agnostic word for "L1 cache", on-GPU is a vendor-agnostic word for "L2 cache". SLC is the system-level cache or "L3 cache". On vendors like AMD, it is the Infinity Cache.
>
> Some of these distinctions may have gone away with Apple 9, which entirely redesigned the memory subsystem. It will be interesting to update these benchmarks for the A17 Pro next summer.

Thank you for the reply. So "RAM BW/Cycle" is the global memory bandwidth?

philipturner commented 9 months ago

Correct. The Apple GPU should have relatively high BW/cycle compared to other vendors, making it disproportionately good at bandwidth-bound use cases like llama.cpp.