Open FdyCN opened 9 months ago
Shared BW/cycle is aggregate bandwidth from threadgroup memory. It's the number of bytes that can be shuffled around per core-cycle. On-core is a vendor-agnostic word for "L1 cache", on-GPU is a vendor-agnostic word for "L2 cache". SLC is the system-level cache or "L3 cache". On vendors like AMD, it is the Infinity Cache.
Some of these distinctions may have gone away with the Apple family 9 architecture, which entirely redesigned the memory subsystem. It will be interesting to update these benchmarks for the A17 Pro next summer.
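To make the metric concrete, here is a minimal sketch of how an aggregate threadgroup-memory bandwidth figure reduces to bytes per core-cycle. The function name and all numbers are illustrative assumptions, not values from the benchmark tables:

```python
# Hedged sketch: convert an aggregate threadgroup (shared) memory
# bandwidth measurement into bytes moved per core per clock cycle.
# All numbers below are illustrative placeholders, not measured values.

def bytes_per_core_cycle(aggregate_gb_per_s: float,
                         core_count: int,
                         clock_ghz: float) -> float:
    """GB/s divided by (cores x GHz) yields bytes per core-cycle,
    since 1 GB/s / 1 GHz = 1 byte/cycle."""
    return aggregate_gb_per_s / (core_count * clock_ghz)

# Example with made-up numbers: 32 cores at 1.3 GHz moving 4000 GB/s
# in aggregate works out to roughly 96 bytes per core-cycle.
print(bytes_per_core_cycle(4000.0, 32, 1.3))
```

Normalizing by clock and core count this way is what lets the table compare GPUs across vendors with very different clock speeds and core counts.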
Thank you for the reply. So RAM BW/cycle is the global memory bandwidth?
Correct. The Apple GPU should have relatively high BW/cycle compared to other vendors, making it disproportionately good at bandwidth-bound use cases like LLaMA.cpp.
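A quick sketch of what "RAM BW/cycle" means in practice, with illustrative specs (the 400 GB/s and ~1.3 GHz figures below are assumptions for the example, not quoted benchmark results):

```python
# Hedged sketch: "RAM BW/cycle" as global (device) memory bandwidth
# normalized by the GPU clock. Specs here are illustrative assumptions.

def ram_bytes_per_cycle(mem_bw_gb_per_s: float,
                        gpu_clock_ghz: float) -> float:
    # GB/s / GHz = bytes per GPU clock cycle (whole-GPU, not per-core)
    return mem_bw_gb_per_s / gpu_clock_ghz

# Illustrative: a GPU with 400 GB/s memory bandwidth clocked at 1.3 GHz
# moves on the order of 300 bytes from RAM per cycle.
print(ram_bytes_per_cycle(400.0, 1.3))
```

Because Apple GPUs pair wide unified-memory buses with comparatively low clocks, this ratio comes out high, which is why bandwidth-bound workloads like LLaMA.cpp do disproportionately well on them.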
I really appreciate the work you have done; that's awesome! But I am confused about the meaning of some of the data in the table below:
Thank you again.