junxnone / xwiki

https://junxnone.github.io/xwiki/

Hardware GPU iGPU Arch #225

Open junxnone opened 1 year ago


Intel GPU Architecture

Architecture Diagrams

image

DG1 Gen12.1 GPU

image

Arc GPU Gen 12.7

image

Intel GPU Generations

| Gen | Code Name | Name |
| --- | --- | --- |
| Gen 9 | Skylake, Kaby Lake, Coffee Lake | Intel® UHD Graphics, Intel® Iris® Graphics, Intel® Iris® Plus Graphics, Intel® Iris® Pro Graphics |
| Gen 11 | Ice Lake | Intel® Iris® Plus Graphics |
| Gen 12.1 | Rocket Lake, Tiger Lake | Intel® Iris® Xe Graphics |
| Gen 12.1 | DG1 | Intel® Iris® Xe Max Graphics |
| Gen 12.1 | Alder Lake | Intel® Iris® Xe Graphics |
| Gen 12.5 | Arctic Sound (ATS) | - |
| Gen 12.7 | Alchemist or ACM (previously DG2), ATS-M | Intel® Xe-HPG Graphics |
| Gen 12.7 | Ponte Vecchio (PVC) | Intel® Xe-HPC Graphics |

Architecture Hierarchy

| Old Term | New Intel Term | Generic Term | New Abbreviation |
| --- | --- | --- | --- |
| Execution Unit (EU) | Xe Vector Engine | Vector Engine | XVE |
| Sub-slice (SS) or Dual Sub-slice | Xe-core | N/A | XC |
| Slice | Render Slice (for Xe-HPG), Compute Slice (for Xe-HPC) | Slice | SLC |
| Tile | Stack | Stack | STK |

Slice

image

SubSlice

image

EU

image

ALU

Parameters of Different GPUs

| Generation | Threads per EU | EUs per SubSlice | SubSlices | Total Threads | Total Operations |
| --- | --- | --- | --- | --- | --- |
| Gen9 (BDW) | 7 | 8 | 3 | 168 | 1344 |
| Intel Iris Plus (Gen11) | 7 | 8 | 8 | 448 | 3584 |
| Intel Iris Xe (Gen12) | 7 | 16 | 6 | 672 | 5376 |
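The totals in the table above are products of the per-level counts. The following is a minimal sketch of that arithmetic. The `GpuConfig` class and its field names are hypothetical helpers, and the factor of 8 operations per thread is an assumption inferred from the table (for example, 168 threads giving 1344 operations).

```python
from dataclasses import dataclass

# Assumption: each hardware thread performs 8 operations at a time,
# inferred from the table (1344 / 168 = 8).
OPS_PER_THREAD = 8


@dataclass(frozen=True)
class GpuConfig:
    """Hypothetical container for the per-level counts in the table."""
    name: str
    threads_per_eu: int
    eus_per_subslice: int
    subslices: int

    @property
    def total_eus(self) -> int:
        return self.eus_per_subslice * self.subslices

    @property
    def total_threads(self) -> int:
        return self.threads_per_eu * self.total_eus

    @property
    def total_operations(self) -> int:
        return self.total_threads * OPS_PER_THREAD


# Values copied from the table rows.
CONFIGS = [
    GpuConfig("Gen9", threads_per_eu=7, eus_per_subslice=8, subslices=3),
    GpuConfig("Gen11", threads_per_eu=7, eus_per_subslice=8, subslices=8),
    GpuConfig("Gen12", threads_per_eu=7, eus_per_subslice=16, subslices=6),
]

for cfg in CONFIGS:
    print(f"{cfg.name}: {cfg.total_eus} EUs, "
          f"{cfg.total_threads} threads, "
          f"{cfg.total_operations} operations")
```

Keeping the per-level counts as the only stored data makes it easy to check a new row against the table: the derived totals cannot drift from the inputs.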

Memory

image

SLM

SLM Characteristics

[[NDRange]] Mapping to iGPU

image image
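| Summary | EUs | Threads | Operations | Maximum Work Group Size | Maximum Work Groups |
| --- | --- | --- | --- | --- | --- |
| Each SubSlice | 16 | 7 × 16 = 112 | 112 × 8 = 896 | 512 | 16 |
| Total | 16 × 6 = 96 | 112 × 6 = 672 | 896 × 6 = 5376 | 512 | 16 × 6 = 96 |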

Intel® Iris® Xe Graphics (TGL) GPU

There are 16 barrier registers per sub-slice, so no more than 16 work-groups can execute simultaneously on one sub-slice. Shared local memory is a second limit: each sub-slice has 64 KB available. If, for example, a work-group requires 32 KB of shared local memory, only 2 such work-groups can run concurrently on a sub-slice, regardless of work-group size.
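The two limits described above can be combined into a simple upper bound on concurrent work-groups per sub-slice. This is a rough sketch, not an occupancy calculator: the function name and parameters are hypothetical, the defaults are the TGL figures quoted above (16 barrier registers, 64 KB of SLM), and it ignores other limits such as the number of available hardware threads.

```python
def max_concurrent_work_groups(
    slm_per_group_bytes: int,
    barrier_registers: int = 16,               # per sub-slice, from the text
    slm_per_subslice_bytes: int = 64 * 1024,   # 64 KB per sub-slice, from the text
) -> int:
    """Upper bound on work-groups resident on one sub-slice at a time.

    Only the barrier-register and shared-local-memory limits are modeled.
    """
    if slm_per_group_bytes < 0:
        raise ValueError("slm_per_group_bytes must be non-negative")
    if slm_per_group_bytes == 0:
        # No SLM requested: only the barrier registers bound concurrency.
        return barrier_registers
    if slm_per_group_bytes > slm_per_subslice_bytes:
        # A single work-group does not fit in the sub-slice's SLM.
        return 0
    slm_limit = slm_per_subslice_bytes // slm_per_group_bytes
    return min(barrier_registers, slm_limit)


# The example from the text: 32 KB per work-group allows 2 work-groups.
print(max_concurrent_work_groups(32 * 1024))
# A small SLM footprint is bounded by the 16 barrier registers instead.
print(max_concurrent_work_groups(1024))
```

Under these assumptions, SLM becomes the binding limit once a work-group requests more than 4 KB (64 KB / 16); below that, the barrier registers cap concurrency at 16.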
