jasperzhong / cs-notes

CS knowledge system

learn CUDA #10

Open jasperzhong opened 3 years ago

jasperzhong commented 3 years ago

https://github.com/NVIDIA/cuda-samples

An absolute treasure trove.

jasperzhong commented 2 years ago

https://www.youtube.com/watch?v=4APkMJdiudU&list=PLC6u37oFvF40BAm7gwVP7uDdzmW83yHPe

A concise, incisive summary of the differences between CPUs and GPUs.

How to take advantage of the massive number of GPU cores.

Concepts:

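  1. Host vs Device => heterogeneous computing => where CUDA comes in (C with extensions)
  2. The host and device communicate over PCIe (but PCIe is usually slow compared with the GPU's own memory bandwidth, so transfers are often a bottleneck).
  3. CUDA threads execute in SIMD fashion. NVIDIA calls this SIMT (nearly the same thing, except that threads in a warp are allowed to branch divergently, at a performance cost)...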
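
To make the host/device split concrete, here is a minimal sketch of the usual round trip: allocate on the device, copy inputs across PCIe, launch a kernel, copy the result back. The names `vector_add` and `run_vector_add` are made up for illustration; only the `cuda*` runtime calls are real API.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Device code: each thread handles one element (same instruction, different data).
__global__ void vector_add(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the last block may contain surplus threads
        out[i] = a[i] + b[i];
    }
}

// Hypothetical helper: returns the number of mismatches against a CPU
// reference, or -1 if a CUDA call failed.
int run_vector_add(int n) {
    std::vector<float> h_a(n), h_b(n), h_out(n);
    for (int i = 0; i < n; ++i) { h_a[i] = float(i); h_b[i] = 2.0f * float(i); }

    size_t bytes = size_t(n) * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_a, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_b, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) return -1;

    // Host -> device copies travel over PCIe.
    cudaMemcpy(d_a, h_a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
    vector_add<<<blocks, threads>>>(d_a, d_b, d_out, n);
    cudaError_t launch_err = cudaGetLastError();

    // Device -> host copy; cudaMemcpy waits for the kernel to finish first.
    cudaError_t copy_err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_out);
    if (launch_err != cudaSuccess || copy_err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != h_a[i] + h_b[i]) ++mismatches;
    }
    return mismatches;
}

int main() {
    printf("mismatches: %d\n", run_vector_add(1 << 20));
    return 0;
}
```

For a kernel this small, the two `cudaMemcpy` calls typically take longer than the computation itself, which is the practical meaning of "PCIe is slow".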

Thread organization:

  1. thread -> one GPU core
  2. block -> one streaming multiprocessor (SM)
  3. grid -> entire GPU

Grids and blocks have dimensions, which can be 1D, 2D, or 3D.
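
A small sketch of how a 2D grid of 2D blocks maps onto a 2D array, assuming a row-major layout. `fill_index` and `run_fill_index` are hypothetical names; `dim3`, `blockIdx`, `blockDim`, and `threadIdx` are the built-in CUDA types and variables.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread computes its global (x, y) coordinate from its block and
// thread indices, then writes its own row-major linear index.
__global__ void fill_index(int* out, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        out[y * width + x] = y * width + x;
    }
}

// Hypothetical helper: returns the number of wrong cells, or -1 on a CUDA error.
int run_fill_index(int width, int height) {
    size_t count = size_t(width) * size_t(height);
    size_t bytes = count * sizeof(int);
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) return -1;
    cudaMemset(d_out, 0xFF, bytes);  // every int becomes -1, so unwritten cells are detectable

    dim3 block(16, 16);  // 256 threads per block, arranged 16 x 16
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);  // enough blocks to cover the array
    fill_index<<<grid, block>>>(d_out, width, height);
    cudaError_t launch_err = cudaGetLastError();

    std::vector<int> h_out(count);
    cudaError_t copy_err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (launch_err != cudaSuccess || copy_err != cudaSuccess) return -1;

    int wrong = 0;
    for (size_t i = 0; i < count; ++i) {
        if (h_out[i] != int(i)) ++wrong;
    }
    return wrong;
}

int main() {
    printf("wrong cells: %d\n", run_fill_index(1000, 700));
    return 0;
}
```

The 16 x 16 block shape is an arbitrary choice for the sketch; a 1D launch with manual index arithmetic would also work, the 2D form just keeps the indexing readable.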


https://www.youtube.com/watch?v=OSpy-HoR0ac&list=PLC6u37oFvF40BAm7gwVP7uDdzmW83yHPe&index=5

Memory model

This section is explained very well... excellent!

[image]

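Note that local memory is slow, much slower than shared memory. This is because local memory is off-chip: it lives in device DRAM even though it is private to each thread.

[image]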

The green area in the figure is where the NVIDIA cores are located; the memory shown there is on-chip. The blue area is DRAM, which is off-chip.

[image]
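
A minimal sketch of using on-chip shared memory as a per-block scratchpad: each block stages its slice of the array in a `__shared__` tile, then writes it back reversed. `reverse_in_block` and `run_reverse` are made-up names, and the kernel assumes it is launched with exactly `BLOCK` threads per block.

```cuda
#include <algorithm>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // threads per block; also the tile size

// Reverses each BLOCK-sized segment of data in place via a shared-memory tile.
__global__ void reverse_in_block(int* data, int n) {
    __shared__ int tile[BLOCK];            // on-chip, visible to the whole block
    int base = blockIdx.x * BLOCK;
    int t = threadIdx.x;
    int count = min(BLOCK, n - base);      // valid elements in this block's segment
    if (t < count) tile[t] = data[base + t];
    __syncthreads();                       // all loads must finish before anyone reads the tile
    if (t < count) data[base + t] = tile[count - 1 - t];
}

// Hypothetical helper: returns the number of wrong elements, or -1 on a CUDA error.
int run_reverse(int n) {
    std::vector<int> h(n);
    for (int i = 0; i < n; ++i) h[i] = i;

    size_t bytes = size_t(n) * sizeof(int);
    int* d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return -1;
    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);

    int blocks = (n + BLOCK - 1) / BLOCK;
    reverse_in_block<<<blocks, BLOCK>>>(d, n);
    cudaError_t launch_err = cudaGetLastError();
    cudaError_t copy_err = cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (launch_err != cudaSuccess || copy_err != cudaSuccess) return -1;

    int wrong = 0;
    for (int i = 0; i < n; ++i) {
        int base = (i / BLOCK) * BLOCK;
        int count = std::min(BLOCK, n - base);
        int expected = base + (count - 1 - (i - base));  // mirrored position within the segment
        if (h[i] != expected) ++wrong;
    }
    return wrong;
}

int main() {
    printf("wrong elements: %d\n", run_reverse(1000));
    return 0;
}
```

Global memory is touched only once per element in each direction; the scattered accesses needed for the reversal all hit the on-chip tile.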


https://www.youtube.com/watch?v=PJCISyoGpug&list=PLC6u37oFvF40BAm7gwVP7uDdzmW83yHPe&index=6

Covers synchronization primitives.
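
The notes do not say which primitives the video covers. As an assumption, the sketch below shows the two most common ones: `__syncthreads()` as a barrier within a block, and `atomicAdd` for safe updates across blocks. `sum_kernel` and `run_sum` are hypothetical names, and the kernel assumes a power-of-two block size equal to `BLOCK`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // must be a power of two for the halving loop below

// Sums an int array: tree reduction inside each block, then one atomicAdd per block.
__global__ void sum_kernel(const int* in, int n, int* total) {
    __shared__ int partial[BLOCK];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    partial[threadIdx.x] = (i < n) ? in[i] : 0;  // pad out-of-range slots with 0
    __syncthreads();                             // barrier: every slot is now filled

    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        }
        __syncthreads();  // barrier: finish this level before starting the next
    }

    // Blocks cannot use __syncthreads() with each other, so combine with an atomic.
    if (threadIdx.x == 0) atomicAdd(total, partial[0]);
}

// Hypothetical helper: returns the device-computed sum, or -1 on a CUDA error.
int run_sum(const std::vector<int>& h_in) {
    int n = int(h_in.size());
    size_t bytes = size_t(n) * sizeof(int);
    int *d_in = nullptr, *d_total = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_total, sizeof(int)) != cudaSuccess) return -1;
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_total, 0, sizeof(int));

    int blocks = (n + BLOCK - 1) / BLOCK;
    sum_kernel<<<blocks, BLOCK>>>(d_in, n, d_total);
    cudaError_t launch_err = cudaGetLastError();

    int h_total = 0;
    cudaError_t copy_err = cudaMemcpy(&h_total, d_total, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in); cudaFree(d_total);
    if (launch_err != cudaSuccess || copy_err != cudaSuccess) return -1;
    return h_total;
}

int main() {
    std::vector<int> ones(100000, 1);
    printf("sum: %d\n", run_sum(ones));
    return 0;
}
```

Both `__syncthreads()` calls sit outside the `if`, so every thread in the block reaches them; calling the barrier from only some threads of a block is undefined behavior.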