zhangjun/zhangjun.github.io
https://zhangjun.github.io
Papers #30 (Open)
zhangjun opened 1 year ago
zhangjun commented 1 year ago
accelerator
Modeling Deep Learning Accelerator Enabled GPUs
zhangjun commented 9 months ago
Paper Lists
MLSys
[Bolt]()
Paper reading [close read] - BOLT: Bridging the Gap between Auto-tuners and Hardware-native Performance
zhangjun commented 9 months ago
Quantization
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
https://blog.csdn.net/qq_19784349/article/details/82883271
$r = S(q - Z)$ => $q = \mathrm{round}(\frac{r}{S} + Z)$; S: scale, Z: zero-point
$\Large{S = \frac{val_{max} - val_{min}}{2^{bit\_length} - 1}}$
$\Large{Z = \mathrm{round}(-\frac{val_{min}}{S})}$
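A minimal Python sketch of the affine quantization formulas above. The function names (`quant_params`, `quantize`, `dequantize`) are made up for illustration. Clamping to the unsigned range `[0, 2^bit_length - 1]` and the requirement `val_max > val_min` are assumptions not stated in the notes.

```python
def quant_params(val_min, val_max, bit_length=8):
    """Compute scale S and zero-point Z from the real-valued range.

    Assumes val_max > val_min (otherwise S would be zero).
    """
    scale = (val_max - val_min) / (2 ** bit_length - 1)
    zero_point = round(-val_min / scale)
    return scale, zero_point


def quantize(r, scale, zero_point, bit_length=8):
    """q = round(r / S + Z), clamped to [0, 2^bit_length - 1] (assumed unsigned range)."""
    q = round(r / scale + zero_point)
    return max(0, min(2 ** bit_length - 1, q))


def dequantize(q, scale, zero_point):
    """r = S * (q - Z)."""
    return scale * (q - zero_point)


# Example range chosen so that S is exactly representable (S = 0.5).
scale, zero_point = quant_params(-25.5, 102.0)
q = quantize(10.0, scale, zero_point)
r = dequantize(q, scale, zero_point)
print(scale, zero_point, q, r)
```

For values inside `[val_min, val_max]`, the round trip `dequantize(quantize(r))` differs from `r` by at most `S / 2`; values outside that range saturate at the clamp bounds.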
zhangjun commented 9 months ago
Training
PS
[Accelerating Collective Communication in Data Parallel Training across Deep Learning Frameworks](https://www.usenix.org/system/files/nsdi22spring_prepub_romero.pdf)
parallel training
PipeDream: Generalized Pipeline Parallelism for DNN Training
https://insujang.github.io/2022-06-11/parallelism-in-distributed-deep-learning/
communication
Compressed Communication for Distributed Deep Learning: Survey and Quantitative Evaluation
Efficient Sparse Collective Communication and its application to Accelerate Distributed Deep Learning