yukarinoki / reseach

AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures #16

Open yukarinoki opened 1 year ago

yukarinoki commented 1 year ago

Similar to #15: it describes techniques used in a JIT optimizing compiler for machine learning. It may go a step further than #15.

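There is a dilemma between performing costly fusion and skipping fusion at the price of emitting a large number of kernels (because of the just-in-time constraint).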

The authors built an optimizing compiler called AStitch; apparently it is used from TensorFlow.

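AStitch systematically abstracts four operator-stitching schemes, resolves complex computation-graph dependencies while considering multi-dimensional optimization objectives, and efficiently handles a variety of tensor shapes through a novel hierarchical data reuse. In other words, it is extremely close to #15.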
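
To make the fusion dilemma noted above concrete, here is a minimal toy cost model. It is a sketch under stated assumptions, not AStitch's actual cost model: the function names, the `recompute_factor` parameter, and the counting rules are all hypothetical. It assumes a chain of element-wise ops where each unfused kernel reads its input from and writes its output to global memory, and where fusion may force some values to be recomputed.

```python
# Toy cost model (hypothetical; not taken from the AStitch paper).
# It compares a chain of element-wise ops run as separate kernels
# against the same chain fused into a single kernel.


def unfused_cost(num_ops: int, num_elements: int) -> dict:
    """Each op is its own kernel: one global read and one global write per element."""
    return {
        "kernel_launches": num_ops,
        "global_memory_accesses": 2 * num_ops * num_elements,
        "arithmetic_ops": num_ops * num_elements,
    }


def fused_cost(num_ops: int, num_elements: int, recompute_factor: int = 1) -> dict:
    """One kernel: read the input once, write the final output once.

    recompute_factor is an assumed knob modelling how many times each
    intermediate value is recomputed because of the fusion strategy.
    """
    return {
        "kernel_launches": 1,
        "global_memory_accesses": 2 * num_elements,
        "arithmetic_ops": num_ops * num_elements * recompute_factor,
    }


if __name__ == "__main__":
    # Fewer launches and less memory traffic, but possibly more arithmetic.
    print("unfused:", unfused_cost(num_ops=4, num_elements=1_000_000))
    print("fused:  ", fused_cost(num_ops=4, num_elements=1_000_000, recompute_factor=3))
```

In this toy model, fusion always cuts kernel launches and global memory traffic, while a large `recompute_factor` shows how a fused kernel can become expensive in its own right.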

yukarinoki commented 1 year ago

For both #15 and #16, I would like to see the code.

yukarinoki commented 1 year ago

Cited by

Ding Y, Yu C, Zheng B, Liu Y, Wang Y and Pekhimenko G. Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. (370-384). https://doi.org/10.1145/3575693.3575702

Xu Z, Xu J, Peng H, Wang W, Wang X, Wan H, Dai H, Xu Y, Cheng H, Wang K and Chen G. ALT: Breaking the Wall between Data Layout and Loop Optimizations for Deep Learning Compilation. Proceedings of the Eighteenth European Conference on Computer Systems. (199-214). https://doi.org/10.1145/3552326.3587440

Zhao H, Yang Z, Cheng Y, Tian C, Ren S, Xiao W, Yuan M, Chen L, Liu K, Zhang Y, Li Y and Lin W. (2023). GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning. Proceedings of the ACM on Management of Data. 1:2. (1-25). Online publication date: 13-Jun-2023. https://doi.org/10.1145/3589773

Wang Z, Nie P, Miao X, Chen Y, Wan C, Bu L and Zhao J. GenCoG: A DSL-Based Approach to Generating Computation Graphs for TVM Testing. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. (904-916). https://doi.org/10.1145/3597926.3598105

Fu B, Chen F, Li P and Zeng D. TCB: Accelerating Transformer Inference Services with Request Concatenation. Proceedings of the 51st International Conference on Parallel Processing. (1-11). https://doi.org/10.1145/3545008.3545052

Jiang L, Xu P, Zhu Q, Li X, Yan S, Zhang X, Lin D, Ma W, Li Z, Liu J, Ma J, Jin M and Yang C. EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers. Proceedings of the 51st International Conference on Parallel Processing. (1-11). https://doi.org/10.1145/3545008.3545037

Wu R, Zhang F, Guan J, Zheng Z, Du X and Shen X. DREW: Efficient Winograd CNN Inference with Deep Reuse. Proceedings of the ACM Web Conference 2022. (1807-1816). https://doi.org/10.1145/3485447.3511985

Zhang S, Cui W, Chen Q, Zhang Z, Guan Y, Leng J, Li C and Guo M. PAME. Proceedings of the 36th ACM International Conference on Supercomputing. (1-12). https://doi.org/10.1145/3524059.3532366