yukarinoki opened this issue 1 year ago
Ding Y, Yu C, Zheng B, Liu Y, Wang Y and Pekhimenko G. Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. (370-384). https://doi.org/10.1145/3575693.3575702
Xu Z, Xu J, Peng H, Wang W, Wang X, Wan H, Dai H, Xu Y, Cheng H, Wang K and Chen G. ALT: Breaking the Wall between Data Layout and Loop Optimizations for Deep Learning Compilation. Proceedings of the Eighteenth European Conference on Computer Systems. (199-214). https://doi.org/10.1145/3552326.3587440
Zhao H, Yang Z, Cheng Y, Tian C, Ren S, Xiao W, Yuan M, Chen L, Liu K, Zhang Y, Li Y and Lin W. (2023). GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning. Proceedings of the ACM on Management of Data. 1:2. (1-25). Online publication date: 13-Jun-2023. https://doi.org/10.1145/3589773
Wang Z, Nie P, Miao X, Chen Y, Wan C, Bu L and Zhao J. GenCoG: A DSL-Based Approach to Generating Computation Graphs for TVM Testing. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. (904-916). https://doi.org/10.1145/3597926.3598105
Fu B, Chen F, Li P and Zeng D. TCB: Accelerating Transformer Inference Services with Request Concatenation. Proceedings of the 51st International Conference on Parallel Processing. (1-11). https://doi.org/10.1145/3545008.3545052
Jiang L, Xu P, Zhu Q, Li X, Yan S, Zhang X, Lin D, Ma W, Li Z, Liu J, Ma J, Jin M and Yang C. EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers. Proceedings of the 51st International Conference on Parallel Processing. (1-11). https://doi.org/10.1145/3545008.3545037
Wu R, Zhang F, Guan J, Zheng Z, Du X and Shen X. DREW: Efficient Winograd CNN Inference with Deep Reuse. Proceedings of the ACM Web Conference 2022. (1807-1816). https://doi.org/10.1145/3485447.3511985
Zhang S, Cui W, Chen Q, Zhang Z, Guan Y, Leng J, Li C and Guo M. PAME. Proceedings of the 36th ACM International Conference on Supercomputing. (1-12). https://doi.org/10.1145/3524059.3532366
Feels similar to #15 = it describes optimizations in a JIT compiler for machine learning; perhaps it goes further than #15.
There is a dilemma between performing expensive fusion and skipping fusion but emitting a large number of kernels (because of the just-in-time compilation constraint).
They built an optimizing compiler called AStitch; apparently it is used from TensorFlow.
AStitch systematically abstracts four operator-stitching schemes, resolves complex computation-graph dependencies while taking multi-dimensional optimization objectives into account, and efficiently handles diverse tensor shapes through a novel hierarchical data-reuse scheme. = Extremely close to #15.
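To make the fusion trade-off above concrete, here is a minimal toy sketch (my own illustration with NumPy, not AStitch's actual implementation): unfused execution materializes every intermediate tensor in memory, which is what launching many small kernels amounts to, while a fused/stitched version reuses intermediates immediately, as a stitched GPU kernel would keep them in registers or shared memory.

```python
import numpy as np

def unfused(x):
    # Each op runs as its own "kernel": every intermediate array is
    # written to and read back from memory (extra memory traffic).
    a = np.exp(x)
    b = a + 1.0
    c = b * 2.0
    return c

def fused(x):
    # Fused/stitched version: one pass over the data, intermediates
    # are consumed immediately instead of being materialized.
    return (np.exp(x) + 1.0) * 2.0

x = np.random.rand(1024).astype(np.float32)
assert np.allclose(unfused(x), fused(x))
```

The dilemma the note mentions is that real fusion decisions are costly to search at JIT time, while giving up on fusion leaves you with the memory-bound `unfused` pattern; AStitch's stitching schemes are a middle path between these extremes.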