llm-jp / experiments

Issue-Only Pretrain Task Management Repository

[Pretraining] - MoE Baseline1 #59

Closed Taishi-N324 closed 3 weeks ago

Taishi-N324 commented 1 month ago

Overview

Train an 8x1.8B model from scratch on 2.1T tokens.

Details

Model card PR: https://github.com/llm-jp/model-cards/pull/25

Baseline1 of the LLM-JP-MoE experiment plan for the second half of fiscal year 2024 (LLM-JP-MoE-2024年度後期実験計画)

Resources

Taishi-N324 commented 3 weeks ago

For speed reasons, this work is moving to https://github.com/llm-jp/experiments/issues/71