long8v / PTIR

Paper Today I Read

[54] Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models #60

Open long8v opened 1 year ago

long8v commented 1 year ago

image

paper

TL;DR

Details

Branch-Train-Merge (BTM)

image image

Inference

image

Every ELM still needs a forward pass at inference time, but the ELMs that actually end up being selected turn out to be sparse.
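A minimal sketch of how such an ensemble could behave, assuming the ELMs are combined by weighting each expert's next-token distribution with a domain posterior obtained from Bayes' rule over the context likelihood. The function names, the uniform prior, the `top_k` truncation, and all numbers below are hypothetical and chosen only for illustration; they are not taken from the paper's code.

```python
import math

def domain_posterior(context_logliks, log_priors):
    """Posterior over experts: softmax of (context log-likelihood + log prior)."""
    scores = [ll + lp for ll, lp in zip(context_logliks, log_priors)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_next_token(expert_probs, weights, top_k=None):
    """Mix per-expert next-token distributions with the given weights.

    If top_k is set, keep only the top_k highest-weight experts and renormalize.
    """
    if top_k is not None:
        order = sorted(range(len(weights)), key=lambda j: weights[j])
        keep = set(order[-top_k:])
        weights = [w if j in keep else 0.0 for j, w in enumerate(weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    vocab = len(expert_probs[0])
    return [
        sum(w * probs[v] for w, probs in zip(weights, expert_probs))
        for v in range(vocab)
    ]

# Toy example (made-up numbers): 3 experts, vocabulary of 3 tokens.
context_logliks = [-12.0, -30.0, -31.0]   # log p(context | expert j)
log_priors = [math.log(1 / 3)] * 3        # uniform prior (assumption)
expert_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]

weights = domain_posterior(context_logliks, log_priors)
mixed = ensemble_next_token(expert_probs, weights)
top1 = ensemble_next_token(expert_probs, weights, top_k=1)
```

In this toy setting nearly all of the posterior mass lands on the first expert, so the full mixture and the top-1 mixture are almost identical. That is the sense in which the selection is sparse even though every expert had to score the context first.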

Data..

image

DeMix

DeMix, 2021