long8v / PTIR

Paper Today I Read
[60] Efficient Sparsely Activated Transformers #66

Open long8v opened 1 year ago

paper

TL;DR

Details

The trend is to reduce the number and dimensionality of MHA layers while adding MoE or FFN layers.
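
To make that pattern concrete, here is a minimal sketch of a layer layout with fewer MHA layers and more MoE/FFN sublayers. This is an illustration only, not the paper's architecture: `build_layer_pattern`, `attn_every`, and `moe_every` are hypothetical names, and the block counts are made up.

```python
from collections import Counter


def build_layer_pattern(n_blocks, attn_every=1, moe_every=None):
    """Return a list of sublayer names for a hypothetical Transformer stack.

    attn_every: place an MHA sublayer in every `attn_every`-th block.
    moe_every:  make every `moe_every`-th feed-forward sublayer an MoE
                (None means all feed-forward sublayers are dense FFNs).
    """
    layers = []
    for i in range(n_blocks):
        if i % attn_every == 0:
            layers.append("mha")
        is_moe = moe_every is not None and (i + 1) % moe_every == 0
        layers.append("moe" if is_moe else "ffn")
    return layers


# Dense baseline: one MHA and one FFN per block.
dense = build_layer_pattern(n_blocks=6)

# Sparse variant (assumed settings): MHA in every third block, MoE in every second.
sparse = build_layer_pattern(n_blocks=6, attn_every=3, moe_every=2)

print(Counter(dense))
print(Counter(sparse))
```

Comparing the two counts shows the shift described in the note: the sparse layout keeps the same number of feed-forward sublayers but has far fewer MHA sublayers, and some of the feed-forward sublayers are replaced by MoE.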