timlee0212 / SiDA-MoE
Code for the MLSys 2024 paper "SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models"
MIT License · 8 stars · 4 forks
Issues
#2 · In the NLG setting, where a decoding phase is necessary, do you predict and load the predicted experts at every iteration? I am curious about this since you only mention classification tasks. (See the sketch after this list.)
opened by victayria77 · 3 weeks ago · 1 comment
#1 · Failed to load dataset for finetune.py
opened by Tonanguyxiro · closed 2 months ago · 2 comments
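For readers wondering how the question in #2 could play out in practice, here is a minimal sketch of predicting and loading experts at every decoding iteration. All names (`ExpertPredictor`, `ExpertCache`, `model_step`) are hypothetical illustrations under assumed semantics, not SiDA-MoE's actual API; the repository's hash-based predictor and offloading logic may differ.

```python
# Hypothetical sketch, NOT SiDA-MoE's implementation: predict which expert
# the next decoding step will activate, then move it onto the device first.
import torch
import torch.nn as nn

class ExpertPredictor(nn.Module):
    """Stand-in for an offline-trained predictor that maps the current
    hidden state to the expert expected to fire on the next step."""
    def __init__(self, hidden_dim: int, num_experts: int):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, num_experts)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return self.proj(hidden).argmax(dim=-1)  # predicted expert id

class ExpertCache:
    """Keeps at most `capacity` experts on the device; the rest stay on CPU."""
    def __init__(self, experts: list, capacity: int, device: str):
        self.experts = experts
        self.capacity = capacity
        self.device = device
        self.resident: list[int] = []

    def fetch(self, expert_id: int) -> nn.Module:
        if expert_id not in self.resident:
            if len(self.resident) >= self.capacity:
                evicted = self.resident.pop(0)       # simple FIFO eviction
                self.experts[evicted].to("cpu")
            self.experts[expert_id].to(self.device)  # load predicted expert
            self.resident.append(expert_id)
        return self.experts[expert_id]

def decode(model_step, predictor, cache, hidden, max_new_tokens: int):
    """Autoregressive loop: predict and load the expert before each step."""
    for _ in range(max_new_tokens):
        expert_id = int(predictor(hidden))   # predict next iteration's expert
        expert = cache.fetch(expert_id)      # bring it onto the device
        hidden = model_step(hidden, expert)  # run one decoding step
    return hidden

# Toy usage: 8 experts, device cache of 2, a trivial "model step".
experts = [nn.Linear(16, 16) for _ in range(8)]
predictor = ExpertPredictor(hidden_dim=16, num_experts=8)
cache = ExpertCache(experts, capacity=2, device="cpu")  # "cuda" if available
out = decode(lambda h, e: e(h), predictor, cache, torch.randn(16), max_new_tokens=4)
```

In a real serving path the load would typically overlap with computation, e.g. on a separate CUDA stream, so the predicted expert's weights arrive before the MoE layer needs them.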