Implement accelerate support for semantic/coarse/fine transformers

lucidrains / audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch

MIT License


Implement accelerate support for semantic/coarse/fine transformers #207

Closed. LWprogramming closed this 1 year ago.

LWprogramming commented 1 year ago

Based on https://arxiv.org/abs/1706.02677

Scale the learning rate by the number of GPUs.
Make Accelerate wait before sampling, just as the SoundStream trainer does.
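For reference, the linear scaling rule from the cited paper (Goyal et al., "Accurate, Large Minibatch SGD") says: when the minibatch size is multiplied by k, multiply the learning rate by k. The paper also pairs this with a gradual warmup that ramps from the base rate to the scaled rate. The sketch below assumes the per-GPU batch size stays fixed, so k equals the number of GPUs. `linear_scaled_lr`, `warmup_scaled_lr` and `warmup_steps` are hypothetical names for illustration only, not part of audiolm-pytorch.

```python
def linear_scaled_lr(base_lr: float, num_gpus: int) -> float:
    """Linear scaling rule: global batch grows k-fold, so LR grows k-fold.

    Assumes (hypothetically) a fixed per-GPU batch size, so k == num_gpus.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    return base_lr * num_gpus


def warmup_scaled_lr(step: int, warmup_steps: int, base_lr: float, num_gpus: int) -> float:
    """Gradual warmup: ramp linearly from base_lr to the scaled LR.

    The paper warms up over roughly 5 epochs; warmup_steps is a hypothetical
    stand-in for that duration expressed in optimizer steps.
    """
    target = linear_scaled_lr(base_lr, num_gpus)
    if warmup_steps <= 0 or step >= warmup_steps:
        return target
    frac = step / warmup_steps
    return base_lr + frac * (target - base_lr)


# Example: a base LR tuned on 1 GPU, reused on 8 GPUs.
for step in (0, 250, 500, 1000):
    print(step, warmup_scaled_lr(step, warmup_steps=500, base_lr=3e-4, num_gpus=8))
```

The warmup exists because jumping straight to a large scaled rate early in training tends to be unstable; the ramp only changes the first `warmup_steps` steps.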

lucidrains commented 1 year ago

@LWprogramming 🙏 We should definitely get the accelerate waits in!

However, I think we should not scale the learning rate by the number of GPUs, and should instead leave that to the researcher to pass in. That paper is a bit dated, and I don't believe the relationship is linear. As an example to think about: if it took 25k GPUs to train GPT-4, what should their learning rate be?
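The GPT-4 thought experiment can be made concrete with a little arithmetic. With a hypothetical base learning rate of 3e-4 tuned on one GPU, the linear rule at 25,000 GPUs gives 3e-4 × 25,000 = 7.5, far outside any usable range. The sketch below contrasts that with square-root scaling, a different heuristic that was not proposed in this thread and is shown only for comparison, and with `"none"`, which mirrors the maintainer's position of using whatever learning rate the researcher passes in. `scaled_lr` and its `rule` argument are hypothetical.

```python
import math


def scaled_lr(base_lr: float, num_gpus: int, rule: str = "none") -> float:
    """Return a learning rate under a hypothetical scaling rule.

    "none"   -> use base_lr as given (the researcher picks the final LR)
    "linear" -> base_lr * k      (Goyal et al. linear scaling rule)
    "sqrt"   -> base_lr * sqrt(k) (alternative heuristic, for comparison)
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    if rule == "none":
        return base_lr
    if rule == "linear":
        return base_lr * num_gpus
    if rule == "sqrt":
        return base_lr * math.sqrt(num_gpus)
    raise ValueError(f"unknown rule: {rule!r}")


# Compare the rules at a few cluster sizes (base LR is a made-up example).
for n in (1, 8, 25_000):
    print(n, scaled_lr(3e-4, n, "linear"), scaled_lr(3e-4, n, "sqrt"))
```

Any fixed formula bakes in an assumption about how the optimum moves with batch size, which is why leaving the value to the caller is the more conservative design.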

lucidrains commented 1 year ago

@LWprogramming I decided to get the wait invocations in; thanks again for the PR!
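A wait before sampling is a barrier. Every process must finish its training step before the main process samples. In Hugging Face Accelerate, the barrier is `accelerator.wait_for_everyone()` and the main-process check is `accelerator.is_main_process`. To keep the example runnable without GPUs or Accelerate installed, the sketch below substitutes threads for processes and `threading.Barrier` for the Accelerate call. `train_then_sample` is a hypothetical name, and this is not the trainer code from the repository.

```python
import threading


def train_then_sample(num_workers: int, steps: int) -> list:
    """Simulate N workers that train, synchronize, then let rank 0 sample.

    Threads stand in for distributed processes (an assumption for this sketch).
    """
    barrier = threading.Barrier(num_workers)  # reusable across steps
    lock = threading.Lock()
    log = []

    def worker(rank: int) -> None:
        for step in range(steps):
            with lock:
                log.append(("train", rank, step))
            # Stand-in for accelerator.wait_for_everyone(): block until every
            # worker has finished this step's training.
            barrier.wait()
            # Stand-in for accelerator.is_main_process: only rank 0 samples.
            if rank == 0:
                with lock:
                    log.append(("sample", rank, step))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


events = train_then_sample(num_workers=4, steps=3)
print(events)
```

Without the barrier, rank 0 could sample from a model whose other replicas are still mid-step, which in a real multi-process run risks inconsistent state.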