microsoft / torchscale

Foundation Architecture for (M)LLMs
https://aka.ms/GeneralAI
MIT License
3k stars 201 forks

Retnet training is slow #55

Closed. Zth9730 closed this issue 1 year ago

Zth9730 commented 1 year ago

Hi, when I train RetNet in parallel mode, it is very slow. I also checked the GPU memory usage and it is very low. What's going on? Thank you!

sunyt32 commented 1 year ago

Can you share your script so we can reproduce this? The low memory usage sounds unusual. As a sanity check, you can compare RetNet training against the Transformer module in TorchScale.

Zth9730 commented 1 year ago

Thanks a lot. I checked my config and found that my batch size was too small. Sorry!
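
For anyone who hits the same symptom (slow training together with low GPU memory usage), a quick check along these lines may help confirm whether the batch size is the cause. This is a minimal sketch, not part of TorchScale: the function names and the 25% threshold are hypothetical, and the memory figures are assumed to come from your own measurements (for example, `torch.cuda.max_memory_allocated()` and `torch.cuda.get_device_properties(0).total_memory` in PyTorch).

```python
def tokens_per_step(batch_size, seq_len, grad_accum_steps=1, num_gpus=1):
    """Number of tokens processed per optimizer step (hypothetical helper)."""
    return batch_size * seq_len * grad_accum_steps * num_gpus


def memory_fraction(used_bytes, total_bytes):
    """Fraction of device memory in use, given measured byte counts."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def looks_underutilized(used_bytes, total_bytes, threshold=0.25):
    """Heuristic: flag a run whose peak memory is below `threshold` of the device.

    The 0.25 default is an arbitrary assumption, not a recommended value.
    """
    return memory_fraction(used_bytes, total_bytes) < threshold


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Example numbers only: 2 GiB used on a 40 GiB device with batch size 2.
    print(tokens_per_step(batch_size=2, seq_len=512))
    print(looks_underutilized(2 * GIB, 40 * GIB))
```

If the check flags the run, raising the batch size (or the gradient accumulation steps, if memory is the limit) is the first thing to try before suspecting the model code.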