Closed by Zth9730 1 year ago
Can you share your script so we can reproduce this? The low memory usage sounds odd. You can compare RetNet training against the Transformer module in TorchScale.
Thanks a lot. I checked my config and found that my batch size was small, sorry!
Hi, when I train with RetNet's parallel mode, it is very slow. I checked the GPU memory usage and it is very low. What's going on? Thank you!