Closed JorgeCepeda closed 7 months ago
The paper says it's meant for big models with billions of parameters, not 15 million parameters.
I don't have the capacity to scale it up and test it at the billion-parameter scale. The code is open source, so feel free to try if you can.
GPUs do be expensive.