joey00072 / ohara

Collection of autoregressive model implementations

Re: BitNet tests #6

Closed. JorgeCepeda closed this issue 7 months ago

JorgeCepeda commented 7 months ago

The paper says the approach is meant for large models with billions of parameters, not 15 million parameters.

joey00072 commented 7 months ago

I don't have the capacity to scale it up and test it at the billion-parameter scale. The code is open source, so feel free to try it if you can.

GPUs do be expensive.