sangmichaelxie / doremi

PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets
https://arxiv.org/abs/2305.10429
MIT License

Multi-nodes support #6

Open binxuan opened 1 year ago

binxuan commented 1 year ago

Hi,

Thanks for sharing this open-source implementation. I am wondering whether the current implementation supports training a larger reference/proxy model across multiple nodes.

Thanks

sangmichaelxie commented 1 year ago

No, we're only supporting single node training at the moment. I'll let you know if we do this in the future.