The training is based on the transformers training routine (Trainer).
It runs on an A100-80G machine, but the per-GPU batch size can be set to at most 2, and memory usage is extremely unbalanced across the cards, e.g., 60 GB+ on card 0 versus 30 GB+ on the other cards.
In addition, are there recommended training parameters? With the current training setup, the loss value is very large and decreases only slowly.
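For reference, here is a minimal sketch of the kind of Trainer configuration described above; the output directory and hyperparameter values are placeholders, not the exact settings used:

```python
# Sketch of the assumed Trainer setup (values are placeholders, not the actual config).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./outputs",             # placeholder output path
    per_device_train_batch_size=2,      # the largest size that currently fits per A100-80G
    gradient_accumulation_steps=8,      # raises the effective batch size without extra memory
    gradient_checkpointing=True,        # trades compute for activation memory
    bf16=True,                          # mixed precision on A100
    learning_rate=2e-5,                 # placeholder value
    logging_steps=10,
)
# These arguments would then be passed to transformers.Trainer together with
# the model and training dataset before calling trainer.train().
```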