openlm-research / open_llama

OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
Apache License 2.0

Costs and future #7

Closed maximegmd closed 1 year ago

maximegmd commented 1 year ago

First of all, I would like to express my gratitude for making this available under such a permissive license; it opens new doors to both researchers and industry.

1) Could you provide a breakdown of the cost of training such a model? Knowing how to budget for training large models like this one for specific use cases is important for planning, and I am sure many of us would be glad to have some hard data to give our managers.

2) Do you plan on training on more tokens than the original LLaMA paper, or is the goal only to reproduce its results?

Thanks again for your terrific work!

suhdev commented 1 year ago

Just throwing this out there, and excuse my ignorance on the subject: can we, as the open source community, help reduce training costs by making our own machines available as compute resources for training? I'm thinking of something similar to GitHub runners, but for training.

nevercast commented 1 year ago

Going to provide some answers here to the best of my knowledge, which is based purely on lurking and reading public information; I have no authority on these matters.

@suhdev I can't imagine the overhead would ever make sense for end users contributing compute unless they already have their own AI cluster, and even then the infrastructure complications make it harder. I could imagine financial contributions would be helpful, but if you have to ask, I can't imagine the offer would make much of a dent in the cost of this project.

@maximegmd I would imagine the compute costs run into multiple millions of USD, based on the amount of compute that was mentioned on Twitter.

Regarding the "training tokens" question, members of OpenLM mentioned elsewhere that they want to stay as true to LLaMA as possible for now.

suhdev commented 1 year ago

Thanks for the insightful answer. I do understand the limitation; perhaps I need to read a bit more about how training is distributed across the compute units of an AI cluster (a minimal sketch of the usual pattern is below).
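For context, here is a minimal sketch of synchronous data-parallel training in JAX (the framework behind EasyLM, which OpenLLaMA was trained with). Everything in it is illustrative: the model, loss, and learning rate are hypothetical stand-ins, not anything from the actual codebase. The point is the `pmean` all-reduce, where every replica must exchange its full gradient on every step; over consumer internet links that exchange, not raw FLOPs, would dominate, which is why volunteer compute is a poor fit.

```python
# Minimal sketch of synchronous data-parallel training in JAX. All names
# here are hypothetical; this is not OpenLLaMA's actual training step.
import functools

import jax
import jax.numpy as jnp

def loss_fn(params, batch):
    # Hypothetical toy model: a single linear layer with squared error.
    preds = batch["x"] @ params
    return jnp.mean((preds - batch["y"]) ** 2)

@functools.partial(jax.pmap, axis_name="devices")
def train_step(params, batch):
    grads = jax.grad(loss_fn)(params, batch)
    # The all-reduce: every replica averages its full gradient with all
    # others on every step. Gradient size is roughly model size, so a
    # 7B-parameter model moves tens of GB per step -- fine over
    # NVLink/InfiniBand, hopeless over home internet connections.
    grads = jax.lax.pmean(grads, axis_name="devices")
    return jax.tree_util.tree_map(lambda p, g: p - 1e-3 * g, params, grads)

# Both params and batch need a leading device axis before calling
# train_step, e.g.:
# replicated = jax.device_put_replicated(init_params, jax.local_devices())
```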

young-geng commented 1 year ago

The cost can vary a lot depending on the cloud provider. While we are not able to give you an estimate of our cost, Mosaic recently released their MPT-7B model trained on 1T tokens, and their reported cost is around 200k USD.
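For a sense of where numbers like that come from, here is a rough back-of-envelope calculation, not the authors' accounting. It uses the common 6·N·D approximation for training FLOPs together with an assumed A100 utilization and an assumed hourly price; every constant is illustrative.

```python
# Back-of-envelope training cost; all constants are assumptions, not
# figures from the OpenLLaMA team.
params = 7e9                       # 7B parameters
tokens = 1e12                      # 1T tokens (MPT-7B's training budget)
train_flops = 6 * params * tokens  # ~4.2e22 FLOPs (common 6*N*D estimate)

a100_peak = 312e12                 # A100 BF16 peak FLOP/s
utilization = 0.40                 # assumed model FLOPs utilization
gpu_hours = train_flops / (a100_peak * utilization) / 3600

price_per_hour = 2.00              # assumed USD per A100-hour
print(f"~{gpu_hours:,.0f} A100-hours, ~${gpu_hours * price_per_hour:,.0f}")
# -> roughly 93k A100-hours and ~$190k, the same ballpark as Mosaic's ~$200k
```

Small changes in the assumptions move this a lot: lower utilization, higher on-demand prices, or restarts and ablation runs can multiply the final bill, which is presumably why public guesses range so widely.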