EthanChen1234 closed this issue 1 year ago
We haven't comprehensively compared the two models. We are planning to do more comparisons soon.
Great work guys! I appreciate how complex and difficult an undertaking this is when you don't have the unlimited resources of a major tech company.
I also appreciate how frustrating it is when people log problems that are obviously out of the scope of the project, or so poorly defined that their own process, rather than the OpenLLaMA model, is more likely the root cause of their issue.
@gjmulder It's great work, and I appreciate the effort that went into training the LLM.
As you mentioned, resources are limited, which makes it important to design experiments carefully. Comparing the dataset categories, v2 looks almost the same as v1.
If it's convenient for you, could you explain the experimental purpose of v2?
@EthanChen1234 The v2 dataset is quite different, as it includes the entire StarCoder dataset, which makes code roughly 30% of the whole composition. OpenLLaMA is not so much a research project as an effort to build a good, permissively licensed open source replacement for LLaMA. In that sense we are not planning to investigate a particular research question or write a paper with this project.
@young-geng @gjmulder thanks.
Dataset
The v1 models are trained on the RedPajama dataset. The v2 models are trained on a mixture of the Falcon refined-web dataset, the StarCoder dataset, and the wikipedia, arxiv, book, and stackexchange parts of the RedPajama dataset. We follow exactly the same preprocessing steps and training hyperparameters as the original LLaMA paper, including model architecture, context length, training steps, learning rate schedule, and optimizer. The only difference between our setting and the original one is the dataset used: OpenLLaMA employs open datasets rather than the one utilized by the original LLaMA.
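For illustration, the v2 mixture described above can be sketched as a table of sampling weights. The component list comes from this thread, but the individual weights below are hypothetical placeholders; the only proportion actually stated here is that code (StarCoder) makes up roughly 30% of the mix:

```python
# Hypothetical sketch of the OpenLLaMA v2 data mixture.
# The component list is taken from this thread; the weights are
# illustrative placeholders, NOT the actual training proportions
# (the thread only states that code is roughly 30% of the mix).
V2_MIXTURE = {
    "falcon_refinedweb": 0.55,        # Falcon refined-web
    "starcoder": 0.30,                # StarCoder (~30% code, per the thread)
    "redpajama_wikipedia": 0.05,      # RedPajama subsets below are guesses
    "redpajama_arxiv": 0.04,
    "redpajama_book": 0.03,
    "redpajama_stackexchange": 0.03,
}

# Sampling weights over the mixture should sum to 1.
assert abs(sum(V2_MIXTURE.values()) - 1.0) < 1e-9
```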
Evaluation
The OpenLLaMA 7Bv2 average score is 0.56, while the OpenLLaMA 7B average score is 0.55.
Doubts
The two models' performance is similar; do you have a deeper analysis?