Open bennmann opened 7 months ago
Related question: why train only on publicly available data from the internet? If you want quality language and good knowledge, wouldn't you want to train on textbooks, historical documents, scientific research papers, and the like, the kind of material you could find in a library? I'm talking about classic, fundamental knowledge. Training on classical philosophy would probably improve reasoning skills, and training on the OG programming textbooks would be very good for programming.
Llama 3 is not reproducible in any meaningful sense without a list of its dataset sources.
Please release a list of the sources.