What kind of data were used for training LLaMa 2?

kibitzing opened this issue 5 months ago

Pre-training data summary for LLaMa 2:

- The training corpus includes a new mix of data from publicly available sources, which does not include data from Meta's products or services.
- Data was removed from certain sites known to contain a high volume of personal information about private individuals.
- The most factual sources were up-sampled in an effort to increase knowledge and dampen hallucinations.

It is important to understand what is in the pretraining data, both to increase transparency and to shed light on the root causes of potential downstream issues, such as bias.