kibitzing / awesome-llm-data
A repository of information about data used in training large language models (LLMs)
Models
Llama 2
Pre-training data used in Llama 2
Fine-tuning data used in Llama 2
GPT
Safety evaluation datasets
Bias: BOLD
Used by: Llama 2
Truthfulness: TruthfulQA
Used by: Llama 2
Toxicity: ToxiGen
Used by: Llama 2
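Safety results on benchmarks like TruthfulQA and ToxiGen are usually reported as percentages over all generations (for example, the share that is both truthful and informative, or the share judged toxic). The sketch below shows only that final aggregation step. It assumes the per-generation labels and toxicity scores have already been produced by external judge or classifier models, which are not shown; the `Judgment` record and the 0.5 toxicity threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    # Hypothetical per-generation labels, assumed to come from external judge models.
    truthful: bool
    informative: bool


def percent_truthful_and_informative(judgments: list[Judgment]) -> float:
    # Count generations that satisfy both criteria at once.
    hits = sum(j.truthful and j.informative for j in judgments)
    return 100.0 * hits / len(judgments)


def percent_toxic(toxicity_scores: list[float], threshold: float = 0.5) -> float:
    # The threshold is an assumed cut-off for labelling a generation as toxic.
    toxic = sum(score >= threshold for score in toxicity_scores)
    return 100.0 * toxic / len(toxicity_scores)
```

The exact judge models and thresholds differ between papers, so numbers computed this way are only comparable when those choices match.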
Pre-trained model performance evaluation datasets
Code
HumanEval
Used by: Llama 2
MBPP
Used by: Llama 2
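HumanEval and MBPP score a model by running its generated programs against unit tests, and results are commonly reported as pass@k. Below is a minimal sketch of the unbiased pass@k estimator introduced alongside HumanEval, assuming `n` samples were generated per problem and `c` of them passed the tests; actually executing the tests is out of scope here.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: samples generated, c: samples that passed, k: attempts allowed.
    """
    if n - c < k:
        # Every subset of k samples contains at least one passing sample.
        return 1.0
    # 1 - P(all k drawn samples fail), using exact integer binomials.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    # Average the per-problem estimates over the whole benchmark.
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Two problems, 20 samples each: 5 passing, then 0 passing.
    print(mean_pass_at_k([(20, 5), (20, 0)], k=1))  # 0.125
```

Estimating from n > k samples gives a lower-variance figure than literally drawing k samples once per problem.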
Commonsense reasoning
PIQA
Used by: Llama 2
SIQA
Used by: Llama 2
HellaSwag
Used by: Llama 2
WinoGrande
Used by: Llama 2
ARC (Easy and Challenge)
Used by: Llama 2
OpenBookQA
Used by: Llama 2
CommonsenseQA
Used by: Llama 2
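The commonsense reasoning benchmarks above are multiple-choice. One common way to evaluate a pre-trained model on them is to score each candidate answer by the model's log-likelihood given the question, then pick the highest-scoring one. This is a minimal sketch of that selection step only. The per-token log-probabilities are assumed inputs from some model (the `Choice` record is hypothetical), and whether and how to length-normalize varies by benchmark and evaluation harness; per-character normalization is just one option.

```python
from dataclasses import dataclass


@dataclass
class Choice:
    text: str
    # Hypothetical input: log p(token | context, earlier answer tokens) from some model.
    token_logprobs: list[float]


def pick_choice(choices: list[Choice], length_normalize: bool = True) -> int:
    def score(choice: Choice) -> float:
        total = sum(choice.token_logprobs)
        if length_normalize:
            # Per-character normalization so longer answers are not penalized.
            return total / max(len(choice.text), 1)
        return total

    # Index of the highest-scoring candidate.
    return max(range(len(choices)), key=lambda i: score(choices[i]))


def accuracy(predictions: list[int], labels: list[int]) -> float:
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)
```

Normalization can flip the prediction: a long answer with many moderately likely tokens loses on raw log-likelihood but may win once divided by its length.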
World knowledge
NaturalQuestions
Used by: Llama 2
TriviaQA
Used by: Llama 2
Reading comprehension
SQuAD
Used by: Llama 2
QuAC
Used by: Llama 2
BoolQ
Used by: Llama 2
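Extractive datasets such as SQuAD (and QuAC, for F1) compare a predicted answer span to reference answers using exact match and token-overlap F1 after normalizing the text. The sketch below follows the widely used SQuAD-style normalization (lowercase, strip punctuation, drop the articles "a", "an", "the", collapse whitespace). It does not cover BoolQ, which is a yes/no task scored by plain accuracy, or the special handling some versions apply to unanswerable questions.

```python
import re
import string
from collections import Counter

_PUNCTUATION = set(string.punctuation)


def normalize_answer(s: str) -> str:
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCTUATION)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, ground_truth: str) -> bool:
    return normalize_answer(prediction) == normalize_answer(ground_truth)


def f1_score(prediction: str, ground_truth: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_references(metric, prediction: str, references: list[str]) -> float:
    # Questions may have several acceptable answers; keep the best match.
    return max(float(metric(prediction, ref)) for ref in references)
```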
Math
GSM8K
Used by: Llama 2
MATH
Used by: Llama 2
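GSM8K reference solutions end with a line of the form "#### <final answer>", so scoring usually reduces to comparing the model's final number with that value. Below is a minimal sketch assuming a simple heuristic of taking the last number in the model's output; real evaluation setups often use stricter answer formats or prompts. MATH is not covered here, since its answers are typically LaTeX expressions rather than plain numbers.

```python
import re
from typing import Optional

# Integers or decimals, optionally with thousands separators and a leading minus.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def parse_reference(solution: str) -> str:
    # The final answer follows the last "####" marker.
    return solution.split("####")[-1].strip().replace(",", "")


def extract_prediction(generation: str) -> Optional[str]:
    # Heuristic (an assumption): the last number in the output is the answer.
    matches = _NUMBER.findall(generation)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def is_correct(generation: str, solution: str) -> bool:
    pred = extract_prediction(generation)
    if pred is None:
        return False
    try:
        return float(pred) == float(parse_reference(solution))
    except ValueError:
        return False
```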
Popular aggregated benchmarks
MMLU
Used by: Llama 2
BIG-Bench Hard (BBH)
Used by: Llama 2
AGIEval
Used by: Llama 2
Other repositories
https://github.com/opendilab/awesome-RLHF