kibitzing / awesome-llm-data

A repository of information about data used in training large language models (LLMs)

awesome-llm-data

Models

Llama 2

GPT

Safety evaluation datasets

Bias:

BOLD
- Used by: Llama 2

Truthfulness:

TruthfulQA
- Used by: Llama 2

Toxicity:

ToxiGen
- Used by: Llama 2

Pretrained model performance evaluation datasets

Code

HumanEval
- Used by: Llama 2
MBPP
- Used by: Llama 2

Commonsense reasoning

PIQA
- Used by: Llama 2
SIQA
- Used by: Llama 2
HellaSwag
- Used by: Llama 2
WinoGrande
- Used by: Llama 2
ARC Easy and Challenge
- Used by: Llama 2
OpenBookQA
- Used by: Llama 2
CommonsenseQA
- Used by: Llama 2

World knowledge

NaturalQuestions
- Used by: Llama 2
TriviaQA
- Used by: Llama 2

Reading comprehension

SQuAD
- Used by: Llama 2
QuAC
- Used by: Llama 2
BoolQ
- Used by: Llama 2

Math

GSM8K
- Used by: Llama 2
MATH
- Used by: Llama 2

Popular aggregated benchmarks

MMLU
- Used by: Llama 2
BIG-Bench Hard (BBH)
- Used by: Llama 2
AGIEval
- Used by: Llama 2

Other repositories

https://github.com/opendilab/awesome-RLHF