Your dataset should be stored under benchmark_root/gpt-4-turbo/synthetic/131072/data/niah_multikey_1/validation.jsonl.
If the file isn't there, dataset generation most likely failed, so check its output for errors. You can also run the preparation step directly to check:
python data/prepare.py \
--save_dir benchmark_root/gpt-4-turbo/synthetic/131072/data/ \
--benchmark synthetic \
--task niah_multikey_1 \
--tokenizer_path cl100k_base \
--tokenizer_type openai \
--max_seq_length 131072 \
--model_template_type base \
--num_samples 500
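Once the command has finished, a quick check like the one below can confirm that the file was actually written. This is only a minimal sketch: `check_jsonl` is a hypothetical helper, not part of the benchmark code, and it assumes the file holds one JSON object per line.

```python
import json
from pathlib import Path


def check_jsonl(path):
    """Return the number of non-empty JSON lines in `path`, or None if the file is missing.

    Hypothetical helper for illustration; not part of the benchmark repository.
    """
    p = Path(path)
    if not p.is_file():
        return None
    count = 0
    with p.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                json.loads(line)  # raises ValueError if a line is not valid JSON
                count += 1
    return count


if __name__ == "__main__":
    # Path taken from the thread; adjust it to your own benchmark_root.
    path = "benchmark_root/gpt-4-turbo/synthetic/131072/data/niah_multikey_1/validation.jsonl"
    n = check_jsonl(path)
    print("file is missing" if n is None else f"{n} samples found")
```

If the helper reports that the file is missing, the preparation step did not complete, and its console output is the place to look for the error.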
Prepare niah_multikey_1 with lines: 500 to benchmark_root/gpt-4-turbo/synthetic/131072/data/niah_multikey_1/validation.jsonl
Used time: 0.0 minutes
Predict niah_multikey_1 from benchmark_root/gpt-4-turbo/synthetic/131072/data/niah_multikey_1/validation.jsonl to benchmark_root/gpt-4-turbo/synthetic/131072/pred/niah_multikey_1.jsonl
Where does this validation.jsonl file come from?