hsiehjackson / RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?

Where does the dataset come from? #58

Closed: yxgcsq closed this issue 1 week ago

yxgcsq commented 2 weeks ago

        Prepare niah_multikey_1 with lines: 500 to benchmark_root/gpt-4-turbo/synthetic/131072/data/niah_multikey_1/validation.jsonl
        Used time: 0.0 minutes
        Predict niah_multikey_1 from benchmark_root/gpt-4-turbo/synthetic/131072/data/niah_multikey_1/validation.jsonl to benchmark_root/gpt-4-turbo/synthetic/131072/pred/niah_multikey_1.jsonl

Where does this validation.jsonl file come from?

hsiehjackson commented 2 weeks ago

Your dataset should be stored at benchmark_root/gpt-4-turbo/synthetic/131072/data/niah_multikey_1/validation.jsonl; it is generated by data/prepare.py. If you don't see it there, check whether any errors were reported while the dataset was being generated. You can also run the following command directly to check:

        python data/prepare.py \
            --save_dir benchmark_root/gpt-4-turbo/synthetic/131072/data/ \
            --benchmark synthetic \
            --task niah_multikey_1 \
            --tokenizer_path cl100k_base \
            --tokenizer_type openai \
            --max_seq_length 131072 \
            --model_template_type base \
            --num_samples 500
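
If the file is generated successfully, a quick sanity check like the one below should confirm it. This is just a sketch using standard shell tools and the paths from this thread; the exact fields inside each sample depend on how prepare.py serializes them.

        # list the generated dataset directory
        ls -lh benchmark_root/gpt-4-turbo/synthetic/131072/data/niah_multikey_1/
        # validation.jsonl is JSON Lines, one sample per line, so the line count
        # should normally match --num_samples (500 in the command above)
        wc -l benchmark_root/gpt-4-turbo/synthetic/131072/data/niah_multikey_1/validation.jsonl
        # pretty-print the first sample to see which fields were generated
        head -n 1 benchmark_root/gpt-4-turbo/synthetic/131072/data/niah_multikey_1/validation.jsonl | python -m json.tool

The prediction step then reads this file and writes its outputs to benchmark_root/gpt-4-turbo/synthetic/131072/pred/niah_multikey_1.jsonl, as shown in the log above.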