hsiehjackson / RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Apache License 2.0

dataset argument for qa.py not specified #18

Closed: vkaul11 closed this issue 1 month ago

vkaul11 commented 1 month ago

In the sample command you give for qa.py, the dataset argument (https://github.com/hsiehjackson/RULER/blob/main/scripts/data/synthetic/qa.py#L58) is not specified, and I am getting the error below. Can you let me know what the dataset should be? I suppose you pass it somewhere when you run things end to end?

(long-context) vivekkaul@Viveks-MacBook-Pro synthetic % python qa.py \
    --save_dir=./ \
    --save_name=qa \
    --tokenizer_path=tokenizer.model \
    --tokenizer_type=hf \
    --max_seq_length=4096 \
    --tokens_to_generate=128 \
    --num_samples=10 \
    --template="Answer the question based on the given documents. Only give me the answer and do not output any other words.\n\nThe following are given documents.\n\n{context}\n\nAnswer the question based on the given documents. Only give me the answer and do not output any other words.\n\nQuestion: {query} Answer:"
usage: qa.py [-h] --save_dir SAVE_DIR --save_name SAVE_NAME [--subset SUBSET] --tokenizer_path TOKENIZER_PATH [--tokenizer_type TOKENIZER_TYPE]
             --max_seq_length MAX_SEQ_LENGTH --tokens_to_generate TOKENS_TO_GENERATE --num_samples NUM_SAMPLES [--pre_samples PRE_SAMPLES]
             [--random_seed RANDOM_SEED] --template TEMPLATE [--remove_newline_tab] --dataset DATASET
qa.py: error: the following arguments are required: --dataset
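The error comes from Python's argparse: the usage line shows --dataset without square brackets, meaning it is a required option, so parsing stops before any data is generated. The sketch below reproduces that behavior with a hypothetical parser rebuilt only from the usage message above. The argument types and defaults are assumptions rather than the real qa.py source, and the names build_parser, try_parse and BASE_ARGS are illustrative.

```python
import argparse
import contextlib
import io


def build_parser():
    # Hypothetical reconstruction of qa.py's CLI from its usage message only;
    # types and defaults are assumptions, not the actual RULER source.
    parser = argparse.ArgumentParser(prog="qa.py")
    parser.add_argument("--save_dir", required=True)
    parser.add_argument("--save_name", required=True)
    parser.add_argument("--subset")
    parser.add_argument("--tokenizer_path", required=True)
    parser.add_argument("--tokenizer_type")
    parser.add_argument("--max_seq_length", type=int, required=True)
    parser.add_argument("--tokens_to_generate", type=int, required=True)
    parser.add_argument("--num_samples", type=int, required=True)
    parser.add_argument("--pre_samples", type=int)
    parser.add_argument("--random_seed", type=int)
    parser.add_argument("--template", required=True)
    parser.add_argument("--remove_newline_tab", action="store_true")
    parser.add_argument("--dataset", required=True)  # e.g. squad or hotpotqa
    return parser


# Flags from the sample command in the issue (template shortened for brevity).
BASE_ARGS = [
    "--save_dir=./",
    "--save_name=qa",
    "--tokenizer_path=tokenizer.model",
    "--tokenizer_type=hf",
    "--max_seq_length=4096",
    "--tokens_to_generate=128",
    "--num_samples=10",
    "--template={context} {query}",
]


def try_parse(argv):
    """Return (namespace, None) on success, or (None, error_text) if argparse exits."""
    captured = io.StringIO()
    try:
        with contextlib.redirect_stderr(captured):
            return build_parser().parse_args(argv), None
    except SystemExit:
        # argparse writes the usage and error message to stderr, then exits with status 2.
        return None, captured.getvalue()


if __name__ == "__main__":
    _, err = try_parse(BASE_ARGS)
    print(err.strip().splitlines()[-1])
    # qa.py: error: the following arguments are required: --dataset
    ns, _ = try_parse(BASE_ARGS + ["--dataset=squad"])
    print(ns.dataset)  # squad
```

Adding a single --dataset value is enough to get past the parser; which values are valid is answered in the reply below.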
hsiehjackson commented 1 month ago

We generate our dataset using prepare.py (linked here). If you want to use qa.py directly, you can set --dataset squad or --dataset hotpotqa. We use both for RULER.
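Following that answer, the sample command only needs a --dataset flag. Because RULER uses both squad and hotpotqa, here is a sketch that builds one qa.py command per dataset from the flags in the sample command. It is an illustration, not part of RULER: the helper qa_command, the constants TEMPLATE and COMMON_FLAGS, and the qa_squad/qa_hotpotqa save names are hypothetical. It assumes the output file name follows --save_name, so distinct names keep the two runs from overwriting each other, and it assumes a POSIX-style shell, where "\n" inside double quotes reaches qa.py as a literal backslash followed by n.

```python
import shlex
import sys

# Prompt template copied from the sample command. The raw strings keep the
# literal backslash-n sequences, matching what the shell passed to qa.py.
TEMPLATE = (
    r"Answer the question based on the given documents. Only give me the answer "
    r"and do not output any other words.\n\nThe following are given documents.\n\n"
    r"{context}\n\nAnswer the question based on the given documents. Only give me "
    r"the answer and do not output any other words.\n\nQuestion: {query} Answer:"
)

# Flag values taken from the sample command in the issue.
COMMON_FLAGS = {
    "save_dir": "./",
    "tokenizer_path": "tokenizer.model",
    "tokenizer_type": "hf",
    "max_seq_length": 4096,
    "tokens_to_generate": 128,
    "num_samples": 10,
    "template": TEMPLATE,
}


def qa_command(dataset):
    """Build the qa.py argv list for one dataset ("squad" or "hotpotqa").

    The per-dataset save name (qa_squad, qa_hotpotqa) is a hypothetical choice
    so that the two runs do not share an output name.
    """
    flags = dict(COMMON_FLAGS, save_name=f"qa_{dataset}", dataset=dataset)
    return [sys.executable, "qa.py"] + [f"--{key}={value}" for key, value in flags.items()]


if __name__ == "__main__":
    for name in ("squad", "hotpotqa"):
        # Print a shell-quoted command line for each dataset.
        print(shlex.join(qa_command(name)))
```

Each printed line can be run from scripts/data/synthetic, or the argv lists can be passed straight to subprocess.run(..., check=True). For the full benchmark, RULER itself generates its data through prepare.py rather than by calling qa.py by hand.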

vkaul11 commented 1 month ago

Thanks a lot!