seonghyeonye / TAPP

[AAAI 2024] Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following
https://arxiv.org/abs/2302.14691
MIT License

argument error and where to get train_tasks.txt #1

Closed 2018211801 closed 1 year ago

2018211801 commented 1 year ago

At first, I followed your code and instructions exactly, but encountered an error: run_gpt3.py: error: argument --add_task_definition: invalid typing.Union[bool, NoneType] value: 'True'. So I set the boolean parameters directly inside the code instead, and then ran into another problem: where can I obtain the train_tasks.txt file?

Using custom data configuration default-b8b565d058db1a48
Downloading and preparing dataset natural_instructions/default to /home/wxc/.cache/huggingface/datasets/natural_instructions/default-b8b565d058db1a48/2.0.0/5e549c913246c072ed4feb17b909342a33028cc1ad960438adf1d9702f3faa0f...
Traceback (most recent call last):
  File "/home/wxc/workspace/ICIL/src/run_gpt3.py", line 70, in <module>
    raw_datasets = load_dataset(
  File "/home/wxc/miniconda3/envs/icil/lib/python3.8/site-packages/datasets/load.py", line 1694, in load_dataset
    builder_instance.download_and_prepare(
  File "/home/wxc/miniconda3/envs/icil/lib/python3.8/site-packages/datasets/builder.py", line 595, in download_and_prepare
    self._download_and_prepare(
  File "/home/wxc/miniconda3/envs/icil/lib/python3.8/site-packages/datasets/builder.py", line 685, in _download_and_prepare
    raise OSError(
OSError: Cannot find data file. Original error: [Errno 2] No such file or directory: 'data/splits/default/train_tasks.txt'
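
For reference, the argument error at the top of this report comes from argparse being unable to convert the string 'True' into a value of type Optional[bool] on its own. Below is a minimal sketch of a common workaround, assuming run_gpt3.py builds its parser with argparse; the str2bool helper is hypothetical and not part of this repo:

import argparse

def str2bool(v):
    # Map the strings 'True'/'False' (and common variants) to real booleans,
    # which plain argparse cannot do for Optional[bool] fields.
    if isinstance(v, bool):
        return v
    if v.lower() in ("yes", "true", "t", "1"):
        return True
    if v.lower() in ("no", "false", "f", "0"):
        return False
    raise argparse.ArgumentTypeError(f"boolean value expected, got {v!r}")

parser = argparse.ArgumentParser()
parser.add_argument("--add_task_definition", type=str2bool, default=None)
args = parser.parse_args(["--add_task_definition", "True"])
print(args.add_task_definition)  # -> True

With a converter like this, the shell scripts can keep passing --add_task_definition True verbatim instead of hardcoding the value in the source.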

hbin0701 commented 1 year ago

Hello! It seems like you haven't downloaded the dataset needed for evaluation. Note that in README.md, under the Dataset section, it is written:

"For the evaluation dataset, we used SuperNatural-Instructions, which can be accessed in the official repo. Simply clone the repo under this directory and change the directory name to data."

To do this, you can run the commands below.

git clone https://github.com/allenai/natural-instructions
mv natural-instructions data
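
After cloning, the task split lists should live under data/splits/default. As a quick sanity check (the expected file list below reflects the official natural-instructions repo at the time and may change):

ls data/splits/default
# expected: excluded_tasks.txt  test_tasks.txt  train_tasks.txt  (no dev_tasks.txt)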

If the error still persists, please let us know. Thanks :)

2018211801 commented 1 year ago

Thank you very much!!! I'm sorry I overlooked that part. But I still don't understand the parameter issue below; the only change I made was to "openai_key".

wxc@cogmind:~/ICIL$ sh scripts/gpt3/run_ICIL.sh ICIL davinci
usage: run_gpt3.py [-h] [--lang LANG] [--data_dir DATA_DIR] [--task_dir TASK_DIR] [--overwrite_cache [OVERWRITE_CACHE]]
                   [--preprocessing_num_workers PREPROCESSING_NUM_WORKERS] [--max_source_length MAX_SOURCE_LENGTH]
                   [--max_target_length MAX_TARGET_LENGTH] [--pad_to_max_length [PAD_TO_MAX_LENGTH]]
                   [--max_num_instances_per_task MAX_NUM_INSTANCES_PER_TASK]
                   [--max_num_instances_per_eval_task MAX_NUM_INSTANCES_PER_EVAL_TASK]
                   [--max_train_samples MAX_TRAIN_SAMPLES] [--max_eval_samples MAX_EVAL_SAMPLES]
                   [--max_predict_samples MAX_PREDICT_SAMPLES] [--num_beams NUM_BEAMS]
                   [--ignore_pad_token_for_loss [IGNORE_PAD_TOKEN_FOR_LOSS]] [--no_ignore_pad_token_for_loss]
                   [--source_prefix SOURCE_PREFIX] [--forced_bos_token FORCED_BOS_TOKEN]
                   [--add_task_name ADD_TASK_NAME] [--add_task_definition ADD_TASK_DEFINITION]
                   [--num_pos_examples NUM_POS_EXAMPLES] [--num_neg_examples NUM_NEG_EXAMPLES]
                   [--add_explanation ADD_EXPLANATION] [--tk_instruct TK_INSTRUCT] [--output_dir OUTPUT_DIR]
                   [--gpt3_temprature GPT3_TEMPRATURE] [--gpt3_top_p GPT3_TOP_P] [--engine ENGINE] [--icil [ICIL]]
                   [--demo_path DEMO_PATH] [--adaptive [ADAPTIVE]] [--cc_news_path CC_NEWS_PATH]
                   [--irrelevant [IRRELEVANT]]
run_gpt3.py: error: argument --add_task_definition: invalid typing.Union[bool, NoneType] value: 'True'
LOADED
Traceback (most recent call last):
  File "src/compute_metrics.py", line 143, in <module>
    with open(args.predictions) as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'output/ICIL_davinci/predicted_examples.jsonl'

2018211801 commented 1 year ago

Sorry to bother you again, but I've run into another problem after fixing the parameters. How can I solve it?

Downloading and preparing dataset natural_instructions/default to /home/wxc/.cache/huggingface/datasets/natural_instructions/default-b8b565d058db1a48/2.0.0/5e549c913246c072ed4feb17b909342a33028cc1ad960438adf1d9702f3faa0f...
Traceback (most recent call last):
  File "/home/wxc/workspace/ICIL/src/run_gpt3.py", line 70, in <module>
    raw_datasets = load_dataset(
  File "/home/wxc/miniconda3/envs/icil/lib/python3.8/site-packages/datasets/load.py", line 1694, in load_dataset
    builder_instance.download_and_prepare(
  File "/home/wxc/miniconda3/envs/icil/lib/python3.8/site-packages/datasets/builder.py", line 595, in download_and_prepare
    self._download_and_prepare(
  File "/home/wxc/miniconda3/envs/icil/lib/python3.8/site-packages/datasets/builder.py", line 685, in _download_and_prepare
    raise OSError(
OSError: Cannot find data file. Original error: [Errno 2] No such file or directory: 'data/splits/default/dev_tasks.txt'

hbin0701 commented 1 year ago

No worries! :) Since we're specifying the list of tasks to evaluate in data/splits/default/test_tasks.txt, what goes into 'data/splits/default/dev_tasks.txt' and 'data/splits/default/train_tasks.txt' does not matter. Therefore, you can resolve this error simply by creating an empty file 'data/splits/default/dev_tasks.txt'.

For instance,

touch data/splits/default/dev_tasks.txt
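
If other split files are reported missing as well (the first traceback in this issue mentioned train_tasks.txt, before the dataset had been cloned), the same trick applies. Creating empty placeholders for both non-test splits covers either case:

mkdir -p data/splits/default
touch data/splits/default/train_tasks.txt data/splits/default/dev_tasks.txt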
2018211801 commented 1 year ago

Although I didn't quite understand which tasks would even have dev sets, I managed to get the experiment running anyway. Thanks very much!