vishwa27yvs opened 3 months ago
Update: I tried running the same command twice, and both times the process gets killed at 2177/2294 instances; output below.
2024-04-07 12:11:37,130 WARNING Disabling caching
2024-04-07 12:11:39,756 INFO Found {'train', 'dev', 'test'} splits
Adding text inputs: 95%|████████████████████████████████████████████████████▏ | 2176/2294 [4:33:25<27:47, 14.13s/it]
create_all_files_benchmark.sh: line 2: 2659349 Killed python create_text_dataset.py --dataset_name_or_path princeton-nlp/SWE-bench --output_dir ./base_datasets --prompt_style style-2 --file_source all
I tried it on 2 different machines and the output is exactly the same (the process gets killed at the same instance), so I am not sure whether this issue is related to memory or something else. It would be great to know a way to resolve this.
Hi @vishwa27yvs, I have provided an explanation for the extended duration of this operation at https://github.com/princeton-nlp/SWE-bench/issues/58#issue-2197776749. Speeding it up would require rewriting the full pipeline.
Tagging @carlosejimenez here to address this.
I am trying to generate textual prompts where all the files from the repository are included in the prompt. From the code, I understand I can do so using the following command:
python create_text_dataset.py --dataset_name_or_path princeton-nlp/SWE-bench --output_dir ./base_datasets --prompt_style style-2 --file_source all
However, the expected time for the code to run on just the test set is about 9 hours; I have added the output below.
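For what it's worth, the 9-hour estimate is consistent with the per-instance rate shown in the progress bar in the pasted output (about 14.13 s/it over 2294 instances), so the long runtime appears to be the script's normal per-instance cost rather than a stall:

```python
# Back-of-the-envelope check: at ~14.13 seconds per instance,
# processing all 2294 instances takes roughly 9 hours.
seconds_per_instance = 14.13
num_instances = 2294
hours = seconds_per_instance * num_instances / 3600
print(round(hours, 1))
```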
Is this expected, or am I doing something wrong?