princeton-nlp / SWE-bench

[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?
https://www.swebench.com
MIT License

Running into errors during evaluation #144

Open ivan4722 opened 5 days ago

ivan4722 commented 5 days ago

Describe the bug

Ran into a few errors

- File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 143, in build_image
    raise docker.errors.BuildError(
docker.errors.BuildError: The command '/bin/sh -c /bin/bash -c "source ~/.bashrc && /root/setup_env.sh"' returned a non-zero code: 1

- File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 149, in build_image
    raise BuildImageError(image_name, str(e), logger) from e
swebench.harness.docker_build.BuildImageError: sweb.env.x86_64.27dd9791e13f5c857a09f9:latest: write /opt/miniconda3/pkgs/cache/09cdf8bf.solv: no space left on device

I am especially unsure why I get the second one: I have 15.93 GB of memory allocated to Docker and 300 GB of free space on my disk.

Steps/Code to Reproduce

python -m swebench.harness.run_evaluation     --dataset_name princeton-nlp/SWE-bench_Lite     --predictions_path ./gpt-3.5-trajectories/updated_all_preds.jsonl     --max_workers 5     --run_id 1

Using Docker Desktop with 8 CPUs available (M3 chip) and a 15.93 GB memory limit.

Expected Results

The evaluation runs to completion.

Actual Results


Running 300 unevaluated instances...
Base image sweb.base.x86_64:latest already exists, skipping build.
Base images built successfully.
Total environment images to build: 29
Building environment images:  62%|█████████████████████████████████████▏                      | 18/29 [21:52<13:21, 72.89s/it]BuildImageError sweb.env.x86_64.5d1fda9d55d65d8a4e5bdb:latest
Traceback (most recent call last):
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 143, in build_image
    raise docker.errors.BuildError(
docker.errors.BuildError: The command '/bin/sh -c /bin/bash -c "source ~/.bashrc && /root/setup_env.sh"' returned a non-zero code: 1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 312, in build_env_images
    future.result()
  File "/Users/ixiong/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/ixiong/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/ixiong/miniconda3/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 149, in build_image
    raise BuildImageError(image_name, str(e), logger) from e
swebench.harness.docker_build.BuildImageError: sweb.env.x86_64.5d1fda9d55d65d8a4e5bdb:latest: The command '/bin/sh -c /bin/bash -c "source ~/.bashrc && /root/setup_env.sh"' returned a non-zero code: 1
Check (image_build_logs/env/sweb.env.x86_64.5d1fda9d55d65d8a4e5bdb__latest/build_image.log) for more information.
Building environment images:  66%|███████████████████████████████████████▎                    | 19/29 [21:52<11:30, 69.08s/it]BuildImageError sweb.env.x86_64.1c1a6945f732f9391228c5:latest
Traceback (most recent call last):
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 143, in build_image
    raise docker.errors.BuildError(
docker.errors.BuildError: The command '/bin/sh -c /bin/bash -c "source ~/.bashrc && /root/setup_env.sh"' returned a non-zero code: 1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 312, in build_env_images
    future.result()
  File "/Users/ixiong/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/ixiong/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/ixiong/miniconda3/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 149, in build_image
    raise BuildImageError(image_name, str(e), logger) from e
swebench.harness.docker_build.BuildImageError: sweb.env.x86_64.1c1a6945f732f9391228c5:latest: The command '/bin/sh -c /bin/bash -c "source ~/.bashrc && /root/setup_env.sh"' returned a non-zero code: 1
Check (image_build_logs/env/sweb.env.x86_64.1c1a6945f732f9391228c5__latest/build_image.log) for more information.
BuildImageError sweb.env.x86_64.71498c7426dbf05599642f:latest
Traceback (most recent call last):
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 143, in build_image
    raise docker.errors.BuildError(
docker.errors.BuildError: The command '/bin/sh -c /bin/bash -c "source ~/.bashrc && /root/setup_env.sh"' returned a non-zero code: 1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 312, in build_env_images
    future.result()
  File "/Users/ixiong/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/ixiong/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/ixiong/miniconda3/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 149, in build_image
    raise BuildImageError(image_name, str(e), logger) from e
swebench.harness.docker_build.BuildImageError: sweb.env.x86_64.71498c7426dbf05599642f:latest: The command '/bin/sh -c /bin/bash -c "source ~/.bashrc && /root/setup_env.sh"' returned a non-zero code: 1
Check (image_build_logs/env/sweb.env.x86_64.71498c7426dbf05599642f__latest/build_image.log) for more information.
Building environment images:  72%|███████████████████████████████████████████▍                | 21/29 [21:52<08:20, 62.51s/it]BuildImageError sweb.env.x86_64.088a7e628bda9770f9757b:latest
Traceback (most recent call last):
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 143, in build_image
    raise docker.errors.BuildError(
docker.errors.BuildError: The command '/bin/sh -c /bin/bash -c "source ~/.bashrc && /root/setup_env.sh"' returned a non-zero code: 1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 312, in build_env_images
    future.result()
  File "/Users/ixiong/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/ixiong/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/ixiong/miniconda3/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 149, in build_image
    raise BuildImageError(image_name, str(e), logger) from e
swebench.harness.docker_build.BuildImageError: sweb.env.x86_64.088a7e628bda9770f9757b:latest: The command '/bin/sh -c /bin/bash -c "source ~/.bashrc && /root/setup_env.sh"' returned a non-zero code: 1
Check (image_build_logs/env/sweb.env.x86_64.088a7e628bda9770f9757b__latest/build_image.log) for more information.
BuildImageError sweb.env.x86_64.7037e8c448a4b8ebfe9b13:latest
Traceback (most recent call last):
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 143, in build_image
    raise docker.errors.BuildError(
docker.errors.BuildError: mkdir /opt/miniconda3/pkgs/pandas-2.2.2-py311ha02d727_0/lib/python3.11/site-packages/pandas/tests/arrays/floating: no space left on device

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 312, in build_env_images
    future.result()
  File "/Users/ixiong/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/ixiong/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/ixiong/miniconda3/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 149, in build_image
    raise BuildImageError(image_name, str(e), logger) from e
swebench.harness.docker_build.BuildImageError: sweb.env.x86_64.7037e8c448a4b8ebfe9b13:latest: mkdir /opt/miniconda3/pkgs/pandas-2.2.2-py311ha02d727_0/lib/python3.11/site-packages/pandas/tests/arrays/floating: no space left on device
Check (image_build_logs/env/sweb.env.x86_64.7037e8c448a4b8ebfe9b13__latest/build_image.log) for more information.
Building environment images:  90%|█████████████████████████████████████████████████████▊      | 26/29 [25:41<02:57, 59.29s/it]BuildImageError sweb.env.x86_64.aa92880033da20ca313928:latest
Traceback (most recent call last):
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 143, in build_image
    raise docker.errors.BuildError(
docker.errors.BuildError: write /opt/miniconda3/envs/testbed/lib/libQt5WebEngineCore.so.5.9.7: no space left on device

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 312, in build_env_images
    future.result()
  File "/Users/ixiong/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/ixiong/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/ixiong/miniconda3/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 149, in build_image
    raise BuildImageError(image_name, str(e), logger) from e
swebench.harness.docker_build.BuildImageError: sweb.env.x86_64.aa92880033da20ca313928:latest: write /opt/miniconda3/envs/testbed/lib/libQt5WebEngineCore.so.5.9.7: no space left on device
Check (image_build_logs/env/sweb.env.x86_64.aa92880033da20ca313928__latest/build_image.log) for more information.
Building environment images:  93%|███████████████████████████████████████████████████████▊    | 27/29 [25:46<01:54, 57.27s/it]BuildImageError sweb.env.x86_64.27dd9791e13f5c857a09f9:latest
Traceback (most recent call last):
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 143, in build_image
    raise docker.errors.BuildError(
docker.errors.BuildError: write /opt/miniconda3/pkgs/cache/09cdf8bf.solv: no space left on device

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 312, in build_env_images
    future.result()
  File "/Users/ixiong/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/ixiong/miniconda3/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/ixiong/miniconda3/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/docker_build.py", line 149, in build_image
    raise BuildImageError(image_name, str(e), logger) from e
swebench.harness.docker_build.BuildImageError: sweb.env.x86_64.27dd9791e13f5c857a09f9:latest: write /opt/miniconda3/pkgs/cache/09cdf8bf.solv: no space left on device
Check (image_build_logs/env/sweb.env.x86_64.27dd9791e13f5c857a09f9__latest/build_image.log) for more information.
Building environment images: 100%|████████████████████████████████████████████████████████████| 29/29 [26:09<00:00, 54.14s/it]
7 environment images failed to build.
Running 300 instances...
  0%|                                                     
(Currently running)

System Information

macOS (M3), swebench 2.0.1, Python 3.12.3

john-b-yang commented 5 days ago
Screenshot 2024-06-27 at 4 43 34 PM

You may have to increase your virtual disk limit. The screenshot above is from Docker Desktop, where my limit is set to 64 GB, which should be enough for Lite (with a cache level of env).

This section of the report has more information about choosing the right level of caching wrt the amount of storage you have.
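To see how many of the failed builds are plausibly disk-related, the error messages can be bucketed by their text. The following is only a rough sketch: the function names and bucket labels are made up for illustration, and the two substrings are taken from the log output quoted in this issue. A build that "returned a non-zero code" may also have failed for lack of space, so the per-image `build_image.log` is still the place to confirm the cause.

```python
from collections import Counter


def classify_build_error(message: str) -> str:
    """Bucket a build error message by its likely cause (hypothetical labels)."""
    text = message.lower()
    if "no space left on device" in text:
        return "disk_full"
    if "returned a non-zero code" in text:
        return "setup_script_failed"
    return "other"


def summarize(messages):
    """Count how many messages fall into each bucket."""
    return Counter(classify_build_error(m) for m in messages)


if __name__ == "__main__":
    # Messages copied from the traceback output above.
    sample = [
        "write /opt/miniconda3/pkgs/cache/09cdf8bf.solv: no space left on device",
        "The command '/bin/sh -c /bin/bash -c \"source ~/.bashrc && "
        "/root/setup_env.sh\"' returned a non-zero code: 1",
    ]
    for cause, count in summarize(sample).items():
        print(cause, count)
```

If most failures land in the disk bucket, raising the virtual disk limit is the first thing to try.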

ivan4722 commented 4 days ago

I was already using a virtual disk limit of 64 GB, but I can try increasing it. I will update here if I still run into the issue.

ivan4722 commented 4 days ago

That seems to have fixed the issue, but now I am getting:

EvaluationError django__django-11999: django__django-11999: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-11999/run_instance.log) for more information.
  9%|███████▋                                                                              | 24/270 [06:48<1:09:52, 17.04s/it]EvaluationError django__django-12184: django__django-12184: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-12184/run_instance.log) for more information.
  9%|███████▉                                                                              | 25/270 [06:50<1:06:59, 16.41s/it]EvaluationError django__django-12113: django__django-12113: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-12113/run_instance.log) for more information.
 10%|████████▎                                                                             | 26/270 [06:50<1:04:09, 15.78s/it]EvaluationError django__django-12125: django__django-12125: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-12125/run_instance.log) for more information.
 10%|████████▌                                                                             | 27/270 [07:13<1:04:58, 16.04s/it]EvaluationError django__django-12284: django__django-12284: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-12284/run_instance.log) for more information.
 10%|████████▉                                                                             | 28/270 [07:15<1:02:40, 15.54s/it]EvaluationError django__django-12286: django__django-12286: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-12286/run_instance.log) for more information.
 11%|█████████▏                                                                            | 29/270 [07:52<1:05:26, 16.29s/it]EvaluationError django__django-12497: django__django-12497: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-12497/run_instance.log) for more information.
 11%|█████████▌                                                                            | 30/270 [07:53<1:03:05, 15.77s/it]EvaluationError django__django-12308: django__django-12308: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-12308/run_instance.log) for more information.
 11%|█████████▊                                                                            | 31/270 [07:57<1:01:20, 15.40s/it]EvaluationError django__django-12589: django__django-12589: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-12589/run_instance.log) for more information.
 12%|██████████▏                                                                           | 32/270 [08:07<1:00:25, 15.23s/it]EvaluationError django__django-12747: django__django-12747: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-12747/run_instance.log) for more information.
 12%|██████████▊                                                                             | 33/270 [08:10<58:42, 14.86s/it]EvaluationError django__django-12708: django__django-12708: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-12708/run_instance.log) for more information.
 13%|██████████▊                                                                           | 34/270 [08:50<1:01:21, 15.60s/it]EvaluationError django__django-12856: django__django-12856: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-12856/run_instance.log) for more information.
 13%|███████████▍                                                                            | 35/270 [08:50<59:22, 15.16s/it]EvaluationError django__django-12908: django__django-12908: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-12908/run_instance.log) for more information.
 13%|███████████▋                                                                            | 36/270 [08:59<58:26, 14.98s/it]EvaluationError django__django-12915: django__django-12915: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-12915/run_instance.log) for more information.
 14%|████████████                                                                            | 37/270 [09:08<57:36, 14.84s/it]EvaluationError django__django-12983: django__django-12983: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-12983/run_instance.log) for more information.
 14%|████████████▍                                                                           | 38/270 [09:10<56:00, 14.48s/it]EvaluationError django__django-13028: django__django-13028: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-13028/run_instance.log) for more information.
 14%|████████████▋                                                                           | 39/270 [09:55<58:47, 15.27s/it]EvaluationError django__django-13158: django__django-13158: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-13158/run_instance.log) for more information.
 15%|█████████████                                                                           | 40/270 [09:58<57:19, 14.95s/it]EvaluationError django__django-13033: django__django-13033: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-13033/run_instance.log) for more information.
 15%|█████████████▎                                                                          | 41/270 [10:05<56:23, 14.77s/it]EvaluationError django__django-13220: django__django-13220: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-13220/run_instance.log) for more information.
 16%|█████████████▋                                                                          | 42/270 [10:15<55:43, 14.67s/it]EvaluationError django__django-13265: django__django-13265: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-13265/run_instance.log) for more information.
 16%|██████████████                                                                          | 43/270 [10:16<54:15, 14.34s/it]EvaluationError django__django-13230: django__django-13230: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-13230/run_instance.log) for more information.
 16%|██████████████▎                                                                         | 44/270 [10:55<56:05, 14.89s/it]EvaluationError django__django-13321: django__django-13321: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-13321/run_instance.log) for more information.
 17%|██████████████▋                                                                         | 45/270 [11:01<55:05, 14.69s/it]EvaluationError django__django-13448: django__django-13448: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-13448/run_instance.log) for more information.
 17%|██████████████▉                                                                         | 46/270 [11:02<53:44, 14.40s/it]EvaluationError django__django-13447: django__django-13447: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-13447/run_instance.log) for more information.
 17%|███████████████▎                                                                        | 47/270 [11:09<52:55, 14.24s/it]EvaluationError django__django-13551: django__django-13551: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-13551/run_instance.log) for more information.
EvaluationError django__django-13590: django__django-13590: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-13590/run_instance.log) for more information.
 18%|███████████████▉                                                                        | 49/270 [11:40<52:41, 14.30s/it]EvaluationError django__django-13658: django__django-13658: data must be str, not NoneType
Check (run_instance_logs/1/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/django__django-13658/run_instance.log) for more information.

I think it is because the model did not produce any output for these instances. For example:

{"model_name_or_path": "gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1", "instance_id": "django__django-13964", "model_patch": null}
(base) nb24-12252:django__django-13964 ixiong$ cat run_instance.log
2024-06-28 10:37:29,399 - INFO - Environment image sweb.env.x86_64.297af196949a2a635bce66:latest found for django__django-13964
Building instance image sweb.eval.x86_64.django__django-13964:latest for django__django-13964
2024-06-28 10:38:03,905 - INFO - Creating container for django__django-13964...
2024-06-28 10:38:03,935 - INFO - Container for django__django-13964 created: 37bb67c58377984cddcfcc74a48ae589ee196e5767c2718254770d6eb13ce13e
2024-06-28 10:38:04,064 - INFO - Container for django__django-13964 started: 37bb67c58377984cddcfcc74a48ae589ee196e5767c2718254770d6eb13ce13e
2024-06-28 10:38:04,064 - ERROR - Error in evaluating model for django__django-13964: data must be str, not NoneType
2024-06-28 10:38:04,065 - INFO - Traceback (most recent call last):
  File "/Users/ixiong/Desktop/SWE-bench/swebench/harness/run_evaluation.py", line 109, in run_instance
    patch_file.write_text(pred["model_patch"])
  File "/Users/ixiong/miniconda3/lib/python3.12/pathlib.py", line 1044, in write_text
    raise TypeError('data must be str, not %s' %
TypeError: data must be str, not NoneType

2024-06-28 10:38:04,065 - INFO - Attempting to stop container sweb.eval.django__django-13964.1...
2024-06-28 10:38:19,396 - INFO - Attempting to remove container sweb.eval.django__django-13964.1...
2024-06-28 10:38:19,407 - INFO - Container sweb.eval.django__django-13964.1 removed.
2024-06-28 10:38:19,407 - INFO - Attempting to remove image sweb.eval.x86_64.django__django-13964:latest...
2024-06-28 10:38:19,539 - INFO - Image sweb.eval.x86_64.django__django-13964:latest removed.

But will this still run and produce results.json?
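One way to check how many predictions are affected before starting a run is to split the predictions file on whether `model_patch` is a non-empty string. This is a minimal sketch, not part of the harness: the helper names are hypothetical, and it assumes one JSON object per line with the `instance_id` and `model_patch` keys shown in the example above. Whether instances dropped from the file are counted as unresolved in the final report is not something this thread confirms.

```python
import json
from pathlib import Path


def split_predictions(lines):
    """Return (usable, empty) predictions from an iterable of JSONL lines."""
    usable, empty = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        pred = json.loads(line)
        patch = pred.get("model_patch")
        # A null or whitespace-only patch cannot be written to a patch file.
        if isinstance(patch, str) and patch.strip():
            usable.append(pred)
        else:
            empty.append(pred)
    return usable, empty


def write_jsonl(path, preds):
    """Write predictions back out, one JSON object per line."""
    Path(path).write_text("".join(json.dumps(p) + "\n" for p in preds))
```

Calling `split_predictions(open(path))` on the predictions file and printing the `instance_id` of each entry in `empty` should list the same instances that raise `data must be str, not NoneType`.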

john-b-yang commented 4 days ago

Ah ok yeah we saw this yesterday too. Tagging @carlosejimenez who will push the patch for this today. But yes, it should still produce a report.json despite the error message.

ivan4722 commented 4 days ago

My run is currently stuck?

EvaluationError scikit-learn__scikit-learn-13497: scikit-learn__scikit-learn-13497: data must be str, not NoneType
Check (run_instance_logs/2/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/scikit-learn__scikit-learn-13497/run_instance.log) for more information.
 65%|███████████████████████████████████████████████████████▎                             | 195/300 [1:35:37<51:29, 29.42s/it]EvaluationError scikit-learn__scikit-learn-13584: scikit-learn__scikit-learn-13584: data must be str, not NoneType
Check (run_instance_logs/2/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/scikit-learn__scikit-learn-13584/run_instance.log) for more information.
 65%|███████████████████████████████████████████████████████▌                             | 196/300 [1:41:27<53:50, 31.06s/it]EvaluationError scikit-learn__scikit-learn-13779: scikit-learn__scikit-learn-13779: data must be str, not NoneType
Check (run_instance_logs/2/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/scikit-learn__scikit-learn-13779/run_instance.log) for more information.
 66%|███████████████████████████████████████████████████████▊                             | 197/300 [1:42:31<53:36, 31.22s/it]EvaluationError scikit-learn__scikit-learn-14894: scikit-learn__scikit-learn-14894: data must be str, not NoneType
Check (run_instance_logs/2/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/scikit-learn__scikit-learn-14894/run_instance.log) for more information.
 67%|████████████████████████████████████████████████████████▋                            | 200/300 [1:48:52<54:26, 32.66s/it]EvaluationError scikit-learn__scikit-learn-14983: scikit-learn__scikit-learn-14983: data must be str, not NoneType
Check (run_instance_logs/2/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/scikit-learn__scikit-learn-14983/run_instance.log) for more information.
 67%|████████████████████████████████████████████████████████▉                            | 201/300 [1:50:13<54:17, 32.90s/it]EvaluationError scikit-learn__scikit-learn-15512: scikit-learn__scikit-learn-15512: data must be str, not NoneType
Check (run_instance_logs/2/gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/scikit-learn__scikit-learn-15512/run_instance.log) for more information.
 68%|█████████████████████████████████████████████████████████▊                           | 204/300 [1:55:10<54:11, 33.87s/it

This is the last thing that was output, and I cannot find report.json or the evaluation_results directory.
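To see which instances have not been reached yet, the instance IDs in the predictions file can be compared against the per-instance log directories. This is a sketch under assumptions: the function name is hypothetical, and the layout `<log_dir>/<instance_id>/run_instance.log` is inferred from the "Check (...)" paths printed above rather than from the harness documentation.

```python
import json
from pathlib import Path


def pending_instances(predictions_path, log_dir):
    """Return instance IDs that have no run_instance.log under log_dir."""
    ids = []
    with open(predictions_path) as f:
        for line in f:
            if line.strip():
                ids.append(json.loads(line)["instance_id"])
    log_dir = Path(log_dir)
    # An instance counts as pending if its log file does not exist yet.
    return [i for i in ids if not (log_dir / i / "run_instance.log").is_file()]
```

Here `log_dir` would be the model subdirectory under `run_instance_logs/<run_id>/`. A short list of pending IDs that never shrinks suggests those particular instances are the ones hanging.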

Here is the listing of the model subdirectory under run_instance_logs:


(base) nb24-12252:gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1 ixiong$ ls
astropy__astropy-12907          psf__requests-1963
astropy__astropy-14182          psf__requests-2148
astropy__astropy-14365          psf__requests-2317
astropy__astropy-14995          psf__requests-2674
astropy__astropy-6938           psf__requests-3362
astropy__astropy-7746           psf__requests-863
django__django-10914            pydata__xarray-3364
django__django-10924            pydata__xarray-4094
django__django-11001            pydata__xarray-4248
django__django-11019            pydata__xarray-4493
django__django-11039            pydata__xarray-5131
django__django-11049            pylint-dev__pylint-5859
django__django-11099            pylint-dev__pylint-6506
django__django-11133            pylint-dev__pylint-7080
django__django-11179            pylint-dev__pylint-7114
django__django-11283            pylint-dev__pylint-7228
django__django-11422            pylint-dev__pylint-7993
django__django-11564            pytest-dev__pytest-11143
django__django-11583            pytest-dev__pytest-11148
django__django-11620            pytest-dev__pytest-5103
django__django-11630            pytest-dev__pytest-5221
django__django-11742            pytest-dev__pytest-5227
django__django-11797            pytest-dev__pytest-5413
django__django-11815            pytest-dev__pytest-5495
django__django-11848            pytest-dev__pytest-5692
django__django-11905            pytest-dev__pytest-6116
django__django-11910            pytest-dev__pytest-7168
django__django-11964            pytest-dev__pytest-7220
django__django-11999            pytest-dev__pytest-7373
django__django-12113            pytest-dev__pytest-7432
django__django-12125            pytest-dev__pytest-7490
django__django-12184            pytest-dev__pytest-8365
django__django-12284            pytest-dev__pytest-8906
django__django-12286            pytest-dev__pytest-9359
django__django-12308            scikit-learn__scikit-learn-10297
django__django-12453            scikit-learn__scikit-learn-10508
django__django-12470            scikit-learn__scikit-learn-10949
django__django-12497            scikit-learn__scikit-learn-11040
django__django-12589            scikit-learn__scikit-learn-11281
django__django-12700            scikit-learn__scikit-learn-12471
django__django-12708            scikit-learn__scikit-learn-13142
django__django-12747            scikit-learn__scikit-learn-13241
django__django-12856            scikit-learn__scikit-learn-13439
django__django-12908            scikit-learn__scikit-learn-13496
django__django-12915            scikit-learn__scikit-learn-13497
django__django-12983            scikit-learn__scikit-learn-13584
django__django-13028            scikit-learn__scikit-learn-13779
django__django-13033            scikit-learn__scikit-learn-14087
django__django-13158            scikit-learn__scikit-learn-14092
django__django-13220            scikit-learn__scikit-learn-14894
django__django-13230            scikit-learn__scikit-learn-14983
django__django-13265            scikit-learn__scikit-learn-15512
django__django-13315            scikit-learn__scikit-learn-15535
django__django-13321            scikit-learn__scikit-learn-25500
django__django-13401            scikit-learn__scikit-learn-25570
django__django-13447            scikit-learn__scikit-learn-25638
django__django-13448            scikit-learn__scikit-learn-25747
django__django-13551            sphinx-doc__sphinx-10325
django__django-13590            sphinx-doc__sphinx-10451
django__django-13658            sphinx-doc__sphinx-11445
django__django-13660            sphinx-doc__sphinx-7686
django__django-13710            sphinx-doc__sphinx-7738
django__django-13757            sphinx-doc__sphinx-7975
django__django-13768            sphinx-doc__sphinx-8273
django__django-13925            sphinx-doc__sphinx-8282
django__django-13933            sphinx-doc__sphinx-8435
django__django-13964            sphinx-doc__sphinx-8474
django__django-14016            sphinx-doc__sphinx-8506
django__django-14017            sphinx-doc__sphinx-8595
django__django-14155            sphinx-doc__sphinx-8627
django__django-14238            sphinx-doc__sphinx-8713
django__django-14382            sphinx-doc__sphinx-8721
django__django-14411            sphinx-doc__sphinx-8801
django__django-14534            sympy__sympy-11400
django__django-14580            sympy__sympy-11870
django__django-14608            sympy__sympy-11897
django__django-14667            sympy__sympy-12171
django__django-14672            sympy__sympy-12236
django__django-14730            sympy__sympy-12419
django__django-14752            sympy__sympy-12454
django__django-14787            sympy__sympy-12481
django__django-14855            sympy__sympy-13031
django__django-14915            sympy__sympy-13043
django__django-14997            sympy__sympy-13146
django__django-14999            sympy__sympy-13177
django__django-15061            sympy__sympy-13437
django__django-15202            sympy__sympy-13471
django__django-15213            sympy__sympy-13480
django__django-15252            sympy__sympy-13647
django__django-15320            sympy__sympy-13773
django__django-15347            sympy__sympy-13895
django__django-15388            sympy__sympy-13915
django__django-15400            sympy__sympy-13971
django__django-15498            sympy__sympy-14024
django__django-15695            sympy__sympy-14308
django__django-15738            sympy__sympy-14317
django__django-15781            sympy__sympy-14396
django__django-15789            sympy__sympy-14774
django__django-15790            sympy__sympy-14817
django__django-15814            sympy__sympy-15011
django__django-15819            sympy__sympy-15308
django__django-15851            sympy__sympy-15345
django__django-15902            sympy__sympy-15346
django__django-15996            sympy__sympy-15609
django__django-16041            sympy__sympy-15678
django__django-16046            sympy__sympy-16106
django__django-16139            sympy__sympy-16281
django__django-16229            sympy__sympy-16503
django__django-16255            sympy__sympy-16792
django__django-16379            sympy__sympy-16988
django__django-16400            sympy__sympy-17022
django__django-16408            sympy__sympy-17139
django__django-16527            sympy__sympy-17630
django__django-16595            sympy__sympy-17655
django__django-16816            sympy__sympy-18057
django__django-16820            sympy__sympy-18087
django__django-16873            sympy__sympy-18189
django__django-16910            sympy__sympy-18199
django__django-17051            sympy__sympy-18532
django__django-17087            sympy__sympy-18621
matplotlib__matplotlib-18869        sympy__sympy-18698
matplotlib__matplotlib-22711        sympy__sympy-18835
matplotlib__matplotlib-22835        sympy__sympy-19007
matplotlib__matplotlib-23299        sympy__sympy-19254
matplotlib__matplotlib-23314        sympy__sympy-19487
matplotlib__matplotlib-23476        sympy__sympy-20049
matplotlib__matplotlib-23562        sympy__sympy-20154
matplotlib__matplotlib-23563        sympy__sympy-20212
matplotlib__matplotlib-23913        sympy__sympy-20322
matplotlib__matplotlib-23964        sympy__sympy-20442
matplotlib__matplotlib-23987        sympy__sympy-20590
matplotlib__matplotlib-24149        sympy__sympy-20639
matplotlib__matplotlib-24265        sympy__sympy-21055
matplotlib__matplotlib-24334        sympy__sympy-21171
matplotlib__matplotlib-24970        sympy__sympy-21379
matplotlib__matplotlib-25079        sympy__sympy-21612
matplotlib__matplotlib-25311        sympy__sympy-21614
matplotlib__matplotlib-25332        sympy__sympy-21627
matplotlib__matplotlib-25433        sympy__sympy-21847
matplotlib__matplotlib-25442        sympy__sympy-22005
matplotlib__matplotlib-25498        sympy__sympy-22714
matplotlib__matplotlib-26011        sympy__sympy-22840
matplotlib__matplotlib-26020        sympy__sympy-23117
mwaskom__seaborn-2848           sympy__sympy-23191
mwaskom__seaborn-3010           sympy__sympy-23262
mwaskom__seaborn-3190           sympy__sympy-24066
mwaskom__seaborn-3407           sympy__sympy-24102
pallets__flask-4045         sympy__sympy-24152
pallets__flask-4992         sympy__sympy-24213
pallets__flask-5063         sympy__sympy-24909
(base) nb24-12252:gpt-3.5-turbo-0125__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1 ixiong$ find . -type d | wc -l
     301
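
Note that `find . -type d` also counts `.` itself, so 301 here corresponds to the 300 instance directories listed above. A minimal sketch for counting only the immediate subdirectories, assuming each instance has exactly one directory directly under the log folder (the helper name `count_instance_dirs` is hypothetical, not part of SWE-bench):

```python
from pathlib import Path


def count_instance_dirs(log_root: str) -> int:
    """Count immediate subdirectories of log_root, excluding log_root itself.

    Assumption: each evaluated instance has exactly one directory
    directly under log_root; nested directories and files are ignored.
    """
    return sum(1 for entry in Path(log_root).iterdir() if entry.is_dir())


if __name__ == "__main__":
    # Run from inside the log folder shown in the listing above.
    print(count_instance_dirs("."))
```

Run from the same folder as the `ls` above, this should report one less than the `find . -type d | wc -l` figure whenever there are no nested directories.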

Any ideas?