princeton-nlp / SWE-bench

[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?
https://www.swebench.com
MIT License

Has anyone successfully run an eval on patches against early versions of astropy, sympy, scipy, etc.? I'm really struggling to run things from earlier Python versions #119

Closed PandelisZ closed 5 days ago

PandelisZ commented 1 month ago

Describe the issue

I've been attempting to run evaluations against older environments such as Django 1.7, Django 1.10, or astropy 1.3.

I can't even get past installing the deps for those projects, let alone get functioning unit tests to run.

john-b-yang commented 1 month ago

Hi @PandelisZ, can you provide more information about the commands you're trying to run or the log outputs that you see? What system are you running on?

PandelisZ commented 1 month ago

Hi @john-b-yang

Here's an example run that stays within the range of package versions and patches listed in constants.py.

It uses the provided swe-bench_test data from Hugging Face and only the valid golden patches from the astropy dataset,

available here: https://gist.github.com/PandelisZ/1ecc74f0720f860ed96b9ee23a7a149c

python ./swebench/harness/run_evaluation.py \
  --predictions_path ./datasets/astropy.predictions.json \
  --swe_bench_tasks ./datasets/swe-bench_test.json \
  --log_dir ./log_dir/ \
  --testbed ./testbed
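For anyone trying to reproduce this, a predictions file that replays the gold patches for a single repo can be put together along these lines. This is only a sketch: it assumes the tasks file is a JSON list whose entries carry `instance_id`, `repo`, and `patch`, and that predictions use the `instance_id`, `model_name_or_path`, and `model_patch` keys. `gold_predictions` and `write_gold_predictions` are hypothetical helper names, not part of the harness.

```python
import json
from pathlib import Path


def gold_predictions(tasks, repo="astropy/astropy", model_name="gold"):
    """Build predictions that simply replay each task's gold patch.

    Assumes every task is a dict with `instance_id`, `repo` and `patch`.
    Tasks from other repos, or with an empty patch, are skipped.
    """
    return [
        {
            "instance_id": task["instance_id"],
            "model_name_or_path": model_name,
            "model_patch": task["patch"],
        }
        for task in tasks
        if task["repo"] == repo and task.get("patch")
    ]


def write_gold_predictions(tasks_path, out_path, repo="astropy/astropy"):
    """Read the tasks JSON, filter to one repo, write the predictions JSON."""
    tasks = json.loads(Path(tasks_path).read_text())
    predictions = gold_predictions(tasks, repo=repo)
    Path(out_path).write_text(json.dumps(predictions, indent=2))
    return len(predictions)
```

With the paths from the command above, that would be `write_gold_predictions("./datasets/swe-bench_test.json", "./datasets/astropy.predictions.json")`.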

This uses the Miniconda that SWE-bench installs automatically.

SWE-bench fails to even initialise the project:

Installing collected packages: astropy, pytest-remotedata, pytest-mock, pytest-filter-subpackage, pytest-doctestplus, pytest-cov, pytest-astropy-header, pytest-arraydiff, pytest-astropy
  Running setup.py develop for astropy
    error: subprocess-exited-with-error

    × python setup.py develop did not run successfully.
    │ exit code: 1
    ╰─> [598 lines of output]
        /Users/pz/w/cosine/tools/swe-bench/testbed/astropy/astropy/1.3/tmptwzpk4h8/miniconda3/envs/astropy__astropy__1.3/lib/python3.9/site-packages/setuptools/__init__.py:81: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
        !!

                ********************************************************************************
                Requirements should be satisfied by a PEP 517 installer.
                If you are using pip, you can try `pip install --use-pep517`.
                ********************************************************************************

        !!
          dist.fetch_build_eggs(dist.setup_requires)
        running develop
        /Users/pz/w/cosine/tools/swe-bench/testbed/astropy/astropy/1.3/tmptwzpk4h8/miniconda3/envs/astropy__astropy__1.3/lib/python3.9/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
        !!

                ********************************************************************************
                Please avoid running ``setup.py`` and ``easy_install``.
                Instead, use pypa/build, pypa/installer or other
                standards-based tools.

                See https://github.com/pypa/setuptools/issues/917 for details.
                ********************************************************************************

        !!
          easy_install.initialize_options(self)
        /Users/pz/w/cosine/tools/swe-bench/testbed/astropy/astropy/1.3/tmptwzpk4h8/miniconda3/envs/astropy__astropy__1.3/lib/python3.9/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
        !!

                ********************************************************************************
                Please avoid running ``setup.py`` directly.
                Instead, use pypa/build, pypa/installer or other
                standards-based tools.

                See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
                ********************************************************************************

        !!
          self.initialize_options()
        running egg_info
        creating astropy.egg-info
        writing astropy.egg-info/PKG-INFO
        writing dependency_links to astropy.egg-info/dependency_links.txt
        writing entry points to astropy.egg-info/entry_points.txt
        writing requirements to astropy.egg-info/requires.txt
        writing top-level names to astropy.egg-info/top_level.txt
        writing manifest file 'astropy.egg-info/SOURCES.txt'
        cythoning astropy/table/_np_utils.pyx to astropy/table/_np_utils.c
        /Users/pz/w/cosine/tools/swe-bench/testbed/astropy/astropy/1.3/tmpmxb5i8hy/astropy__astropy__1.3/.eggs/Cython-3.0.10-py3.9.egg/Cython/Compiler/Main.py:381: FutureWarning: Cython directive 'language_level' not set, using '3str' for now (Py3). This has changed from earlier releases! File: /Users/pz/w/cosine/tools/swe-bench/testbed/astropy/astropy/1.3/tmpmxb5i8hy/astropy__astropy__1.3/astropy/table/_np_utils.pyx
          tree = Parsing.p_module(s, pxd, full_module_name)

john-b-yang commented 2 weeks ago

Hmm, what system are you running on?
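Part of the answer can be read off the build log above: the conda environment path names the Python version, and the `.eggs` path names whatever setuptools fetched at build time. A rough sketch for pulling those out of a saved log follows; `summarize_build_log` is a hypothetical helper, and the regular expressions only assume the path shapes visible in this particular log.

```python
import re

# Matches e.g. ".../lib/python3.9/site-packages/..." and captures "3.9".
PYTHON_RE = re.compile(r"/lib/python(\d+\.\d+)/site-packages/")
# Matches e.g. ".../.eggs/Cython-3.0.10-py3.9.egg/..." and captures
# the package name and its version.
EGG_RE = re.compile(r"/\.eggs/([A-Za-z0-9_.]+?)-(\d[^-/]*)-py\d")


def summarize_build_log(log_text):
    """Collect the Python versions and build-time eggs mentioned in a log."""
    return {
        "python_versions": sorted(set(PYTHON_RE.findall(log_text))),
        "eggs": sorted(set(EGG_RE.findall(log_text))),
    }
```

Run against the log above, this reports Python 3.9 and a Cython 3.0.10 egg, both much newer than astropy 1.3 itself. Whether that mismatch is what breaks the build is not something the truncated log confirms, but it is worth including alongside the OS details when reporting the environment.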

This is what the testbed setup logs look like for astropy:

There are many more in swe-bench/experiments under the validation/ folder that you can reference to see what successful builds should look like.

john-b-yang commented 5 days ago

@PandelisZ thanks again for creating this issue + the detailed run-through of what was going wrong. 🙏🏼

I think these problems should be resolved by the containerized approach to evaluation we released in #142 (report here). Unlike before, SWE-bench evaluation no longer runs directly on your machine; it runs inside Docker containers, which abstract away much of the execution environment. We've had a lot of success running it on GCP / AWS machines.
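Since the containerized evaluation depends on Docker, it can help to confirm that the Docker CLI and daemon are reachable before starting a long run. The following is a minimal sketch and not part of SWE-bench: `docker_available` is a hypothetical helper, and the only assumption is that a `docker` binary on `PATH` answers `docker info` with exit code 0 when the daemon is up.

```python
import shutil
import subprocess


def docker_available(which=shutil.which, run=subprocess.run):
    """Return True if `docker` is on PATH and `docker info` exits with 0.

    `which` and `run` are injectable so the check can be tested without
    a real Docker installation.
    """
    if which("docker") is None:
        return False
    try:
        result = run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Calling `docker_available()` before launching an evaluation gives a quick yes/no instead of a failure partway through the run.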

I'm closing this issue now, as it's a bit on the older side, but if you're still working on SWE-bench, please feel free to pull the latest version + run evals, and if problems arise, we can continue discussing them as a new issue!