princeton-nlp / SWE-bench

[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?
https://www.swebench.com
MIT License

Yanked package `types-pkg-resources` causes failures when evaluating on `sqlfluff` #199

Open klieret opened 1 month ago

klieret commented 1 month ago

See also https://github.com/princeton-nlp/SWE-agent/issues/707, first reported by @waterson samizdis.

The types-pkg-resources package was yanked from PyPI (see pypi.org/project/types-pkg-resources/#history and pypi.org/project/types-pkg-resources).

This means that installing from any requirements.txt that lists it without an exact version pin will fail.

Steps to reproduce:

  1. Clean all docker images
  2. Create an empty all_preds.jsonl for sqlfluff__sqlfluff-1625, like the one below
    {"model_name_or_path": "instant_empty_submit__dev23__default__t-0.00__p-0.95__c-3.00__install-1", "instance_id": "sqlfluff__sqlfluff-1625", "model_patch": "\ndiff --git a/reproduce.py b/reproduce.py\nnew file mode 100644\nindex 00000000..8b137891\n--- /dev/null\n+++ b/reproduce.py\n@@ -0,0 +1 @@\n+\n"}
  3. Run
    python -m swebench.harness.run_evaluation \
    --predictions_path all_preds.jsonl \
    --max_workers 1 \
    --run_id test --split dev
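Step 2 can also be scripted. The following is only a sketch: `write_predictions` is a hypothetical helper name, and the patch text and field names are copied from the example line above rather than taken from the harness documentation.

```python
import json
from pathlib import Path

# Patch copied from the example in step 2: it only adds an empty reproduce.py,
# so any failure comes from building the environment, not from the patch.
EMPTY_PATCH = (
    "\ndiff --git a/reproduce.py b/reproduce.py\n"
    "new file mode 100644\n"
    "index 00000000..8b137891\n"
    "--- /dev/null\n"
    "+++ b/reproduce.py\n"
    "@@ -0,0 +1 @@\n"
    "+\n"
)


def write_predictions(path, instance_id, model_name, patch=EMPTY_PATCH):
    """Write a one-record JSONL predictions file and return its path."""
    record = {
        "model_name_or_path": model_name,
        "instance_id": instance_id,
        "model_patch": patch,
    }
    path = Path(path)
    # json.dumps escapes the newlines in the patch, so the record stays on one line.
    path.write_text(json.dumps(record) + "\n", encoding="utf-8")
    return path
```

Calling `write_predictions("all_preds.jsonl", "sqlfluff__sqlfluff-1625", "<model name>")` should produce a file equivalent to the one in step 2.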

Grep for `yanked` in build_image.log to find the error.
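If it is not obvious where the log ended up, a recursive search does the same job. This is a rough sketch: `find_yanked_lines` is a hypothetical helper, and the log root directory is passed in because its location depends on how the harness was run.

```python
from pathlib import Path


def find_yanked_lines(log_root, needle="yanked", log_name="build_image.log"):
    """Return (path, line_number, text) for each matching line in any log_name under log_root."""
    hits = []
    for log_path in sorted(Path(log_root).rglob(log_name)):
        with log_path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                # Case-insensitive match, like `grep -i`.
                if needle.lower() in line.lower():
                    hits.append((log_path, number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # "." is only a placeholder; point this at the directory holding the run logs.
    for path, number, text in find_yanked_lines("."):
        print(f"{path}:{number}: {text}")
```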

Possible fix: Identify all versions of sqlfluff that have types-pkg-resources in the requirements.txt and explicitly pin and install types-pkg-resources (probably to the latest version) in the extra pip packages.
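A minimal sketch of the pinning idea, assuming the requirements text is available as a string before it is written into setup_env.sh. `pin_requirement` and `normalize` are hypothetical names, not harness functions, and the version is left as a parameter because this thread does not settle which release to pin. Names are compared after PEP 503 normalization, since the requirements file spells the package `types-pkg_resources` while PyPI shows `types-pkg-resources`.

```python
import re


def normalize(name):
    """Normalize a project name as in PEP 503: lowercase, runs of - _ . become a single -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pin_requirement(requirements_text, package, version):
    """Return requirements_text with every entry for package replaced by an exact pin."""
    target = normalize(package)
    out = []
    for line in requirements_text.splitlines():
        # Leading project name, before any extras, specifier, or marker.
        # Options such as "-r other.txt" and comments do not match.
        match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", line)
        if match and normalize(match.group(1)) == target:
            out.append(f"{package}=={version}")
        else:
            out.append(line)
    return "".join(line + "\n" for line in out)


if __name__ == "__main__":
    sample = "mypy\ntypes-toml\ntypes-pkg_resources\ntypes-chardet\n"
    # "1.2.3" is a placeholder, not a recommended version.
    print(pin_requirement(sample, "types-pkg-resources", "1.2.3"))
```

An exact `==` pin matters here because pip skips yanked releases unless the requirement pins that exact version.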

klieret commented 1 month ago

I'm honestly confused how the dependency ends up there.

It appears directly in setup_env.sh

#!/bin/bash
set -euxo pipefail
source /opt/miniconda3/bin/activate
conda create -n testbed python=3.9 -y
cat <<'EOF_59812759871' > $HOME/requirements.txt
flake8
flake8-docstrings
flake8-black
doc8
Pygments
coverage
hypothesis
pytest
pytest-cov
pytest-sugar
mypy
types-toml
types-pkg_resources
types-chardet
requests

EOF_59812759871
conda activate testbed && python -m pip install -r $HOME/requirements.txt
rm $HOME/requirements.txt
conda activate testbed

so I would assume I should either find it in our constants.py, or somewhere in the requirements.txt. But the former is not the case, and the latter also doesn't seem to be (e.g., here's requirements.txt for v0.7)...

HejiaZ2023 commented 1 month ago

> I'm honestly confused how the dependency ends up there.

@klieret I think it's redirected to "requirements_dev.txt" by get_requirements() -> get_requirements_by_commit(). MAP_REPO_VERSION_TO_SPECS says the install package type is "requirements.txt", so get_requirements() is called; it then looks in MAP_REPO_TO_REQS_PATHS and gets "requirements_dev.txt".
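The redirection described here can be illustrated with a toy version of the lookup. This is not the harness code: `resolve_requirements_paths` is a hypothetical helper, and the two dictionaries are cut-down copies holding only the values quoted in this comment (the real tables live in swebench.harness.constants).

```python
# Toy copies of the two tables, limited to the entries quoted in this thread.
MAP_REPO_VERSION_TO_SPECS = {
    "sqlfluff/sqlfluff": {
        "0.6": {"python": "3.9", "packages": "requirements.txt"},
    },
}
MAP_REPO_TO_REQS_PATHS = {"sqlfluff/sqlfluff": ["requirements_dev.txt"]}


def resolve_requirements_paths(repo, version):
    """Hypothetical helper: list the files that would be read for repo at version."""
    specs = MAP_REPO_VERSION_TO_SPECS[repo][version]
    if specs.get("packages") != "requirements.txt":
        return []
    # "requirements.txt" acts as an install type here, not a literal path;
    # the files actually read come from the per-repo path table.
    return MAP_REPO_TO_REQS_PATHS[repo]


if __name__ == "__main__":
    print(resolve_requirements_paths("sqlfluff/sqlfluff", "0.6"))
```

Under this reading, the contents of the repository's own requirements.txt never matter for sqlfluff, which would explain why the dependency cannot be found there.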

>>> from swebench.harness.constants import MAP_REPO_VERSION_TO_SPECS, MAP_REPO_TO_REQS_PATHS
2024-08-11 18:12:17,556 - datasets - INFO - PyTorch version 2.3.0 available.
>>> MAP_REPO_VERSION_TO_SPECS['sqlfluff/sqlfluff']['0.6']
{'python': '3.9', 'packages': 'requirements.txt', 'install': 'python -m pip install -e .', 'test_cmd': 'pytest -rA'}
>>> MAP_REPO_TO_REQS_PATHS['sqlfluff/sqlfluff']
['requirements_dev.txt']

klieret commented 1 month ago

Thanks @HejiaZ2023, indeed https://github.com/sqlfluff/sqlfluff/blob/0.7.0/requirements_dev.txt shows types-pkg_resources in there.

In fact, they only removed it last week when the package was yanked: https://github.com/sqlfluff/sqlfluff/pull/6039

codelion commented 3 weeks ago

@klieret any idea when this will be fixed? is there a workaround until then?

klieret commented 3 weeks ago

I'll open a PR with a workaround later today/early tomorrow (there's a few ways to do this and Carlos/John also wanted to take a look, so I was waiting for feedback)

jatinganhotra commented 1 week ago

@klieret / @john-b-yang / @carlosejimenez - I wanted to circle back on this issue and see if you got a chance to look at it. Thanks