princeton-nlp / SWE-bench

[ICLR 2024] SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
https://www.swebench.com
MIT License

Want a docker image tar file #211

[Open] WentaoTan opened this issue 3 weeks ago

WentaoTan commented 3 weeks ago

Describe the issue

I always encounter bugs when building the Docker images, specifically at this call (line 121 in swebench/harness/docker_build.py):

```python
response = client.api.build(
    path=str(build_dir),
    tag=image_name,
    rm=True,
    forcerm=True,
    decode=True,
    platform=platform,
    nocache=nocache,
)
```

I noticed that this API can use an existing docker .tar file as input. The API documentation (https://docker-py.readthedocs.io/en/stable/api.html#module-docker.api.build) states: "If you have a tar file for the Docker build context (including a Dockerfile) already, pass a readable file-like object to fileobj and also pass custom_context=True. If the stream is compressed also, set encoding to the correct value (e.g., gzip)."

I want to ask if there is a downloadable tar file like this?
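For reference, here is a minimal sketch of what building from a pre-made context tarball would look like with docker-py, per the documentation quoted above. Note that `context.tar` and the image tag are placeholder names; as far as I can tell, SWE-bench does not currently publish such a tarball.

```python
import docker

client = docker.from_env()

# Hypothetical: build from an existing build-context tarball that already
# contains a Dockerfile at its root. "context.tar" is a placeholder name.
with open("context.tar", "rb") as context:
    for chunk in client.api.build(
        fileobj=context,
        custom_context=True,  # treat fileobj as the full build context
        tag="sweb.example:latest",
        rm=True,
        decode=True,  # yield parsed JSON dicts instead of raw bytes
    ):
        if "stream" in chunk:
            print(chunk["stream"], end="")
```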

My errors are shown in the attached screenshot.

I have tried these solutions:

(1) Forcing a public DNS server, as shown below:

```dockerfile
RUN echo "nameserver 8.8.8.8" >> /etc/resolv.conf
```

But it doesn't seem to actually add this line to /etc/resolv.conf.

(2) Replacing the apt sources with a mirror:

```dockerfile
RUN sed -i s@/archive.ubuntu.com/@/cn.archive.ubuntu.com/@g /etc/apt/sources.list
RUN sed -i s@/security.ubuntu.com/@/cn.archive.ubuntu.com/@g /etc/apt/sources.list
```

Replacing the sources doesn't work either.
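As an aside on (1): edits to /etc/resolv.conf made in a RUN step generally cannot work, because Docker bind-mounts that file into each container at start, so the build containers never see the change. A common workaround, which is an assumption on my part and not something from the SWE-bench docs, is to configure DNS for the Docker daemon itself:

```bash
# Sketch of a possible workaround (not from the SWE-bench docs): set DNS at the
# daemon level, since RUN-time edits to /etc/resolv.conf are masked by Docker's
# bind mount. Merge these keys into /etc/docker/daemon.json if it already exists.
sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
  "dns": ["8.8.8.8", "8.8.4.4"]
}
EOF
sudo systemctl restart docker
```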

Suggest an improvement to documentation

No response

WentaoTan commented 3 weeks ago

Wow! Thanks a lot for your quick response!

HaomiaoPan commented 3 weeks ago

Are there any responses here? I have the same problem. Thanks

WentaoTan commented 3 weeks ago

I encountered a problem while performing inference following this document: https://github.com/princeton-nlp/SWE-bench/blob/main/swebench/inference/README.md

When using an API model, the dataset is filtered down to 226 instances for testing (see the first screenshot); however, when using the LLaMA model, the dataset consists of 2,294 instances (see the second screenshot).

I would like to know what the correct number of test samples should be. Also, is it normal for Meta-Llama-3.1-70B-Instruct to take 18 hours for a single test run on 4 A100 80G GPUs?
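For what it's worth, the full SWE-bench test split does contain 2,294 instances. A quick way to confirm the raw split size, assuming the Hugging Face datasets package and the public princeton-nlp/SWE-bench dataset:

```python
from datasets import load_dataset

# Load the public SWE-bench test split and count its instances.
ds = load_dataset("princeton-nlp/SWE-bench", split="test")
print(len(ds))  # expected: 2294
```

The smaller count of 226 under the API path is presumably the result of filtering in the API inference script (for example, skipping instances whose prompts exceed the model's context window), but that is an assumption worth confirming against the inference README.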