Closed rah4927 closed 1 year ago
Looks like LeanDojo failed to handle the /
at the end of the URL https://github.com/rah4927/lean-dojo-mew/
correctly. Can you try https://github.com/rah4927/lean-dojo-mew
instead of https://github.com/rah4927/lean-dojo-mew/
?
@yangky11 Hey, that seems to have been the issue, thanks!
Are you able to get the traced repo as expected (without other errors)? If so, I'll fix this error and release a new version (v1.2.2).
@yangky11 the repo is tracing as we speak, but it certainly went past the initial errors and has cloned and built mathlib. I will let you know once it traces completely, but seems like it's working so far.
@yangky11 Hi there, so I let it run, and it got the following error while tracing.
2023-08-22 17:29:57.308 | INFO | __main__:main:138 - Tracing lean-dojo-mew
95%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 1893/1993 [1:04:41<02:09, 1.30s/it]2023-08-22 18:34:56.959 | INFO | lean_dojo.utils:execute:114 -
2023-08-22 18:34:56.960 | ERROR | lean_dojo.utils:execute:115 - error: Unknown subcommand: get-url
usage: git remote [-v | --verbose]
or: git remote add [-t <branch>] [-m <master>] [-f] [--tags|--no-tags] [--mirror=<fetch|push>] <name> <url>
or: git remote rename <old> <new>
or: git remote remove <name>
or: git remote set-head <name> (-a | -d | <branch>)
or: git remote [-v | --verbose] show [-n] <name>
or: git remote prune [-n | --dry-run] <name>
or: git remote [-v | --verbose] update [-p | --prune] [(<group> | <remote>)...]
or: git remote set-branches [--add] <name> <branch>...
or: git remote set-url [--push] <name> <newurl> [<oldurl>]
or: git remote set-url --add <name> <newurl>
or: git remote set-url --delete <name> <url>
-v, --verbose be verbose; must be placed before a subcommand
Traceback (most recent call last):
File "/lilac/data/peer/rsaha/misc./dojo.py", line 9, in <module>
with Dojo(theorem) as (dojo, init_state):
File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/interaction/dojo.py", line 283, in __enter__
raise ex
File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/interaction/dojo.py", line 203, in __enter__
traced_repo_path = get_traced_repo_path(self.repo)
File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/trace.py", line 135, in get_traced_repo_path
traced_repo = TracedRepo.from_traced_files(tmp_dir / repo.name)
File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/traced_data.py", line 1384, in from_traced_files
repo = LeanGitRepo.from_path(root_dir)
File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/lean.py", line 407, in from_path
url, commit = get_repo_info(path)
File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/utils.py", line 172, in get_repo_info
url_msg, _ = execute(f"git remote get-url origin", capture_output=True)
File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/utils.py", line 116, in execute
raise ex
File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/utils.py", line 111, in execute
res = subprocess.run(cmd, shell=True, capture_output=capture_output, check=True)
File "/home/sahar1/.pyenv/versions/3.9.16/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'git remote get-url origin' returned non-zero exit status 129.
Not sure why this is happening, any thoughts are appreciated.
from here, it seems that get-url is not a thing for git 1.8.x, which is where the problem is. There probably should be another command that works for older git versions. I believe it is git config --get remote.origin.url
but I have not tested it myself @yangky11
In the meantime I will try an updated version of git and let it run again. Will keep you updated.
Yes, that's why https://github.com/lean-dojo/LeanDojo#requirements says Git >= 2.25. I'm not aware of any workaround for outdated Git, you're welcome to submit a PR.
@yangky11 Got it, having run it again with updated git, I got the following error:
2023-08-22 21:28:28.444 | INFO | __main__:main:138 - Tracing lean-dojo-mew
95%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 1893/1993 [1:06:50<02:40, 1.61s/it]2023-08-22 22:35:32.536 | DEBUG | lean_dojo.data_extraction.traced_data:from_traced_files:1388 - Parsing 1894 *.ast.json files in /tmp/tmp085we507/lean-dojo-mew with 7 workers
2023-08-22 22:35:44,220 INFO worker.py:1612 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1894/1894 [04:37<00:00, 6.83it/s]
Traceback (most recent call last):
File "/lilac/data/peer/rsaha/misc./dojo.py", line 9, in <module>
with Dojo(theorem) as (dojo, init_state):
File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/interaction/dojo.py", line 283, in __enter__
raise ex
File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/interaction/dojo.py", line 203, in __enter__
traced_repo_path = get_traced_repo_path(self.repo)
File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/trace.py", line 135, in get_traced_repo_path
traced_repo = TracedRepo.from_traced_files(tmp_dir / repo.name)
File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/traced_data.py", line 1409, in from_traced_files
traced_files_graph = _build_dependency_graph(traced_files)
File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/traced_data.py", line 1243, in _build_dependency_graph
assert G.has_node(dep_path_str)
AssertionError
Presumably it is constructing some kind of dependency graph of the imports, but I am not sure what precisely is getting the assertion to fail.
I wasn't able to reproduce this problem. I tried running
import os
os.environ['CONTAINER'] = 'native'
os.environ['VERBOSE'] = '1'
from lean_dojo import *
os.environ['VERBOSE']
repo = LeanGitRepo("https://github.com/rah4927/lean-dojo-mew/", "12fec4d9f39ac9a41c6c1f058efab3ec41c028af")
theorem = Theorem(repo, "MiniF2F/Example.lean", "amc12a_2019_p21")
with Dojo(theorem) as (dojo, init_state):
print(init_state)
, and got the following output:
2023-08-22 20:05:41.642 | WARNING | lean_dojo.interaction.dojo:__init__:172 - Using Lean 4 without a hard timeout may hang indefinitely.
2023-08-22 20:05:41.643 | DEBUG | lean_dojo.interaction.dojo:__enter__:192 - Initializing Dojo for Theorem(repo=LeanGitRepo(url='https://github.com/rah4927/lean-dojo-mew', commit='12fec4d9f39ac9a41c6c1f058efab3ec41c028af'), file_path=PosixPath('MiniF2F/Example.lean'), full_name='amc12a_2019_p21')
2023-08-22 20:05:41.826 | INFO | lean_dojo.data_extraction.trace:get_traced_repo_path:131 - Tracing LeanGitRepo(url='https://github.com/rah4927/lean-dojo-mew', commit='12fec4d9f39ac9a41c6c1f058efab3ec41c028af')
2023-08-22 20:05:41.827 | DEBUG | lean_dojo.data_extraction.trace:get_traced_repo_path:133 - Working in the temporary directory /raid/kaiyu/tmp/tmpiea_68xo
2023-08-22 20:05:42.341 | DEBUG | lean_dojo.data_extraction.lean:clone_and_checkout:438 - Cloning LeanGitRepo(url='https://github.com/rah4927/lean-dojo-mew', commit='12fec4d9f39ac9a41c6c1f058efab3ec41c028af')
2023-08-22 20:05:42.734 | DEBUG | lean_dojo.data_extraction.trace:_trace_lean4:97 - Tracing LeanGitRepo(url='https://github.com/rah4927/lean-dojo-mew', commit='12fec4d9f39ac9a41c6c1f058efab3ec41c028af')
2023-08-22 20:05:42.741 | DEBUG | lean_dojo.container:run:183 - NUM_PROCS=32 python3 build_lean4_repo.py lean-dojo-mew
2023-08-22 20:05:43.011 | INFO | __main__:main:105 - Building lean-dojo-mew
info: downloading component 'lean'
info: installing component 'lean'
info: cloning https://github.com/leanprover-community/mathlib4.git to ./lake-packages/mathlib
info: cloning https://github.com/leanprover/std4 to ./lake-packages/std
info: cloning https://github.com/gebner/quote4 to ./lake-packages/Qq
info: cloning https://github.com/JLimperg/aesop to ./lake-packages/aesop
info: cloning https://github.com/mhuisi/lean4-cli.git to ./lake-packages/Cli
info: cloning https://github.com/EdAyers/ProofWidgets4 to ./lake-packages/proofwidgets
Downloading proofwidgets/v0.0.13/linux-64.tar.gz
[0/21] Building Std.Lean.TagAttribute
[0/22] Building Std.Lean.Command
[0/23] Building Std.Tactic.Unreachable
[0/28] Building Std.Tactic.OpenPrivate
[0/31] Building Std.Tactic.ByCases
[0/32] Building Std.Tactic.SeqFocus
[0/32] Building Std.Lean.Position
[0/32] Building Std.Lean.Name
...
...
...
[1259/1263] Building MiniF2F.Minif2fImport
[1260/1263] Building MiniF2F.Example
stdout:
./././MiniF2F/Example.lean:11:8: warning: declaration uses 'sorry'
[1261/1263] Building MiniF2F.All
[1262/1263] Building MiniF2F
2023-08-22 20:21:23.224 | INFO | __main__:main:138 - Tracing lean-dojo-mew
95%|█████████▍| 1893/1993 [19:46<01:53, 1.13s/it]2023-08-22 20:42:00.791 | DEBUG | lean_dojo.data_extraction.traced_data:from_traced_files:1388 - Parsing 1895 *.ast.json files in /raid/kaiyu/tmp/tmpiea_68xo/lean-dojo-mew with 31 workers
2023-08-22 20:42:02,776 INFO worker.py:1627 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
100%|██████████| 1895/1895 [03:47<00:00, 8.35it/s]
2023-08-22 20:46:10.132 | DEBUG | lean_dojo.data_extraction.traced_data:save_to_disk:1435 - Saving 1895 traced XML files to /raid/kaiyu/tmp/tmpiea_68xo/lean-dojo-mew with 31 workers
2023-08-22 20:46:13,480 INFO worker.py:1627 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
100%|██████████| 1895/1895 [01:03<00:00, 29.99it/s]
2023-08-22 20:47:32.792 | DEBUG | lean_dojo.interaction.dojo:_modify_file:398 - Modifying MiniF2F/Example.lean
2023-08-22 20:49:33.167 | DEBUG | lean_dojo.interaction.dojo:__enter__:225 - Launching the proof using <class 'lean_dojo.container.NativeContainer'>
2023-08-22 20:49:35.995 | DEBUG | lean_dojo.container:run:183 - lake build Lean4Repl
2023-08-22 20:49:45.404 | WARNING | lean_dojo.container:run_interactive:209 - Disregarding `cpu_limit = 1 since NativeContainer does not support CPU limit.`
2023-08-22 20:49:45.404 | WARNING | lean_dojo.container:run_interactive:213 - Disregarding `memory_limit = 16g` since NativeContainer does not support memory limit.
2023-08-22 20:49:48.956 | DEBUG | lean_dojo.container:run_interactive:221 - lake env lean MiniF2F/Example.lean
2023-08-22 20:49:52.730 | DEBUG | lean_dojo.interaction.dojo:_read_next_line:587 - REPL> {"tacticState": "z : ℂ\nh₀ : z = (1 + Complex.I) / ↑(Real.sqrt 2)\n⊢ (∑ k in Finset.Icc 1 12, z ^ ↑k ^ 2) * ∑ k in Finset.Icc 1 12, 1 / z ^ ↑k ^ 2 = 36", "sid": 0, "error": null}
2023-08-22 20:49:52.740 | DEBUG | lean_dojo.interaction.dojo:_submit_request:547 - Request: exit
2023-08-22 20:49:52.815 | DEBUG | lean_dojo.interaction.dojo:_read_next_line:587 -
2023-08-22 20:49:52.815 | DEBUG | lean_dojo.interaction.dojo:_cleanup:331 - Cleaning up.
2023-08-22 20:49:52.815 | DEBUG | lean_dojo.interaction.dojo:_cleanup_container:341 - Cleaning up the container.
2023-08-22 20:49:56.664 | DEBUG | lean_dojo.interaction.dojo:_cleanup_tmp_dir:354 - Cleaning up the temporary directory.
TacticState(pp='z : ℂ\nh₀ : z = (1 + Complex.I) / ↑(Real.sqrt 2)\n⊢ (∑ k in Finset.Icc 1 12, z ^ ↑k ^ 2) * ∑ k in Finset.Icc 1 12, 1 / z ^ ↑k ^ 2 = 36', id=0, message=None)
Can you try cleaning up everything and re-installing LeanDojo from the main
branch:
rm -rf ~/.cache/lean_dojo && pip install git+https://github.com/lean-dojo/LeanDojo --upgrade --ignore-installed
Then try running the original script to trace the repo?
Ran those commands and then ran the original script. Threw same error :/ @yangky11
It's very difficult for me to locate the error since I cannot reproduce it.
Do you see the traced_repo = TracedRepo.from_traced_files(tmp_dir / repo.name)
and Working in the temporary directory XXX
in the log? I might be able to help if you can:
git clone https://github.com/lean-dojo/LeanDojo && cd LeanDojo && pip install -e .
import pdb; pdb.set_trace()
before traced_repo = TracedRepo.from_traced_files(tmp_dir / repo.name)
in src/lean_dojo/data_extraction/trace.py.VERBOSE=1
. When the breakpoint is triggered, zip the temporary directory (in the debug log) and send it to me (kaiyuy@caltech.edu). Note that the temporary directory will be deleted after the program exists (That's why we need to set the breakpoint). @yangky11 Thanks! Just sent you a link to the zipped temp directory
I'll look into the problem when I get a chance. As a temporary workaround, you can use the repo traced on my machine. Just unzip this file into ~/.cache/lean_dojo/
(override what's already there, if any). Then everything should work as if the repo were traced on your machine.
BTW, why are you setting CONTAINER=native
? Is Docker not an option for you? As mentioned in the docs, running with Docker is less likely to run into problems.
Hi @yangky11 thank you for the workaround, though this might not be the best since this is only a minimal example, and the original repo has hundreds of theorems (though the structure of the project is largely the same). Let me know once you get some time to take a look at the folder I sent.
As for docker, I am currently running it on a remote machine without root access, and I believe docker is generally meant to be used with root access. If you offer an alternative (such as Singularity), that could work. Thanks!
I'm almost sure it was an OOM error when tracing Mathlib/Data/Real/Basic.lean. I'm adding more checks so that it would report the error rather than just fail silently. But that wouldn't solve the memory issue. How much memory do you have?
@yangky11 you were right! that was the issue, increasing the memory limit of the job solved this problem, and the tracing completed without error. Thanks a lot! This issue can be closed now.
Description Hey guys, I tried tracing a theorem from the following repository - lean-dojo-mew, and commit number
12fec4d9f39ac9a41c6c1f058efab3ec41c028af
, and I got the following error message:Detailed Steps to Reproduce the Behavior
pip install .
from lean_dojo import * os.environ['VERBOSE'] repo = LeanGitRepo("https://github.com/rah4927/lean-dojo-mew/", "12fec4d9f39ac9a41c6c1f058efab3ec41c028af") theorem = Theorem(repo, "MiniF2F/Example.lean", "amc12a_2019_p21")
with Dojo(theorem) as (dojo, init_state): print(init_state)
2023-08-22 16:26:36.756 | WARNING | lean_dojo.interaction.dojo:init:172 - Using Lean 4 without a hard timeout may hang indefinitely. 2023-08-22 16:26:36.757 | DEBUG | lean_dojo.interaction.dojo:enter:192 - Initializing Dojo for Theorem(repo=LeanGitRepo(url='https://github.com/rah4927/lean-dojo-mew/', commit='12fec4d9f39ac9a41c6c1f058efab3ec41c028af'), file_path=PosixPath('MiniF2F/Example.lean'), full_name='amc12a_2019_p21') 2023-08-22 16:26:37.048 | INFO | lean_dojo.data_extraction.trace:get_traced_repo_path:131 - Tracing LeanGitRepo(url='https://github.com/rah4927/lean-dojo-mew/', commit='12fec4d9f39ac9a41c6c1f058efab3ec41c028af') 2023-08-22 16:26:37.049 | DEBUG | lean_dojo.data_extraction.trace:get_traced_repo_path:133 - Working in the temporary directory /tmp/tmpcgh6yp06 2023-08-22 16:26:37.434 | DEBUG | lean_dojo.data_extraction.lean:clone_and_checkout:436 - Cloning LeanGitRepo(url='https://github.com/rah4927/lean-dojo-mew/', commit='12fec4d9f39ac9a41c6c1f058efab3ec41c028af') 2023-08-22 16:26:37.755 | INFO | lean_dojo.utils:execute:114 - 2023-08-22 16:26:37.755 | ERROR | lean_dojo.utils:execute:115 - fatal: Not a git repository (or any parent up to mount point /tmp) Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
Traceback (most recent call last): File "/lilac/data/peer/rsaha/misc./dojo.py", line 9, in
with Dojo(theorem) as (dojo, init_state):
File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/interaction/dojo.py", line 283, in enter
raise ex
File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/interaction/dojo.py", line 203, in enter
traced_repo_path = get_traced_repo_path(self.repo)
File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/trace.py", line 134, in get_traced_repo_path
_trace(repo)
File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/trace.py", line 66, in _trace
_trace_lean4(repo)
File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/trace.py", line 95, in _trace_lean4
repo.clone_and_checkout()
File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/lean.py", line 439, in clone_and_checkout
execute(
File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/utils.py", line 116, in execute
raise ex
File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/utils.py", line 111, in execute
res = subprocess.run(cmd, shell=True, capture_output=capture_output, check=True)
File "/home/sahar1/.pyenv/versions/3.9.16/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'git checkout 12fec4d9f39ac9a41c6c1f058efab3ec41c028af && git submodule update --recursive' returned non-zero exit status 128.
Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 Thread(s) per core: 1 Core(s) per socket: 2 Socket(s): 4 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 85 Model name: Intel(R) Xeon(R) Gold 6242R CPU @ 3.10GHz Stepping: 7 CPU MHz: 3099.999 BogoMIPS: 6199.99 Hypervisor vendor: VMware Virtualization type: full L1d cache: 32K L1i cache: 32K L2 cache: 1024K L3 cache: 36608K NUMA node0 CPU(s): 0-7
NAME="CentOS Linux" VERSION="7 (Core)" ID="centos" ID_LIKE="rhel fedora" VERSION_ID="7" PRETTY_NAME="CentOS Linux 7 (Core)" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:centos:centos:7" HOME_URL="https://www.centos.org/" BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7" CENTOS_MANTISBT_PROJECT_VERSION="7" REDHAT_SUPPORT_PRODUCT="centos" REDHAT_SUPPORT_PRODUCT_VERSION="7"