lean-dojo / LeanDojo

Tool for data extraction and interacting with Lean programmatically.
https://leandojo.org
MIT License
478 stars 72 forks source link

Fail to trace without useful error message #139

Closed zoryzhang closed 4 months ago

zoryzhang commented 4 months ago

I'm trying to trace https://github.com/zoryzhang/analogyTP_lean/tree/039f31d2cf63844602bd2202241f41603779795c on a machine with 256GB RAM. After building, it fails without useful error message. Will you be able to reproduce? Versions of lean: leanprover/lean4:v4.6.0-rc1 Version of leandojo: 1.6.0

[4284/4289] Building Mathlib.NumberTheory.NumberField.ClassNumber
[4285/4289] Building Mathlib.NumberTheory.Cyclotomic.Rat
[4287/4289] Building Mathlib
[4288/4289] Building examples
stdout:
./././examples.lean:34:0: warning: declaration uses 'sorry'
2024-02-28 02:20:19.861 | INFO     | __main__:main:188 - Tracing analogyTP_lean
2024-02-28 02:20:21.174 | DEBUG    | __main__:main:193 - lake env lean --threads 32 --run ExtractData.lean
  0%|                                                                                                                                            | 0/5056 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/tmp/tmpuwnnrutn/workspace/build_lean4_repo.py", line 200, in <module>
    main()
  File "/tmp/tmpuwnnrutn/workspace/build_lean4_repo.py", line 194, in main
    run_cmd(cmd, capture_output=True)
  File "/tmp/tmpuwnnrutn/workspace/build_lean4_repo.py", line 29, in run_cmd
    res = subprocess.run(cmd, shell=True, capture_output=capture_output, check=True)
  File "/opt/conda/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'lake env lean --threads 32 --run ExtractData.lean' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/workspace/analogyTP/src/main.py", line 24, in <module>
    data_extraction(GIT_LEAN_REPO, GIT_LEAN_COMMIT, GIT_LEAN_DST_DIR)
  File "/workspace/analogyTP/src/main.py", line 9, in data_extraction
    traced_repo = trace(repo, dst_dir=git_lean_dst_dir)
  File "/workspace/analogyTP/LeanDojo/src/lean_dojo/data_extraction/trace.py", line 193, in trace
    cached_path = get_traced_repo_path(repo, build_deps)
  File "/workspace/analogyTP/LeanDojo/src/lean_dojo/data_extraction/trace.py", line 160, in get_traced_repo_path
    _trace(repo, build_deps)
  File "/workspace/analogyTP/LeanDojo/src/lean_dojo/data_extraction/trace.py", line 68, in _trace
    _trace_lean4(repo, build_deps)
  File "/workspace/analogyTP/LeanDojo/src/lean_dojo/data_extraction/trace.py", line 135, in _trace_lean4
    raise ex
  File "/workspace/analogyTP/LeanDojo/src/lean_dojo/data_extraction/trace.py", line 123, in _trace_lean4
    container.run(
  File "/workspace/analogyTP/LeanDojo/src/lean_dojo/container.py", line 194, in run
    execute(cmd, capture_output=capture_output)
  File "/workspace/analogyTP/LeanDojo/src/lean_dojo/utils.py", line 116, in execute
    raise ex
  File "/workspace/analogyTP/LeanDojo/src/lean_dojo/utils.py", line 111, in execute
    res = subprocess.run(cmd, shell=True, capture_output=capture_output, check=True)
  File "/opt/conda/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'NUM_PROCS=32 python build_lean4_repo.py analogyTP_lean' returned non-zero exit status 1.

UPD: I also tried to remove mathlib (https://github.com/zoryzhang/analogyTP_lean/commit/7bd368112e3943f15fd0ffe0d37c9ab20889961d, with version 4.5.0 specified in toolchain file) Here is the result using VERBOSE=1:

jane@4f89514a2662:/workspace/analogyTP$ VERBOSE=1 python /workspace/analogyTP/src/main.py
2024-02-28 04:38:56.054 | DEBUG    | lean_dojo.data_extraction.lean:<module>:44 - Using GitHub personal access token for authentication
[2024-02-28 04:39:01,922] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
https://github.com/zoryzhang/analogyTP_lean 7bd368112e3943f15fd0ffe0d37c9ab20889961d /workspace/analogyTP/traced_analogyTP_lean/7bd3681
2024-02-28 04:39:03.417 | DEBUG    | lean_dojo.data_extraction.lean:_to_commit_hash:98 - Querying the commit hash for lean4 v4.5.0
2024-02-28 04:39:10.161 | INFO     | lean_dojo.data_extraction.trace:get_traced_repo_path:157 - Tracing LeanGitRepo(url='https://github.com/zoryzhang/analogyTP_lean', commit='7bd368112e3943f15fd0ffe0d37c9ab20889961d')
2024-02-28 04:39:10.163 | DEBUG    | lean_dojo.data_extraction.trace:get_traced_repo_path:159 - Working in the temporary directory /tmp/tmptfe6x_fb
2024-02-28 04:39:10.781 | DEBUG    | lean_dojo.data_extraction.lean:clone_and_checkout:541 - Cloning LeanGitRepo(url='https://github.com/zoryzhang/analogyTP_lean', commit='7bd368112e3943f15fd0ffe0d37c9ab20889961d')
2024-02-28 04:39:11.890 | DEBUG    | lean_dojo.data_extraction.trace:_trace_lean4:111 - Tracing LeanGitRepo(url='https://github.com/zoryzhang/analogyTP_lean', commit='7bd368112e3943f15fd0ffe0d37c9ab20889961d')
2024-02-28 04:39:11.956 | DEBUG    | lean_dojo.container:run:184 - NUM_PROCS=32 python build_lean4_repo.py analogyTP_lean
2024-02-28 04:39:12.128 | INFO     | __main__:main:165 - Building analogyTP_lean
info: mathlib: cloning https://github.com/leanprover-community/mathlib4.git to './.lake/packages/mathlib'
info: std: cloning https://github.com/leanprover/std4 to './.lake/packages/std'
info: Qq: cloning https://github.com/leanprover-community/quote4 to './.lake/packages/Qq'
info: aesop: cloning https://github.com/leanprover-community/aesop to './.lake/packages/aesop'
info: proofwidgets: cloning https://github.com/leanprover-community/ProofWidgets4 to './.lake/packages/proofwidgets'
info: Cli: cloning https://github.com/leanprover/lean4-cli to './.lake/packages/Cli'
info: importGraph: cloning https://github.com/leanprover-community/import-graph.git to './.lake/packages/importGraph'
[0/2] Downloading proofwidgets cloud release
error: > curl -s -f -o ./.lake/packages/proofwidgets/.lake/ProofWidgets4.tar.gz-linux-64.tar.gz -L https://github.com/leanprover-community/ProofWidgets4/releases/download/v0.0.28/ProofWidgets4.tar.gz-linux-64.tar.gz
error: external command `curl` exited with code 22
[1/2] Building examples
2024-02-28 04:39:57.233 | INFO     | __main__:main:188 - Tracing analogyTP_lean
2024-02-28 04:39:57.637 | DEBUG    | __main__:main:193 - lake env lean --threads 32 --run ExtractData.lean
 98%|██████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 726/738 [06:45<00:04,  2.46it/s]2024-02-28 04:46:47.521 | DEBUG    | lean_dojo.data_extraction.traced_data:from_traced_files:1418 - Parsing 731 *.ast.json files in /tmp/tmptfe6x_fb/analogyTP_lean with 31 workers
2024-02-28 04:46:50,031 WARNING services.py:1996 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 1025863680 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=10.24gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2024-02-28 04:46:51,204 INFO worker.py:1715 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
  0%|                                                                                                                  | 0/731 [00:00<?, ?it/s](pid=29201) 2024-02-28 04:46:54.427 | DEBUG    | lean_dojo.data_extraction.lean:<module>:44 - Using GitHub personal access token for authentication
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 731/731 [00:26<00:00, 27.21it/s]
(pid=29197) 2024-02-28 04:46:54.587 | DEBUG    | lean_dojo.data_extraction.lean:<module>:44 - Using GitHub personal access token for authentication [repeated 30x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)
2024-02-28 04:47:21.604 | DEBUG    | lean_dojo.data_extraction.lean:get_dependencies:587 - Querying the dependencies of LeanGitRepo(url='https://github.com/zoryzhang/analogyTP_lean', commit='7bd368112e3943f15fd0ffe0d37c9ab20889961d')
2024-02-28 04:47:23.495 | DEBUG    | lean_dojo.data_extraction.lean:_to_commit_hash:98 - Querying the commit hash for lean4 v4.6.0-rc1
2024-02-28 04:47:33.983 | DEBUG    | lean_dojo.data_extraction.lean:_to_commit_hash:98 - Querying the commit hash for lean4 v4.0.0
2024-02-28 04:47:40.029 | DEBUG    | lean_dojo.data_extraction.lean:url_to_repo:78 - url_to_repo("https://github.com/leanprover-community/import-graph.git") failed. Retrying...
2024-02-28 04:47:41.198 | DEBUG    | lean_dojo.data_extraction.lean:url_to_repo:78 - url_to_repo("https://github.com/leanprover-community/import-graph.git") failed. Retrying...
2024-02-28 04:47:45.512 | DEBUG    | lean_dojo.data_extraction.lean:_to_commit_hash:98 - Querying the commit hash for LeanCopilot v1.1.1
2024-02-28 04:47:48.219 | DEBUG    | lean_dojo.data_extraction.lean:url_to_repo:78 - url_to_repo("https://github.com/leanprover-community/import-graph.git") failed. Retrying...
2024-02-28 04:47:49.379 | DEBUG    | lean_dojo.data_extraction.lean:url_to_repo:78 - url_to_repo("https://github.com/leanprover-community/import-graph.git") failed. Retrying...
2024-02-28 04:47:52.159 | DEBUG    | lean_dojo.data_extraction.lean:_to_commit_hash:98 - Querying the commit hash for doc-gen4 main
2024-02-28 04:47:53.520 | DEBUG    | lean_dojo.data_extraction.lean:_to_commit_hash:98 - Querying the commit hash for std4 main
2024-02-28 04:47:56.851 | DEBUG    | lean_dojo.data_extraction.lean:_to_commit_hash:98 - Querying the commit hash for quote4 master
2024-02-28 04:47:57.052 | DEBUG    | lean_dojo.data_extraction.lean:_to_commit_hash:98 - Querying the commit hash for aesop master
2024-02-28 04:47:57.515 | DEBUG    | lean_dojo.data_extraction.lean:_to_commit_hash:98 - Querying the commit hash for ProofWidgets4 v0.0.28
2024-02-28 04:47:58.405 | DEBUG    | lean_dojo.data_extraction.lean:_to_commit_hash:98 - Querying the commit hash for lean4-cli main
2024-02-28 04:47:59.331 | DEBUG    | lean_dojo.data_extraction.lean:_to_commit_hash:98 - Querying the commit hash for import-graph main
2024-02-28 04:48:02.178 | DEBUG    | lean_dojo.data_extraction.lean:_to_commit_hash:98 - Querying the commit hash for lean4-nightly nightly-2023-07-29
Following Github server redirection from /repos/mhuisi/lean4-cli to /repositories/341363356
2024-02-28 04:48:07.756 | DEBUG    | lean_dojo.data_extraction.lean:_to_commit_hash:98 - Querying the commit hash for lean4-nightly nightly-2023-08-23
2024-02-28 04:48:12.966 | DEBUG    | lean_dojo.data_extraction.lean:_to_commit_hash:98 - Querying the commit hash for lean4 v4.2.0
2024-02-28 04:48:19.496 | DEBUG    | lean_dojo.data_extraction.lean:_to_commit_hash:98 - Querying the commit hash for lean4-nightly nightly-2022-12-16
2024-02-28 04:48:24.692 | DEBUG    | lean_dojo.data_extraction.lean:_to_commit_hash:98 - Querying the commit hash for lean4 v4.5.0-rc1
2024-02-28 04:48:38.924 | DEBUG    | lean_dojo.data_extraction.traced_data:save_to_disk:1461 - Saving 731 traced XML files to /tmp/tmptfe6x_fb/analogyTP_lean with 31 workers
2024-02-28 04:48:41,379 WARNING services.py:1996 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 1025859584 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=10.24gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2024-02-28 04:48:42,543 INFO worker.py:1715 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
  0%|                                                                                                                  | 0/731 [00:00<?, ?it/s](pid=33450) 2024-02-28 04:48:45.615 | DEBUG    | lean_dojo.data_extraction.lean:<module>:44 - Using GitHub personal access token for authentication
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 731/731 [00:11<00:00, 63.78it/s]
(pid=33476) 2024-02-28 04:48:45.935 | DEBUG    | lean_dojo.data_extraction.lean:<module>:44 - Using GitHub personal access token for authentication [repeated 30x across cluster]
2024-02-28 04:49:14.431 | INFO     | lean_dojo.data_extraction.trace:trace:194 - Loading the traced repo from /home/jane/.cache/lean_dojo/zoryzhang-analogyTP_lean-7bd368112e3943f15fd0ffe0d37c9ab20889961d/analogyTP_lean
2024-02-28 04:49:14.928 | DEBUG    | lean_dojo.data_extraction.traced_data:load_from_disk:1490 - Loading 731 traced XML files from /home/jane/.cache/lean_dojo/zoryzhang-analogyTP_lean-7bd368112e3943f15fd0ffe0d37c9ab20889961d/analogyTP_lean with 31 workers
2024-02-28 04:49:17,366 WARNING services.py:1996 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 1025859584 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=10.24gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2024-02-28 04:49:18,498 INFO worker.py:1715 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
  0%|                                                                                                                  | 0/731 [00:00<?, ?it/s](pid=37696) 2024-02-28 04:49:21.823 | DEBUG    | lean_dojo.data_extraction.lean:<module>:44 - Using GitHub personal access token for authentication
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 731/731 [00:37<00:00, 19.36it/s]
(pid=37706) 2024-02-28 04:49:21.969 | DEBUG    | lean_dojo.data_extraction.lean:<module>:44 - Using GitHub personal access token for authentication [repeated 30x across cluster]
2024-02-28 04:49:59.671 | DEBUG    | lean_dojo.data_extraction.lean:get_dependencies:587 - Querying the dependencies of LeanGitRepo(url='https://github.com/zoryzhang/analogyTP_lean', commit='7bd368112e3943f15fd0ffe0d37c9ab20889961d')
2024-02-28 04:49:59.952 | DEBUG    | lean_dojo.data_extraction.lean:url_to_repo:78 - url_to_repo("https://github.com/leanprover-community/import-graph.git") failed. Retrying...
2024-02-28 04:50:01.113 | DEBUG    | lean_dojo.data_extraction.lean:url_to_repo:78 - url_to_repo("https://github.com/leanprover-community/import-graph.git") failed. Retrying...
2024-02-28 04:50:03.709 | DEBUG    | lean_dojo.data_extraction.lean:url_to_repo:78 - url_to_repo("https://github.com/leanprover-community/import-graph.git") failed. Retrying...
2024-02-28 04:50:04.869 | DEBUG    | lean_dojo.data_extraction.lean:url_to_repo:78 - url_to_repo("https://github.com/leanprover-community/import-graph.git") failed. Retrying...
2024-02-28 04:50:09.358 | DEBUG    | lean_dojo.data_extraction.traced_data:check_sanity:1352 - Checking the sanity of TracedRepo(repo=LeanGitRepo(url='https://github.com/zoryzhang/analogyTP_lean', commit='7bd368112e3943f15fd0ffe0d37c9ab20889961d'), dependencies={'lean4': LeanGitRepo(url='https://github.com/leanprover/lean4', commit='5cc0c3f145ffa623a617ca31387d7f589802c1d3'), 'std': LeanGitRepo(url='https://github.com/leanprover/std4', commit='276953b13323ca151939eafaaec9129bf7970306'), 'Qq': LeanGitRepo(url='https://github.com/leanprover-community/quote4', commit='fd760831487e6835944e7eeed505522c9dd47563'), 'aesop': LeanGitRepo(url='https://github.com/leanprover-community/aesop', commit='6beed82dcfbb7731d173cd517675df27d62ad0f4'), 'proofwidgets': LeanGitRepo(url='https://github.com/leanprover-community/ProofWidgets4', commit='f5b2b6ff817890d85ffd8ed47637e36fcac544a6'), 'Cli': LeanGitRepo(url='https://github.com/leanprover/lean4-cli', commit='a751d21d4b68c999accb6fc5d960538af26ad5ec'), 'mathlib': LeanGitRepo(url='https://github.com/leanprover-community/mathlib4', commit='ae9f685e8bfb15fef2a2ee46b979596b211507b9'), '«doc-gen4»': LeanGitRepo(url='https://github.com/leanprover/doc-gen4', commit='b7fad51b87a5f8fb3a238dc820ec40ebfa2a636e'), 'CMark': LeanGitRepo(url='https://github.com/xubaiw/CMark.lean', commit='0077cbbaa92abf855fc1c0413e158ffd8195ec77'), 'UnicodeBasic': LeanGitRepo(url='https://github.com/fgdorais/lean4-unicode-basic', commit='280d75fdfe7be8eb337be7f1bf8479b4aac09f71'), 'leanInk': LeanGitRepo(url='https://github.com/hargonix/LeanInk', commit='2447df5cc6e48eb965c3c3fba87e46d353b5e9f1'), 'importGraph': LeanGitRepo(url='https://github.com/leanprover-community/import-graph', commit='64d082eeaad1a8e6bbb7c23b7a16b85a1715a02f'), 'LeanCopilot': LeanGitRepo(url='https://github.com/lean-dojo/LeanCopilot', commit='4bc4e6c09689472e5523b0acc724c666baf51e6e')}, root_dir=PosixPath('/home/jane/.cache/lean_dojo/zoryzhang-analogyTP_lean-7bd368112e3943f15fd0ffe0d37c9ab20889961d/analogyTP_lean'))
737 theorems/proofs extracted
2024-02-28 04:50:26.688 | WARNING  | lean_dojo.interaction.dojo:__init__:172 - Using Lean 4 without a hard timeout may hang indefinitely.
2024-02-28 04:50:26.689 | DEBUG    | lean_dojo.interaction.dojo:__enter__:192 - Initializing Dojo for Theorem(repo=LeanGitRepo(url='https://github.com/zoryzhang/analogyTP_lean', commit='7bd368112e3943f15fd0ffe0d37c9ab20889961d'), file_path=PosixPath('Lean4Example.lean'), full_name='hello_world')
2024-02-28 04:50:26.691 | DEBUG    | lean_dojo.data_extraction.trace:get_traced_repo_path:165 - The traced repo is available in the cache.
Traceback (most recent call last):
  File "/workspace/analogyTP/LeanDojo/src/lean_dojo/interaction/dojo.py", line 213, in __enter__
    traced_file = self._locate_traced_file(traced_repo_path)
  File "/workspace/analogyTP/LeanDojo/src/lean_dojo/interaction/dojo.py", line 300, in _locate_traced_file
    return TracedFile.from_traced_file(traced_repo_path, json_path, self.repo)
  File "/workspace/analogyTP/LeanDojo/src/lean_dojo/data_extraction/traced_data.py", line 645, in from_traced_file
    raise FileNotFoundError(f"{json_path} does not exist")
FileNotFoundError: /home/jane/.cache/lean_dojo/zoryzhang-analogyTP_lean-7bd368112e3943f15fd0ffe0d37c9ab20889961d/analogyTP_lean/.lake/build/ir/Lean4Example.ast.json does not exist

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/analogyTP/src/main.py", line 24, in <module>
    data_extraction(GIT_LEAN_REPO, GIT_LEAN_COMMIT, GIT_LEAN_DST_DIR)
  File "/workspace/analogyTP/src/main.py", line 17, in data_extraction
    with Dojo(theorem) as (dojo, init_state):
  File "/workspace/analogyTP/LeanDojo/src/lean_dojo/interaction/dojo.py", line 296, in __enter__
    raise ex
  File "/workspace/analogyTP/LeanDojo/src/lean_dojo/interaction/dojo.py", line 215, in __enter__
    raise DojoInitError(
lean_dojo.interaction.dojo.DojoInitError: Cannot find the *.ast.json file for Theorem(repo=LeanGitRepo(url='https://github.com/zoryzhang/analogyTP_lean', commit='7bd368112e3943f15fd0ffe0d37c9ab20889961d'), file_path=PosixPath('Lean4Example.lean'), full_name='hello_world') in /home/jane/.cache/lean_dojo/zoryzhang-analogyTP_lean-7bd368112e3943f15fd0ffe0d37c9ab20889961d/analogyTP_lean.
zoryzhang commented 4 months ago

Problem solved by using branch https://github.com/lean-dojo/LeanDojo/tree/e3ac808170bfe1686478eb5da30cd0a23d82e48b