lean-dojo / LeanDojo

Tool for data extraction and interacting with Lean programmatically.
https://leandojo.org
MIT License
536 stars 80 forks source link

ModuleNotFoundError: No module named 'loguru' #41

Closed rah4927 closed 1 year ago

rah4927 commented 1 year ago

Description Hey guys, I tried tracing a theorem from the following repository - lean-dojo-mew, and commit number 12fec4d9f39ac9a41c6c1f058efab3ec41c028af, and I got the following error message:

2023-08-22 16:04:56.513 | WARNING  | lean_dojo.interaction.dojo:__init__:172 - Using Lean 4 without a hard timeout may hang indefinitely.
2023-08-22 16:04:56.801 | INFO     | lean_dojo.data_extraction.trace:get_traced_repo_path:131 - Tracing LeanGitRepo(url='https://github.com/rah4927/lean4-example', commit='529eed015655c59d6f44649800e871f50964d216')
Traceback (most recent call last):
  File "build_lean4_repo.py", line 12, in <module>
    from loguru import logger
ModuleNotFoundError: No module named 'loguru'
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
Cell In[50], line 1
----> 1 with Dojo(theorem) as (dojo, init_state):
      2     print(init_state)

File ~/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/interaction/dojo.py:283, in Dojo.__enter__(self)
    281 os.chdir(self.origin_dir)
    282 shutil.rmtree(self.tmp_dir)
--> 283 raise ex

File ~/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/interaction/dojo.py:203, in Dojo.__enter__(self)
    200 os.chdir(self.tmp_dir)
    202 # Copy and `cd` into the repo.
--> 203 traced_repo_path = get_traced_repo_path(self.repo)
    204 shutil.copytree(
    205     traced_repo_path,
    206     self.repo.name,
    207     ignore=ignore_patterns("*.dep_paths", "*.ast.json", "*.trace.xml"),
    208 )
    209 os.chdir(self.repo.name)

File ~/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/trace.py:134, in get_traced_repo_path(repo)
    132 with working_directory() as tmp_dir:
    133     logger.debug(f"Working in the temporary directory {tmp_dir}")
--> 134     _trace(repo)
    135     traced_repo = TracedRepo.from_traced_files(tmp_dir / repo.name)
    136     traced_repo.save_to_disk()

File ~/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/trace.py:66, in _trace(repo)
     64 else:
     65     assert repo.uses_lean4
---> 66     _trace_lean4(repo)

File ~/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/trace.py:104, in _trace_lean4(repo)
     98 container = get_container()
     99 mts = {
    100     Path.cwd() / repo.name: f"/workspace/{repo.name}",
    101     LEAN4_BUILD_SCRIPT_PATH: f"/workspace/{LEAN4_BUILD_SCRIPT_PATH.name}",
    102     LEAN4_DATA_EXTRACTOR_PATH: f"/workspace/{repo.name}/{LEAN4_DATA_EXTRACTOR_PATH.name}",
    103 }
--> 104 container.run(
    105     f"python3 build_lean4_repo.py {repo.name}",
    106     create_mounts(mts),
    107     {"NUM_PROCS": NUM_PROCS},
    108     as_current_user=True,
    109     work_dir="/workspace",
    110 )

File ~/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/container.py:193, in NativeContainer.run(self, command, mounts, envs, as_current_user, capture_output, cpu_limit, memory_limit, work_dir)
    190         work_dir = Path.cwd() / work_dir.relative_to(work_dir.root)
    192 with working_directory(work_dir):
--> 193     execute(cmd, capture_output=capture_output)
    195 self._unmount_files(mounts)

File ~/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/utils.py:116, in execute(cmd, capture_output)
    114         logger.info(ex.stdout.decode())
    115         logger.error(ex.stderr.decode())
--> 116     raise ex
    117 if not capture_output:
    118     return None

File ~/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/utils.py:111, in execute(cmd, capture_output)
    101 """Execute the shell command ``cmd`` and optionally return its output.
    102 
    103 Args:
   (...)
    108     Optional[Tuple[str, str]]: The command's output, including stdout and stderr (None if ``capture_output == False``).
    109 """
    110 try:
--> 111     res = subprocess.run(cmd, shell=True, capture_output=capture_output, check=True)
    112 except subprocess.CalledProcessError as ex:
    113     if capture_output:

File ~/.pyenv/versions/3.9.16/lib/python3.9/subprocess.py:528, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
    526     retcode = process.poll()
    527     if check and retcode:
--> 528         raise CalledProcessError(retcode, process.args,
    529                                  output=stdout, stderr=stderr)
    530 return CompletedProcess(process.args, retcode, stdout, stderr)

CalledProcessError: Command 'NUM_PROCS=32 python3 build_lean4_repo.py lean4-example' returned non-zero exit status 1.

Detailed Steps to Reproduce the Behavior

  1. First, I installed the latest lean-dojo package by downloading the git repository and running pip install .
  2. All the dependencies are installed (including loguru)
  3. Then, I ran the following code in a jupyter notebook
    
    import os
    os.environ['CONTAINER'] = 'native'
    os.environ['VERBOSE'] = '1'

from lean_dojo import * os.environ['VERBOSE'] repo = LeanGitRepo("https://github.com/rah4927/lean-dojo-mew/", "12fec4d9f39ac9a41c6c1f058efab3ec41c028af") theorem = Theorem(repo, "MiniF2F/Example.lean", "amc12a_2019_p21")

with Dojo(theorem) as (dojo, init_state): print(init_state)

**Logs in Debug Mode**

2023-08-22 16:26:36.756 | WARNING | lean_dojo.interaction.dojo:init:172 - Using Lean 4 without a hard timeout may hang indefinitely. 2023-08-22 16:26:36.757 | DEBUG | lean_dojo.interaction.dojo:enter:192 - Initializing Dojo for Theorem(repo=LeanGitRepo(url='https://github.com/rah4927/lean-dojo-mew/', commit='12fec4d9f39ac9a41c6c1f058efab3ec41c028af'), file_path=PosixPath('MiniF2F/Example.lean'), full_name='amc12a_2019_p21') 2023-08-22 16:26:37.048 | INFO | lean_dojo.data_extraction.trace:get_traced_repo_path:131 - Tracing LeanGitRepo(url='https://github.com/rah4927/lean-dojo-mew/', commit='12fec4d9f39ac9a41c6c1f058efab3ec41c028af') 2023-08-22 16:26:37.049 | DEBUG | lean_dojo.data_extraction.trace:get_traced_repo_path:133 - Working in the temporary directory /tmp/tmpcgh6yp06 2023-08-22 16:26:37.434 | DEBUG | lean_dojo.data_extraction.lean:clone_and_checkout:436 - Cloning LeanGitRepo(url='https://github.com/rah4927/lean-dojo-mew/', commit='12fec4d9f39ac9a41c6c1f058efab3ec41c028af') 2023-08-22 16:26:37.755 | INFO | lean_dojo.utils:execute:114 - 2023-08-22 16:26:37.755 | ERROR | lean_dojo.utils:execute:115 - fatal: Not a git repository (or any parent up to mount point /tmp) Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).

Traceback (most recent call last): File "/lilac/data/peer/rsaha/misc./dojo.py", line 9, in with Dojo(theorem) as (dojo, init_state): File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/interaction/dojo.py", line 283, in enter raise ex File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/interaction/dojo.py", line 203, in enter traced_repo_path = get_traced_repo_path(self.repo) File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/trace.py", line 134, in get_traced_repo_path _trace(repo) File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/trace.py", line 66, in _trace _trace_lean4(repo) File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/trace.py", line 95, in _trace_lean4 repo.clone_and_checkout() File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/lean.py", line 439, in clone_and_checkout execute( File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/utils.py", line 116, in execute raise ex File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/utils.py", line 111, in execute res = subprocess.run(cmd, shell=True, capture_output=capture_output, check=True) File "/home/sahar1/.pyenv/versions/3.9.16/lib/python3.9/subprocess.py", line 528, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command 'git checkout 12fec4d9f39ac9a41c6c1f058efab3ec41c028af && git submodule update --recursive' returned non-zero exit status 128.


**Platform Information**
- System 

Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 Thread(s) per core: 1 Core(s) per socket: 2 Socket(s): 4 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 85 Model name: Intel(R) Xeon(R) Gold 6242R CPU @ 3.10GHz Stepping: 7 CPU MHz: 3099.999 BogoMIPS: 6199.99 Hypervisor vendor: VMware Virtualization type: full L1d cache: 32K L1i cache: 32K L2 cache: 1024K L3 cache: 36608K NUMA node0 CPU(s): 0-7


 - OS: 

NAME="CentOS Linux" VERSION="7 (Core)" ID="centos" ID_LIKE="rhel fedora" VERSION_ID="7" PRETTY_NAME="CentOS Linux 7 (Core)" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:centos:centos:7" HOME_URL="https://www.centos.org/" BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7" CENTOS_MANTISBT_PROJECT_VERSION="7" REDHAT_SUPPORT_PRODUCT="centos" REDHAT_SUPPORT_PRODUCT_VERSION="7"


 - Browser (running the notebook on chrome) 
 - LeanDojo Version - 1.2.1 

Any help is apppreciated, thank you! 
yangky11 commented 1 year ago

Looks like LeanDojo failed to handle the / at the end of the URL https://github.com/rah4927/lean-dojo-mew/ correctly. Can you try https://github.com/rah4927/lean-dojo-mew instead of https://github.com/rah4927/lean-dojo-mew/?

rah4927 commented 1 year ago

@yangky11 Hey, that seems to have been the issue, thanks!

yangky11 commented 1 year ago

Are you able to get the traced repo as expected (without other errors)? If so, I'll fix this error and release a new version (v1.2.2).

rah4927 commented 1 year ago

@yangky11 the repo is tracing as we speak, but it certainly went past the initial errors and has cloned and built mathlib. I will let you know once it traces completely, but seems like it's working so far.

rah4927 commented 1 year ago

@yangky11 Hi there, so I let it run, and it got the following error while tracing.

2023-08-22 17:29:57.308 | INFO     | __main__:main:138 - Tracing lean-dojo-mew
 95%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏       | 1893/1993 [1:04:41<02:09,  1.30s/it]2023-08-22 18:34:56.959 | INFO     | lean_dojo.utils:execute:114 -
2023-08-22 18:34:56.960 | ERROR    | lean_dojo.utils:execute:115 - error: Unknown subcommand: get-url
usage: git remote [-v | --verbose]
   or: git remote add [-t <branch>] [-m <master>] [-f] [--tags|--no-tags] [--mirror=<fetch|push>] <name> <url>
   or: git remote rename <old> <new>
   or: git remote remove <name>
   or: git remote set-head <name> (-a | -d | <branch>)
   or: git remote [-v | --verbose] show [-n] <name>
   or: git remote prune [-n | --dry-run] <name>
   or: git remote [-v | --verbose] update [-p | --prune] [(<group> | <remote>)...]
   or: git remote set-branches [--add] <name> <branch>...
   or: git remote set-url [--push] <name> <newurl> [<oldurl>]
   or: git remote set-url --add <name> <newurl>
   or: git remote set-url --delete <name> <url>

    -v, --verbose         be verbose; must be placed before a subcommand

Traceback (most recent call last):
  File "/lilac/data/peer/rsaha/misc./dojo.py", line 9, in <module>
    with Dojo(theorem) as (dojo, init_state):
  File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/interaction/dojo.py", line 283, in __enter__
    raise ex
  File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/interaction/dojo.py", line 203, in __enter__
    traced_repo_path = get_traced_repo_path(self.repo)
  File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/trace.py", line 135, in get_traced_repo_path
    traced_repo = TracedRepo.from_traced_files(tmp_dir / repo.name)
  File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/traced_data.py", line 1384, in from_traced_files
    repo = LeanGitRepo.from_path(root_dir)
  File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/lean.py", line 407, in from_path
    url, commit = get_repo_info(path)
  File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/utils.py", line 172, in get_repo_info
    url_msg, _ = execute(f"git remote get-url origin", capture_output=True)
  File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/utils.py", line 116, in execute
    raise ex
  File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/utils.py", line 111, in execute
    res = subprocess.run(cmd, shell=True, capture_output=capture_output, check=True)
  File "/home/sahar1/.pyenv/versions/3.9.16/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'git remote get-url origin' returned non-zero exit status 129.

Not sure why this is happening, any thoughts are appreciated.

rah4927 commented 1 year ago

from here, it seems that get-url is not a thing for git 1.8.x, which is where the problem is. There probably should be another command that works for older git versions. I believe it is git config --get remote.origin.url but I have not tested it myself @yangky11

In the meantime I will try an updated version of git and let it run again. Will keep you updated.

yangky11 commented 1 year ago

Yes, that's why https://github.com/lean-dojo/LeanDojo#requirements says Git >= 2.25. I'm not aware of any workaround for outdated Git, you're welcome to submit a PR.

rah4927 commented 1 year ago

@yangky11 Got it, having run it again with updated git, I got the following error:

2023-08-22 21:28:28.444 | INFO     | __main__:main:138 - Tracing lean-dojo-mew
 95%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏       | 1893/1993 [1:06:50<02:40,  1.61s/it]2023-08-22 22:35:32.536 | DEBUG    | lean_dojo.data_extraction.traced_data:from_traced_files:1388 - Parsing 1894 *.ast.json files in /tmp/tmp085we507/lean-dojo-mew with 7 workers
2023-08-22 22:35:44,220 INFO worker.py:1612 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1894/1894 [04:37<00:00,  6.83it/s]
Traceback (most recent call last):
  File "/lilac/data/peer/rsaha/misc./dojo.py", line 9, in <module>
    with Dojo(theorem) as (dojo, init_state):
  File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/interaction/dojo.py", line 283, in __enter__
    raise ex
  File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/interaction/dojo.py", line 203, in __enter__
    traced_repo_path = get_traced_repo_path(self.repo)
  File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/trace.py", line 135, in get_traced_repo_path
    traced_repo = TracedRepo.from_traced_files(tmp_dir / repo.name)
  File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/traced_data.py", line 1409, in from_traced_files
    traced_files_graph = _build_dependency_graph(traced_files)
  File "/home/sahar1/.pyenv/versions/lean-dojo-benchmark/lib/python3.9/site-packages/lean_dojo/data_extraction/traced_data.py", line 1243, in _build_dependency_graph
    assert G.has_node(dep_path_str)
AssertionError

Presumably it is constructing some kind of dependency graph of the imports, but I am not sure what precisely is getting the assertion to fail.

yangky11 commented 1 year ago

I wasn't able to reproduce this problem. I tried running

import os
os.environ['CONTAINER'] = 'native'
os.environ['VERBOSE'] = '1'

from lean_dojo import *
os.environ['VERBOSE']
repo = LeanGitRepo("https://github.com/rah4927/lean-dojo-mew/", "12fec4d9f39ac9a41c6c1f058efab3ec41c028af")
theorem = Theorem(repo, "MiniF2F/Example.lean", "amc12a_2019_p21")

with Dojo(theorem) as (dojo, init_state):
    print(init_state)

, and got the following output:

2023-08-22 20:05:41.642 | WARNING  | lean_dojo.interaction.dojo:__init__:172 - Using Lean 4 without a hard timeout may hang indefinitely.
2023-08-22 20:05:41.643 | DEBUG    | lean_dojo.interaction.dojo:__enter__:192 - Initializing Dojo for Theorem(repo=LeanGitRepo(url='https://github.com/rah4927/lean-dojo-mew', commit='12fec4d9f39ac9a41c6c1f058efab3ec41c028af'), file_path=PosixPath('MiniF2F/Example.lean'), full_name='amc12a_2019_p21')
2023-08-22 20:05:41.826 | INFO     | lean_dojo.data_extraction.trace:get_traced_repo_path:131 - Tracing LeanGitRepo(url='https://github.com/rah4927/lean-dojo-mew', commit='12fec4d9f39ac9a41c6c1f058efab3ec41c028af')
2023-08-22 20:05:41.827 | DEBUG    | lean_dojo.data_extraction.trace:get_traced_repo_path:133 - Working in the temporary directory /raid/kaiyu/tmp/tmpiea_68xo
2023-08-22 20:05:42.341 | DEBUG    | lean_dojo.data_extraction.lean:clone_and_checkout:438 - Cloning LeanGitRepo(url='https://github.com/rah4927/lean-dojo-mew', commit='12fec4d9f39ac9a41c6c1f058efab3ec41c028af')
2023-08-22 20:05:42.734 | DEBUG    | lean_dojo.data_extraction.trace:_trace_lean4:97 - Tracing LeanGitRepo(url='https://github.com/rah4927/lean-dojo-mew', commit='12fec4d9f39ac9a41c6c1f058efab3ec41c028af')
2023-08-22 20:05:42.741 | DEBUG    | lean_dojo.container:run:183 - NUM_PROCS=32 python3 build_lean4_repo.py lean-dojo-mew
2023-08-22 20:05:43.011 | INFO     | __main__:main:105 - Building lean-dojo-mew
info: downloading component 'lean'
info: installing component 'lean'
info: cloning https://github.com/leanprover-community/mathlib4.git to ./lake-packages/mathlib
info: cloning https://github.com/leanprover/std4 to ./lake-packages/std
info: cloning https://github.com/gebner/quote4 to ./lake-packages/Qq
info: cloning https://github.com/JLimperg/aesop to ./lake-packages/aesop
info: cloning https://github.com/mhuisi/lean4-cli.git to ./lake-packages/Cli
info: cloning https://github.com/EdAyers/ProofWidgets4 to ./lake-packages/proofwidgets
Downloading proofwidgets/v0.0.13/linux-64.tar.gz
[0/21] Building Std.Lean.TagAttribute
[0/22] Building Std.Lean.Command
[0/23] Building Std.Tactic.Unreachable
[0/28] Building Std.Tactic.OpenPrivate
[0/31] Building Std.Tactic.ByCases
[0/32] Building Std.Tactic.SeqFocus
[0/32] Building Std.Lean.Position
[0/32] Building Std.Lean.Name
...
...
...
[1259/1263] Building MiniF2F.Minif2fImport
[1260/1263] Building MiniF2F.Example
stdout:
./././MiniF2F/Example.lean:11:8: warning: declaration uses 'sorry'
[1261/1263] Building MiniF2F.All
[1262/1263] Building MiniF2F
2023-08-22 20:21:23.224 | INFO     | __main__:main:138 - Tracing lean-dojo-mew
 95%|█████████▍| 1893/1993 [19:46<01:53,  1.13s/it]2023-08-22 20:42:00.791 | DEBUG    | lean_dojo.data_extraction.traced_data:from_traced_files:1388 - Parsing 1895 *.ast.json files in /raid/kaiyu/tmp/tmpiea_68xo/lean-dojo-mew with 31 workers
2023-08-22 20:42:02,776 INFO worker.py:1627 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
100%|██████████| 1895/1895 [03:47<00:00,  8.35it/s] 
2023-08-22 20:46:10.132 | DEBUG    | lean_dojo.data_extraction.traced_data:save_to_disk:1435 - Saving 1895 traced XML files to /raid/kaiyu/tmp/tmpiea_68xo/lean-dojo-mew with 31 workers
2023-08-22 20:46:13,480 INFO worker.py:1627 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
100%|██████████| 1895/1895 [01:03<00:00, 29.99it/s] 
2023-08-22 20:47:32.792 | DEBUG    | lean_dojo.interaction.dojo:_modify_file:398 - Modifying MiniF2F/Example.lean
2023-08-22 20:49:33.167 | DEBUG    | lean_dojo.interaction.dojo:__enter__:225 - Launching the proof using <class 'lean_dojo.container.NativeContainer'>
2023-08-22 20:49:35.995 | DEBUG    | lean_dojo.container:run:183 - lake build Lean4Repl
2023-08-22 20:49:45.404 | WARNING  | lean_dojo.container:run_interactive:209 - Disregarding `cpu_limit = 1 since NativeContainer does not support CPU limit.`
2023-08-22 20:49:45.404 | WARNING  | lean_dojo.container:run_interactive:213 - Disregarding `memory_limit = 16g` since NativeContainer does not support memory limit.
2023-08-22 20:49:48.956 | DEBUG    | lean_dojo.container:run_interactive:221 - lake env lean MiniF2F/Example.lean
2023-08-22 20:49:52.730 | DEBUG    | lean_dojo.interaction.dojo:_read_next_line:587 - REPL> {"tacticState": "z : ℂ\nh₀ : z = (1 + Complex.I) / ↑(Real.sqrt 2)\n⊢ (∑ k in Finset.Icc 1 12, z ^ ↑k ^ 2) * ∑ k in Finset.Icc 1 12, 1 / z ^ ↑k ^ 2 = 36", "sid": 0, "error": null}
2023-08-22 20:49:52.740 | DEBUG    | lean_dojo.interaction.dojo:_submit_request:547 - Request: exit
2023-08-22 20:49:52.815 | DEBUG    | lean_dojo.interaction.dojo:_read_next_line:587 - 
2023-08-22 20:49:52.815 | DEBUG    | lean_dojo.interaction.dojo:_cleanup:331 - Cleaning up.
2023-08-22 20:49:52.815 | DEBUG    | lean_dojo.interaction.dojo:_cleanup_container:341 - Cleaning up the container.
2023-08-22 20:49:56.664 | DEBUG    | lean_dojo.interaction.dojo:_cleanup_tmp_dir:354 - Cleaning up the temporary directory.
TacticState(pp='z : ℂ\nh₀ : z = (1 + Complex.I) / ↑(Real.sqrt 2)\n⊢ (∑ k in Finset.Icc 1 12, z ^ ↑k ^ 2) * ∑ k in Finset.Icc 1 12, 1 / z ^ ↑k ^ 2 = 36', id=0, message=None)

Can you try cleaning up everything and re-installing LeanDojo from the main branch:

rm -rf ~/.cache/lean_dojo && pip install git+https://github.com/lean-dojo/LeanDojo --upgrade --ignore-installed

Then try running the original script to trace the repo?

rah4927 commented 1 year ago

Ran those commands and then ran the original script. Threw same error :/ @yangky11

yangky11 commented 1 year ago

It's very difficult for me to locate the error since I cannot reproduce it.

Do you see the traced_repo = TracedRepo.from_traced_files(tmp_dir / repo.name) and Working in the temporary directory XXX in the log? I might be able to help if you can:

  1. Re-install LeanDojo locally using editable installation: git clone https://github.com/lean-dojo/LeanDojo && cd LeanDojo && pip install -e .
  2. Add import pdb; pdb.set_trace()before traced_repo = TracedRepo.from_traced_files(tmp_dir / repo.name) in src/lean_dojo/data_extraction/trace.py.
  3. Run the script with VERBOSE=1. When the breakpoint is triggered, zip the temporary directory (in the debug log) and send it to me (kaiyuy@caltech.edu). Note that the temporary directory will be deleted after the program exists (That's why we need to set the breakpoint).
rah4927 commented 1 year ago

@yangky11 Thanks! Just sent you a link to the zipped temp directory

yangky11 commented 1 year ago

I'll look into the problem when I get a chance. As a temporary workaround, you can use the repo traced on my machine. Just unzip this file into ~/.cache/lean_dojo/ (override what's already there, if any). Then everything should work as if the repo were traced on your machine.

BTW, why are you setting CONTAINER=native? Is Docker not an option for you? As mentioned in the docs, running with Docker is less likely to run into problems.

rah4927 commented 1 year ago

Hi @yangky11 thank you for the workaround, though this might not be the best since this is only a minimal example, and the original repo has hundreds of theorems (though the structure of the project is largely the same). Let me know once you get some time to take a look at the folder I sent.

As for docker, I am currently running it on a remote machine without root access, and I believe docker is generally meant to be used with root access. If you offer an alternative (such as Singularity), that could work. Thanks!

yangky11 commented 1 year ago

I'm almost sure it was an OOM error when tracing Mathlib/Data/Real/Basic.lean. I'm adding more checks so that it would report the error rather than just fail silently. But that wouldn't solve the memory issue. How much memory do you have?

rah4927 commented 1 year ago

@yangky11 you were right! that was the issue, increasing the memory limit of the job solved this problem, and the tracing completed without error. Thanks a lot! This issue can be closed now.