lean-dojo / LeanDojo

Tool for data extraction and interacting with Lean programmatically.
https://leandojo.org
MIT License
536 stars 80 forks source link

Add Support for Local and Remote Repositories #179

Closed RexWzh closed 1 month ago

RexWzh commented 2 months ago

Hi,

Thanks for the awesome project!

This PR enhances the LeanGitRepo class to support local and remote repositories, introducing a repo_type property (github, local, remote), simplifying the_to_commit_hash method, transitioning lean_version to a string for clarity, and adding tests for the new repository configurations.

A Quick Review

We transition from pygithub objects to gitpython for managing repositories.

For compatibility, a new propertyrepo_type is introduced in the LeanGitRepo class, supporting three values: github, local, and remote. I've adapted the APIs to accommodate these types:

Type LeanGitRepo.repo LeanGitRepo.url
github GitHub object GitHub URL
local Git object (local) Local repository path
remote Git object (local) Remote repository URL

It's important to note that while the GitHub object connects to an online repo, the Git object interacts directly with a local git directory.

Please let me know if any further adjustments are needed.

RexWzh commented 2 months ago

I’m planning to update the tracing method based on these settings next, just to be sure it aligns with the developers' preferences, and also, to avoid complicating the review process.

Looking forward to your feedback.

relate issue: https://github.com/lean-dojo/LeanDojo/issues/153

Iheadx commented 2 months ago

When I use the Local Repo in this code, It doesn't work as I expected. It seems that the dojo.py still need to be modified.

repo = LeanGitRepo("lean4-example", "7b6ecb9ad4829e4e73600a3329baeb3b5df8d23f")
theorem = Theorem(repo, "Lean4Example.lean", "hello_world")

with Dojo(theorem) as (dojo, init_state):
  print(init_state)

When using Dojo(theorem), an unknown URL type error will be reported.

RexWzh commented 2 months ago

It seems that the dojo.py still need to be modified

These are not implemented yet.

I’m planning to update the tracing method based on these settings next.

I have only modified the LeanGitRepo class to ensure backward compatibility when initializing the Lean repository. No changes have been made to other components.

RexWzh commented 2 months ago

The basic idea is to replace the GitHub object with a git object. Additionally, we need to change the tracing process, which currently retrieves information via the GitHub API, to a direct way. There are intricate details to adjust, and it may take some time to finalize everything. For now, I've marked it as a draft PR.

RexWzh commented 2 months ago

I am updating the tracing and interaction code by python debugging and accompanying tests.

For the sake of a smoother review process, it may be prudent to handle the integration in smaller, manageable segments. If it isn't too cumbersome, we can also review everything after all updates are made. I will periodically write tests to ensure compatibility and prevent the introduction of bugs.

RexWzh commented 2 months ago

Hi @Iheadx @darabos , please take a look at this. ;)

An example of the test:

repo = LeanGitRepo(url=LOCAL_REPO_PATH, commit="main")
assert repo.repo_type == 'local'
theorem = Theorem(repo, "Lean4Example.lean", "hello_world")
# initial state
dojo, state_0 = Dojo(theorem).__enter__()
assert state_0.pp == 'a b c : Nat\n⊢ a + b + c = a + c + b'
# state after applying a tactic
state_1 = dojo.run_tac(state_0, "rw [add_assoc]")
assert state_1.pp == 'a b c : Nat\n⊢ a + (b + c) = a + c + b'
# state after applying the 'sorry' tactic
assert dojo.run_tac(state_1, "sorry") == ProofGivenUp()
# concluding the proof
final_state = dojo.run_tac(state_1, "rw [add_comm b, ←add_assoc]")
assert isinstance(final_state, ProofFinished)

All test cases have passed on my machine, ubuntu22.

image

Network error for one case:

image
yangky11 commented 1 month ago

Thank you. This is really helpful! I'll be able to take a look later this week or in the weekend!

LuoKaiGSW commented 1 month ago

hi, @RexWzh Thanks for your contribution! repo = LeanGitRepo(url=LOCAL_REPO_PATH, commit="main") assert repo.repo_type == 'local' This means that I can pull the lean4-example repository to my local machine and modify the Lean files as needed without needing to push the changes back to the remote repository, right? The way to do this is by specifying LOCAL_REPO_PATH as the path to the local repository.

RexWzh commented 1 month ago

Yes @LuoKaiGSW , you can use a local Git repository by passing the path in the url parameter or using LeanGitRepo.from_path.

See test examples here.

import os
# Avoid using remote cache
os.environ['DISABLE_REMOTE_CACHE'] = 'true'
lean4_example_url = 'https://github.com/yangky11/lean4-example'
local_repo_path = "lean4-example"
clean_clone_and_checkout(lean4_example_url, local_repo_path, 'main')

local_repo = LeanGitRepo(url=local_repo_path, commit="main")
from_path_repo = LeanGitRepo.from_path(local_repo_path)

In general, you don't need to connect to GitHub at all when working with repositories of type local or remote.

However, if the lean version cannot be obtained from the toolchain file, it'll use the get_lean4_commit_from_config method, which interacts with GitHub:

config = self.get_config("lean-toolchain")
toolchain = config["content"]
m = _LEAN4_VERSION_REGEX.fullmatch(toolchain.strip())
if m is not None:
    lean_version = m["version"]
else:
    lean_version = get_lean4_version_from_config(toolchain)

A note on repository types:

The git.Repo object requires a path to the actual Git repository, whereas the github.Repository object only needs a GitHub URL. Thus, the url_to_repo function returns different objects based on the repo_type:


image

LuoKaiGSW commented 1 month ago

Yes @LuoKaiGSW , you can use a local Git repository by passing the path in the url parameter or using LeanGitRepo.from_path.

See test examples here.

import os
# Avoid using remote cache
os.environ['DISABLE_REMOTE_CACHE'] = 'true'
lean4_example_url = 'https://github.com/yangky11/lean4-example'
local_repo_path = "lean4-example"
clean_clone_and_checkout(lean4_example_url, local_repo_path, 'main')

local_repo = LeanGitRepo(url=local_repo_path, commit="main")
from_path_repo = LeanGitRepo.from_path(local_repo_path)

In general, you don't need to connect to GitHub at all when working with repositories of type local or remote.

However, if the lean version cannot be obtained from the toolchain file, it'll use the get_lean4_commit_from_config method, which interacts with GitHub:

config = self.get_config("lean-toolchain")
toolchain = config["content"]
m = _LEAN4_VERSION_REGEX.fullmatch(toolchain.strip())
if m is not None:
    lean_version = m["version"]
else:
    lean_version = get_lean4_version_from_config(toolchain)

A note on repository types:

The git.Repo object requires a path to the actual Git repository, whereas the github.Repository object only needs a GitHub URL. Thus, the url_to_repo function returns different objects based on the repo_type:

  • github: Returns a github.Repository object.
  • remote: Clones the Git repository to a temporary directory and returns a git.Repo object.
  • local: Copies the Git repository to a temporary directory and returns a git.Repo object.

image

hi, @RexWzh Thanks for your reply! I pulled the code from your repository and installed it using 'pip install .'. However, I encountered some errors during testing. Could you help me take a look at what the issue might be? code

from lean_dojo import *
repo = LeanGitRepo.from_path("/Users/luokai/Desktop/lk/project/lean_project/lean4-example")

error

2024-07-22 11:57:25.530 | INFO     | lean_dojo.data_extraction.trace:get_traced_repo_path:209 - Tracing LeanGitRepo(url='/Users/luokai/Desktop/lk/project/lean_project/lean4-example', commit='81b57776e7a7dd1270b62598750e9fddc2470127')
✔ [1/2] Built Lean4Example
Build completed successfully.
2024-07-22 11:57:31.562 | INFO     | lean_dojo.data_extraction.trace:get_traced_repo_path:209 - Tracing LeanGitRepo(url='/Users/luokai/Desktop/lk/project/lean_project/lean4-example', commit='81b57776e7a7dd1270b62598750e9fddc2470127')
✔ [1/2] Built Lean4Example
Build completed successfully.
Traceback (most recent call last):
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/luokai/Desktop/lk/project/lean_project/lean_py/main.py", line 16, in <module>
    traced_repo = trace(repo)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 250, in trace
    cached_path = get_traced_repo_path(repo, build_deps)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 212, in get_traced_repo_path
    _trace(repo, build_deps)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 157, in _trace
    with launch_progressbar(dirs_to_monitor):
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 65, in launch_progressbar
    p.start()
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.
        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:
            if __name__ == '__main__':
                freeze_support()
                ...
        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

system info MacBook-Pro.local 23.1.0 Darwin Kernel Version 23.1.0

RexWzh commented 1 month ago

I've tested on both mac and Ubuntu and couldn't reproduce the issue you mentioned.

Please ensure that your cloned repository is up-to-date.

Mac:

❯ sw_vers
ProductName:            macOS
ProductVersion:         14.5
BuildVersion:           23F79
❯ ls
LICENSE        docs           lean4-example  pyproject.toml src
README.md      images         mypy.ini       scripts        tests
❯ python
Python 3.10.14 (main, May  6 2024, 14:47:20) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from lean_dojo import LeanGitRepo
>>> LeanGitRepo.from_path('lean4-example')
LeanGitRepo(url='/Users/wangzhihong/workspace/LEAN/LeanDojo/lean4-example', commit='7b6ecb9ad4829e4e73600a3329baeb3b5df8d23f')
>>> LeanGitRepo('lean4-example', 'main')
LeanGitRepo(url='/Users/wangzhihong/workspace/LEAN/LeanDojo/lean4-example', commit='7b6ecb9ad4829e4e73600a3329baeb3b5df8d23f')
>>> 

Ubuntu:

❯ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:        22.04
Codename:       jammy
❯ python
Python 3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from lean_dojo import LeanGitRepo
>>> LeanGitRepo('lean4-example', 'main')
LeanGitRepo(url='/home/cuber/zhihong/LeanPorjects/lean4-example', commit='7b6ecb9ad4829e4e73600a3329baeb3b5df8d23f')
>>> LeanGitRepo.from_path('lean4-example')
LeanGitRepo(url='/home/cuber/zhihong/LeanPorjects/lean4-example', commit='7b6ecb9ad4829e4e73600a3329baeb3b5df8d23f')
LuoKaiGSW commented 1 month ago
workspace

hi, @RexWzh , I have fixed the previously mentioned issue, but encountered another problem. It seems to be caused by the absence of a .git file under the created temporary folders. How should I resolve this? code


import json
from pathlib import Path

from lean_dojo import * import os if name == "main": repo = LeanGitRepo(url="/Users/luokai/Desktop/lk/project/lean_project/lean4-example", commit="04eb31f5247303fa9c2509651cc73932228eb76c") assert repo.repo_type == 'local' traced_repo = trace(repo)

thm_datas = []
traced_tactics_data = []
for thm in traced_repo.get_traced_theorems():
    if not thm.repo.is_lean4:
        thm_datas.append(thm)
        traced_tactics = thm.get_traced_tactics()
        traced_tactics_data.append(traced_tactics)
The path of the above code is lean4-example/scripts/test.py, and the files under the lean4-example directory are as follows:
<img width="468" alt="image" src="https://github.com/user-attachments/assets/dedb0edc-c64c-438f-bbb1-6752ce415c9b">
**error**

2024-07-23 17:55:58.616 | INFO | lean_dojo.data_extraction.trace:get_traced_repo_path:209 - Tracing LeanGitRepo(url='/Users/luokai/Desktop/lk/project/lean_project/lean4-example', commit='04eb31f5247303fa9c2509651cc73932228eb76c') ✔ [1/2] Built Lean4Example Build completed successfully. 98%|█████████▊| 891/907 [10:20<00:10, 1.55it/s]✔ [1/2] Built Lean4Repl Build completed successfully. 2024-07-23 18:06:30.578 | INFO | lean_dojo.utils:execute:116 - 2024-07-23 18:06:30.579 | ERROR | lean_dojo.utils:execute:117 - 致命错误:不是 git 仓库(或者任何父目录):.git Traceback (most recent call last): File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/contextlib.py", line 153, in exit self.gen.throw(typ, value, traceback) File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 52, in working_directory yield path File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 213, in get_traced_repo_path traced_repo = TracedRepo.from_traced_files(tmp_dir / repo.name, build_deps) File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/traced_data.py", line 1094, in from_traced_files repo = LeanGitRepo.from_path(root_dir) File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/lean.py", line 495, in from_path url, commit = get_repo_info(path) File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 186, in get_repo_info commitmsg, = execute(f"git log -n 1", capture_output=True) File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 118, in execute raise ex File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 113, in execute res = subprocess.run(cmd, shell=True, capture_output=capture_output, check=True) File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/subprocess.py", line 526, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command 'git log -n 1' returned non-zero exit status 128. python-BaseException

RexWzh commented 1 month ago

It seems that the issue was introduced by the latest few commits. Since pytest takes too much time, the recent commits have not been tested.

In practice, reverting to the last pytest-tested commit works correctly.

git reset --hard 95714323227f79d7927b4081b1e8fb15ef864d14

I'll break this PR into smaller PRs to ensure stability at each stage and make code reviews easier.

image
yangky11 commented 1 month ago

I encounter the following error when testing this PR.

Starting from an empty cache, I ran:

from lean_dojo import *
repo = LeanGitRepo("https://github.com/leanprover-community/mathlib4", "29dcec074de168ac2bf835a77ef68bbe069194c5")
trace(repo)

It successfully traced the repo but saved the traced repo to ~/.cache/lean_dojo/tmp83kwc9x5-mathlib4-29dcec074de168ac2bf835a77ef68bbe069194c5 instead of ~/.cache/lean_dojo/leanprover-community-mathlib4-29dcec074de168ac2bf835a77ef68bbe069194c5.

Then, I exited Python, started a new session, and ran the above code again. Instead of re-using the traced repo in the cache, it started to trace the repo again.

yangky11 commented 1 month ago

I tried tracing https://github.com/yangky11/lean4-example and had a similar issue.

image
yangky11 commented 1 month ago

For comparison, this is using LeanDojo's current main branch:

image
RexWzh commented 1 month ago

Thanks for pointing that out. I noticed the issue, and it is fixed in the commit use tempdir for local type & resolve cache problem.

# Remove cache directory
import os
os.environ['NUM_PROCS'] = '128'
os.environ['DISABLE_REMOTE_CACHE'] = 'true'
from lean_dojo import LeanGitRepo, trace, Dojo, Theorem

repo = LeanGitRepo('/home/cuber/zhihong/LeanPorjects/leandojo/lean4-example', 'main')
trace_repo = trace(repo)

The cache file is located at:

❯ ls ~/.cache/lean_dojo
leandojo-lean4-example-7b6ecb9ad4829e4e73600a3329baeb3b5df8d23f

However, I did not run tests for this commit and later ones.

I believe it would be more effective to split these changes into smaller PRs.

188

LuoKaiGSW commented 1 month ago

I'll break this PR into smaller PRs to ensure stability at each stage and make code reviews easier.

I would like to know which branch can be used to load the local git repository. My current branch situation is as follows, and I am still encountering the same error, Thank you!

2024-07-24 07:06:03.269 | WARNING  | lean_dojo:<module>:33 - Running LeanDojo as the root user may cause unexpected issues. Proceed with caution.
2024-07-24 07:06:03.285 | INFO     | lean_dojo.data_extraction.trace:get_traced_repo_path:209 - Tracing LeanGitRepo(url='/home/lean4-example', commit='3d3a1dd2cdaf9251de7e0575a85c702e7086db60')
Build completed successfully.
 19%|████████████████████████████████████████████▌                                                                                                                                                                                23%|████████████████████████████████████████████████████▉                                                                                                                                                                        25%|█████████████████████████████████████████████████████████▊                                                                                                                                                                   27%|███████████████████████████████████████████████████████████████▍                                                                                                                                                            100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋| 906/907 [03:20<00:00,  2.32it/s]
✖ [1/2] Building Lean4Repl
trace: .> LEAN_PATH=././.lake/build/lib LD_LIBRARY_PATH=././.lake/build/lib /root/.elan/toolchains/leanprover--lean4---v4.9.0-rc3/bin/lean ././././Lean4Repl.lean -R ./././. -o ././.lake/build/lib/Lean4Repl.olean -i ././.lake/build/lib/Lean4Repl.ilean -c ././.lake/build/ir/Lean4Repl.c --json
info: stderr:
resource exhausted (error code: 28, no space left on device)
error: Lean exited with code 1
Some builds logged failures:
- Lean4Repl
error: build failed
2024-07-24 07:09:41.940 | WARNING  | lean_dojo.data_extraction.trace:_trace:185 - Failed to build Lean4Repl. You may run into issues when interacting with the repo.
2024-07-24 07:09:41.946 | INFO     | lean_dojo.utils:execute:116 - 
2024-07-24 07:09:41.946 | ERROR    | lean_dojo.utils:execute:117 - fatal: not a git repository (or any of the parent directories): .git

Traceback (most recent call last):
  File "/home/lean4-example/scripts/test.py", line 19, in <module>
    traced_repo = trace(repo)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 248, in trace
    cached_path = get_traced_repo_path(repo, build_deps)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 213, in get_traced_repo_path
    traced_repo = TracedRepo.from_traced_files(tmp_dir / repo.name, build_deps)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/traced_data.py", line 1094, in from_traced_files
    repo = LeanGitRepo.from_path(root_dir)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/lean.py", line 494, in from_path
    url, commit = get_repo_info(path)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 186, in get_repo_info
    commit_msg, _ = execute(f"git log -n 1", capture_output=True)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 118, in execute
    raise ex
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 113, in execute
    res = subprocess.run(cmd, shell=True, capture_output=capture_output, check=True)
  File "/root/miniconda3/envs/lean/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'git log -n 1' returned non-zero exit status 128.

image

LuoKaiGSW commented 1 month ago

I'll break this PR into smaller PRs to ensure stability at each stage and make code reviews easier.

I would like to know which branch can be used to load the local git repository. My current branch situation is as follows, and I am still encountering the same error, Thank you!

2024-07-24 07:06:03.269 | WARNING  | lean_dojo:<module>:33 - Running LeanDojo as the root user may cause unexpected issues. Proceed with caution.
2024-07-24 07:06:03.285 | INFO     | lean_dojo.data_extraction.trace:get_traced_repo_path:209 - Tracing LeanGitRepo(url='/home/lean4-example', commit='3d3a1dd2cdaf9251de7e0575a85c702e7086db60')
Build completed successfully.
 19%|████████████████████████████████████████████▌                                                                                                                                                                                23%|████████████████████████████████████████████████████▉                                                                                                                                                                        25%|█████████████████████████████████████████████████████████▊                                                                                                                                                                   27%|███████████████████████████████████████████████████████████████▍                                                                                                                                                            100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋| 906/907 [03:20<00:00,  2.32it/s]
✖ [1/2] Building Lean4Repl
trace: .> LEAN_PATH=././.lake/build/lib LD_LIBRARY_PATH=././.lake/build/lib /root/.elan/toolchains/leanprover--lean4---v4.9.0-rc3/bin/lean ././././Lean4Repl.lean -R ./././. -o ././.lake/build/lib/Lean4Repl.olean -i ././.lake/build/lib/Lean4Repl.ilean -c ././.lake/build/ir/Lean4Repl.c --json
info: stderr:
resource exhausted (error code: 28, no space left on device)
error: Lean exited with code 1
Some builds logged failures:
- Lean4Repl
error: build failed
2024-07-24 07:09:41.940 | WARNING  | lean_dojo.data_extraction.trace:_trace:185 - Failed to build Lean4Repl. You may run into issues when interacting with the repo.
2024-07-24 07:09:41.946 | INFO     | lean_dojo.utils:execute:116 - 
2024-07-24 07:09:41.946 | ERROR    | lean_dojo.utils:execute:117 - fatal: not a git repository (or any of the parent directories): .git

Traceback (most recent call last):
  File "/home/lean4-example/scripts/test.py", line 19, in <module>
    traced_repo = trace(repo)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 248, in trace
    cached_path = get_traced_repo_path(repo, build_deps)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 213, in get_traced_repo_path
    traced_repo = TracedRepo.from_traced_files(tmp_dir / repo.name, build_deps)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/traced_data.py", line 1094, in from_traced_files
    repo = LeanGitRepo.from_path(root_dir)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/lean.py", line 494, in from_path
    url, commit = get_repo_info(path)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 186, in get_repo_info
    commit_msg, _ = execute(f"git log -n 1", capture_output=True)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 118, in execute
    raise ex
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 113, in execute
    res = subprocess.run(cmd, shell=True, capture_output=capture_output, check=True)
  File "/root/miniconda3/envs/lean/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'git log -n 1' returned non-zero exit status 128.

image

Thanks a lot @RexWzh. It works and I can trace the local repo! I've decided to temporarily use this version of the code. git reset --hard 95714323227f79d7927b4081b1e8fb15ef864d14

RexWzh commented 1 month ago

@LuoKaiGSW Thanks for assisting with testing the commit.

Please note that this branch is currently under development and might be unstable until all the code has been thoroughly reviewed.

I'll fix that in later commits.

yangky11 commented 1 month ago

Hi @RexWzh, is this PR ready for further review and testing?

RexWzh commented 1 month ago

Yes, tests are underway and most of the repositories pass successfully, except for mathlib4, which fails at times due to network issues.

yangky11 commented 1 month ago

Got this when running one of the tests:

============================= test session starts ==============================
platform linux -- Python 3.11.9, pytest-8.3.2, pluggy-1.5.0
rootdir: /private/home/kaiyuy/LeanDojo
configfile: pyproject.toml
plugins: anyio-4.4.0
collected 48 items

tests/data_extraction/test_cache.py F

=================================== FAILURES ===================================
_______________________________ test_repo_cache ________________________________

lean4_example_url = 'https://github.com/yangky11/lean4-example'
remote_example_url = 'https://gitee.com/rexzong/lean4-example'
example_commit_hash = '3f8c5eb303a225cdef609498b8d87262e5ef344b'

    def test_repo_cache(lean4_example_url, remote_example_url, example_commit_hash):
        # Note: The `git.Repo` requires the local repo to be cloned in a directory
        # all cached repos are stored in CACHE_DIR/repos
        prefix = "repos"
        repo_name = "lean4-example"

        # test local repo cache
        with working_directory() as tmp_dir:
            repo = Repo.clone_from(lean4_example_url, repo_name)
            repo.git.checkout(example_commit_hash)
            local_dir = tmp_dir / repo_name
            rel_cache_dir = (
                prefix / Path(f"gitpython-{repo_name}-{example_commit_hash}") / repo_name
            )
            cache.store(local_dir, rel_cache_dir)
        # get the cache
        repo_cache_dir = cache.get(rel_cache_dir)
        assert repo_cache_dir is not None

        # test remote repo cache
        with working_directory() as tmp_dir:
>           repo = Repo.clone_from(remote_example_url, repo_name)

tests/data_extraction/test_cache.py:29: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../miniconda3/envs/lean-dojo/lib/python3.11/site-packages/git/repo/base.py:1525: in clone_from
    return cls._clone(
../miniconda3/envs/lean-dojo/lib/python3.11/site-packages/git/repo/base.py:1396: in _clone
    finalize_process(proc, stderr=stderr)
../miniconda3/envs/lean-dojo/lib/python3.11/site-packages/git/util.py:504: in finalize_process
    proc.wait(**kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <git.cmd.Git.AutoInterrupt object at 0x7fa7e24b3fc0>
stderr = "Cloning into 'lean4-example'...\nfatal: unable to access 'https://gitee.com/rexzong/lean4-example/': The requested URL returned error: 502\n"

    def wait(self, stderr: Union[None, str, bytes] = b"") -> int:
        """Wait for the process and return its status code.

        :param stderr:
            Previously read value of stderr, in case stderr is already closed.

        :warn:
            May deadlock if output or error pipes are used and not handled
            separately.

        :raise git.exc.GitCommandError:
            If the return status is not 0.
        """
        if stderr is None:
            stderr_b = b""
        stderr_b = force_bytes(data=stderr, encoding="utf-8")
        status: Union[int, None]
        if self.proc is not None:
            status = self.proc.wait()
            p_stderr = self.proc.stderr
        else:  # Assume the underlying proc was killed earlier or never existed.
            status = self.status
            p_stderr = None

        def read_all_from_possibly_closed_stream(stream: Union[IO[bytes], None]) -> bytes:
            if stream:
                try:
                    return stderr_b + force_bytes(stream.read())
                except (OSError, ValueError):
                    return stderr_b or b""
            else:
                return stderr_b or b""

        # END status handling

        if status != 0:
            errstr = read_all_from_possibly_closed_stream(p_stderr)
            _logger.debug("AutoInterrupt wait stderr: %r" % (errstr,))
>           raise GitCommandError(remove_password_if_present(self.args), status, errstr)
E           git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
E             cmdline: git clone -v -- https://gitee.com/rexzong/lean4-example lean4-example
E             stderr: 'Cloning into 'lean4-example'...
E           fatal: unable to access 'https://gitee.com/rexzong/lean4-example/': The requested URL returned error: 502
E           '

../miniconda3/envs/lean-dojo/lib/python3.11/site-packages/git/cmd.py:834: GitCommandError
=============================== warnings summary ===============================
../miniconda3/envs/lean-dojo/lib/python3.11/site-packages/jupyter_client/connect.py:22
  /private/home/kaiyuy/miniconda3/envs/lean-dojo/lib/python3.11/site-packages/jupyter_client/connect.py:22: DeprecationWarning: Jupyter is migrating its paths to use standard platformdirs
  given by the platformdirs library.  To remove this warning and
  see the appropriate new directories, set the environment variable
  `JUPYTER_PLATFORM_DIRS=1` and then run `jupyter --paths`.
  The use of platformdirs will be the default in `jupyter_core` v6
    from jupyter_core.paths import jupyter_data_dir, jupyter_runtime_dir, secure_write

src/lean_dojo/data_extraction/trace.py:97
  /private/home/kaiyuy/LeanDojo/src/lean_dojo/data_extraction/trace.py:97: DeprecationWarning: invalid escape sequence '\*'
    """Check if all \*.lean files have been processed to produce \*.ast.json and \*.dep_paths files."""

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/data_extraction/test_cache.py::test_repo_cache - git.exc.GitComm...
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
======================== 1 failed, 2 warnings in 20.04s ========================
yangky11 commented 1 month ago

I cannot reproduce the above error reliably (probably a network issue). Running other tests now...

yangky11 commented 1 month ago

All other tests were successful. I merged it into dev and will merge into main after some minor cleaning. Thank you so much for this valuable contribution.

yangky11 commented 1 month ago

Merged into main and released in v2.1.0. Thanks everyone!