Closed RexWzh closed 1 month ago
I’m planning to update the tracing method based on these settings next, just to be sure it aligns with the developers' preferences, and also, to avoid complicating the review process.
Looking forward to your feedback.
relate issue: https://github.com/lean-dojo/LeanDojo/issues/153
When I use the Local Repo in this code, It doesn't work as I expected. It seems that the dojo.py
still need to be modified.
repo = LeanGitRepo("lean4-example", "7b6ecb9ad4829e4e73600a3329baeb3b5df8d23f")
theorem = Theorem(repo, "Lean4Example.lean", "hello_world")
with Dojo(theorem) as (dojo, init_state):
print(init_state)
When using Dojo(theorem)
, an unknown URL type
error will be reported.
It seems that the
dojo.py
still need to be modified
These are not implemented yet.
I’m planning to update the tracing method based on these settings next.
I have only modified the LeanGitRepo
class to ensure backward compatibility when initializing the Lean repository. No changes have been made to other components.
The basic idea is to replace the GitHub object with a git object. Additionally, we need to change the tracing process, which currently retrieves information via the GitHub API, to a direct way. There are intricate details to adjust, and it may take some time to finalize everything. For now, I've marked it as a draft PR.
I am updating the tracing and interaction code by python debugging and accompanying tests.
For the sake of a smoother review process, it may be prudent to handle the integration in smaller, manageable segments. If it isn't too cumbersome, we can also review everything after all updates are made. I will periodically write tests to ensure compatibility and prevent the introduction of bugs.
Hi @Iheadx @darabos , please take a look at this. ;)
An example of the test:
repo = LeanGitRepo(url=LOCAL_REPO_PATH, commit="main")
assert repo.repo_type == 'local'
theorem = Theorem(repo, "Lean4Example.lean", "hello_world")
# initial state
dojo, state_0 = Dojo(theorem).__enter__()
assert state_0.pp == 'a b c : Nat\n⊢ a + b + c = a + c + b'
# state after applying a tactic
state_1 = dojo.run_tac(state_0, "rw [add_assoc]")
assert state_1.pp == 'a b c : Nat\n⊢ a + (b + c) = a + c + b'
# state after applying the 'sorry' tactic
assert dojo.run_tac(state_1, "sorry") == ProofGivenUp()
# concluding the proof
final_state = dojo.run_tac(state_1, "rw [add_comm b, ←add_assoc]")
assert isinstance(final_state, ProofFinished)
All test cases have passed on my machine, ubuntu22.
Network error for one case:
Thank you. This is really helpful! I'll be able to take a look later this week or in the weekend!
hi, @RexWzh Thanks for your contribution!
repo = LeanGitRepo(url=LOCAL_REPO_PATH, commit="main") assert repo.repo_type == 'local'
This means that I can pull the lean4-example repository to my local machine and modify the Lean files as needed without needing to push the changes back to the remote repository, right? The way to do this is by specifying LOCAL_REPO_PATH as the path to the local repository.
Yes @LuoKaiGSW , you can use a local Git repository by passing the path in the url
parameter or using LeanGitRepo.from_path
.
See test examples here.
import os
# Avoid using remote cache
os.environ['DISABLE_REMOTE_CACHE'] = 'true'
lean4_example_url = 'https://github.com/yangky11/lean4-example'
local_repo_path = "lean4-example"
clean_clone_and_checkout(lean4_example_url, local_repo_path, 'main')
local_repo = LeanGitRepo(url=local_repo_path, commit="main")
from_path_repo = LeanGitRepo.from_path(local_repo_path)
In general, you don't need to connect to GitHub at all when working with repositories of type local
or remote
.
However, if the lean version cannot be obtained from the toolchain
file, it'll use the get_lean4_commit_from_config
method, which interacts with GitHub:
config = self.get_config("lean-toolchain")
toolchain = config["content"]
m = _LEAN4_VERSION_REGEX.fullmatch(toolchain.strip())
if m is not None:
lean_version = m["version"]
else:
lean_version = get_lean4_version_from_config(toolchain)
A note on repository types:
The git.Repo
object requires a path to the actual Git repository, whereas the github.Repository
object only needs a GitHub URL. Thus, the url_to_repo
function returns different objects based on the repo_type
:
github.Repository
object.git.Repo
object.git.Repo
object.Yes @LuoKaiGSW , you can use a local Git repository by passing the path in the
url
parameter or usingLeanGitRepo.from_path
.See test examples here.
import os # Avoid using remote cache os.environ['DISABLE_REMOTE_CACHE'] = 'true' lean4_example_url = 'https://github.com/yangky11/lean4-example' local_repo_path = "lean4-example" clean_clone_and_checkout(lean4_example_url, local_repo_path, 'main') local_repo = LeanGitRepo(url=local_repo_path, commit="main") from_path_repo = LeanGitRepo.from_path(local_repo_path)
In general, you don't need to connect to GitHub at all when working with repositories of type
local
orremote
.However, if the lean version cannot be obtained from the
toolchain
file, it'll use theget_lean4_commit_from_config
method, which interacts with GitHub:config = self.get_config("lean-toolchain") toolchain = config["content"] m = _LEAN4_VERSION_REGEX.fullmatch(toolchain.strip()) if m is not None: lean_version = m["version"] else: lean_version = get_lean4_version_from_config(toolchain)
A note on repository types:
The
git.Repo
object requires a path to the actual Git repository, whereas thegithub.Repository
object only needs a GitHub URL. Thus, theurl_to_repo
function returns different objects based on therepo_type
:
- github: Returns a
github.Repository
object.- remote: Clones the Git repository to a temporary directory and returns a
git.Repo
object.- local: Copies the Git repository to a temporary directory and returns a
git.Repo
object.
hi, @RexWzh Thanks for your reply! I pulled the code from your repository and installed it using 'pip install .'. However, I encountered some errors during testing. Could you help me take a look at what the issue might be? code
from lean_dojo import *
repo = LeanGitRepo.from_path("/Users/luokai/Desktop/lk/project/lean_project/lean4-example")
error
2024-07-22 11:57:25.530 | INFO | lean_dojo.data_extraction.trace:get_traced_repo_path:209 - Tracing LeanGitRepo(url='/Users/luokai/Desktop/lk/project/lean_project/lean4-example', commit='81b57776e7a7dd1270b62598750e9fddc2470127')
✔ [1/2] Built Lean4Example
Build completed successfully.
2024-07-22 11:57:31.562 | INFO | lean_dojo.data_extraction.trace:get_traced_repo_path:209 - Tracing LeanGitRepo(url='/Users/luokai/Desktop/lk/project/lean_project/lean4-example', commit='81b57776e7a7dd1270b62598750e9fddc2470127')
✔ [1/2] Built Lean4Example
Build completed successfully.
Traceback (most recent call last):
File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
prepare(preparation_data)
File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/runpy.py", line 289, in run_path
return _run_module_code(code, init_globals, run_name,
File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/runpy.py", line 96, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Users/luokai/Desktop/lk/project/lean_project/lean_py/main.py", line 16, in <module>
traced_repo = trace(repo)
File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 250, in trace
cached_path = get_traced_repo_path(repo, build_deps)
File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 212, in get_traced_repo_path
_trace(repo, build_deps)
File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 157, in _trace
with launch_progressbar(dirs_to_monitor):
File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/contextlib.py", line 135, in __enter__
return next(self.gen)
File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 65, in launch_progressbar
p.start()
File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
return Popen(process_obj)
File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
system info MacBook-Pro.local 23.1.0 Darwin Kernel Version 23.1.0
I've tested on both mac and Ubuntu and couldn't reproduce the issue you mentioned.
Please ensure that your cloned repository is up-to-date.
Mac:
❯ sw_vers
ProductName: macOS
ProductVersion: 14.5
BuildVersion: 23F79
❯ ls
LICENSE docs lean4-example pyproject.toml src
README.md images mypy.ini scripts tests
❯ python
Python 3.10.14 (main, May 6 2024, 14:47:20) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from lean_dojo import LeanGitRepo
>>> LeanGitRepo.from_path('lean4-example')
LeanGitRepo(url='/Users/wangzhihong/workspace/LEAN/LeanDojo/lean4-example', commit='7b6ecb9ad4829e4e73600a3329baeb3b5df8d23f')
>>> LeanGitRepo('lean4-example', 'main')
LeanGitRepo(url='/Users/wangzhihong/workspace/LEAN/LeanDojo/lean4-example', commit='7b6ecb9ad4829e4e73600a3329baeb3b5df8d23f')
>>>
Ubuntu:
❯ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy
❯ python
Python 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from lean_dojo import LeanGitRepo
>>> LeanGitRepo('lean4-example', 'main')
LeanGitRepo(url='/home/cuber/zhihong/LeanPorjects/lean4-example', commit='7b6ecb9ad4829e4e73600a3329baeb3b5df8d23f')
>>> LeanGitRepo.from_path('lean4-example')
LeanGitRepo(url='/home/cuber/zhihong/LeanPorjects/lean4-example', commit='7b6ecb9ad4829e4e73600a3329baeb3b5df8d23f')
workspace
hi, @RexWzh , I have fixed the previously mentioned issue, but encountered another problem. It seems to be caused by the absence of a .git file under the created temporary folders. How should I resolve this? code
import json from pathlib import Path
from lean_dojo import * import os if name == "main": repo = LeanGitRepo(url="/Users/luokai/Desktop/lk/project/lean_project/lean4-example", commit="04eb31f5247303fa9c2509651cc73932228eb76c") assert repo.repo_type == 'local' traced_repo = trace(repo)
thm_datas = []
traced_tactics_data = []
for thm in traced_repo.get_traced_theorems():
if not thm.repo.is_lean4:
thm_datas.append(thm)
traced_tactics = thm.get_traced_tactics()
traced_tactics_data.append(traced_tactics)
The path of the above code is lean4-example/scripts/test.py, and the files under the lean4-example directory are as follows:
<img width="468" alt="image" src="https://github.com/user-attachments/assets/dedb0edc-c64c-438f-bbb1-6752ce415c9b">
**error**
2024-07-23 17:55:58.616 | INFO | lean_dojo.data_extraction.trace:get_traced_repo_path:209 - Tracing LeanGitRepo(url='/Users/luokai/Desktop/lk/project/lean_project/lean4-example', commit='04eb31f5247303fa9c2509651cc73932228eb76c') ✔ [1/2] Built Lean4Example Build completed successfully. 98%|█████████▊| 891/907 [10:20<00:10, 1.55it/s]✔ [1/2] Built Lean4Repl Build completed successfully. 2024-07-23 18:06:30.578 | INFO | lean_dojo.utils:execute:116 - 2024-07-23 18:06:30.579 | ERROR | lean_dojo.utils:execute:117 - 致命错误:不是 git 仓库(或者任何父目录):.git Traceback (most recent call last): File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/contextlib.py", line 153, in exit self.gen.throw(typ, value, traceback) File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 52, in working_directory yield path File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 213, in get_traced_repo_path traced_repo = TracedRepo.from_traced_files(tmp_dir / repo.name, build_deps) File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/traced_data.py", line 1094, in from_traced_files repo = LeanGitRepo.from_path(root_dir) File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/lean.py", line 495, in from_path url, commit = get_repo_info(path) File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 186, in get_repo_info commitmsg, = execute(f"git log -n 1", capture_output=True) File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 118, in execute raise ex File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 113, in execute res = subprocess.run(cmd, shell=True, capture_output=capture_output, check=True) File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/subprocess.py", line 526, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command 'git log -n 1' returned non-zero exit status 128. python-BaseException
It seems that the issue was introduced by the latest few commits. Since pytest
takes too much time, the recent commits have not been tested.
In practice, reverting to the last pytest-tested commit works correctly.
git reset --hard 95714323227f79d7927b4081b1e8fb15ef864d14
I'll break this PR into smaller PRs to ensure stability at each stage and make code reviews easier.
I encounter the following error when testing this PR.
Starting from an empty cache, I ran:
from lean_dojo import *
repo = LeanGitRepo("https://github.com/leanprover-community/mathlib4", "29dcec074de168ac2bf835a77ef68bbe069194c5")
trace(repo)
It successfully traced the repo but saved the traced repo to ~/.cache/lean_dojo/tmp83kwc9x5-mathlib4-29dcec074de168ac2bf835a77ef68bbe069194c5
instead of ~/.cache/lean_dojo/leanprover-community-mathlib4-29dcec074de168ac2bf835a77ef68bbe069194c5
.
Then, I exited Python, started a new session, and ran the above code again. Instead of re-using the traced repo in the cache, it started to trace the repo again.
I tried tracing https://github.com/yangky11/lean4-example and had a similar issue.
For comparison, this is using LeanDojo's current main
branch:
Thanks for pointing that out. I noticed the issue, and it is fixed in the commit use tempdir for local type & resolve cache problem.
# Remove cache directory
import os
os.environ['NUM_PROCS'] = '128'
os.environ['DISABLE_REMOTE_CACHE'] = 'true'
from lean_dojo import LeanGitRepo, trace, Dojo, Theorem
repo = LeanGitRepo('/home/cuber/zhihong/LeanPorjects/leandojo/lean4-example', 'main')
trace_repo = trace(repo)
The cache file is located at:
❯ ls ~/.cache/lean_dojo
leandojo-lean4-example-7b6ecb9ad4829e4e73600a3329baeb3b5df8d23f
However, I did not run tests for this commit and later ones.
I believe it would be more effective to split these changes into smaller PRs.
I'll break this PR into smaller PRs to ensure stability at each stage and make code reviews easier.
I would like to know which branch can be used to load the local git repository. My current branch situation is as follows, and I am still encountering the same error, Thank you!
2024-07-24 07:06:03.269 | WARNING | lean_dojo:<module>:33 - Running LeanDojo as the root user may cause unexpected issues. Proceed with caution.
2024-07-24 07:06:03.285 | INFO | lean_dojo.data_extraction.trace:get_traced_repo_path:209 - Tracing LeanGitRepo(url='/home/lean4-example', commit='3d3a1dd2cdaf9251de7e0575a85c702e7086db60')
Build completed successfully.
19%|████████████████████████████████████████████▌ 23%|████████████████████████████████████████████████████▉ 25%|█████████████████████████████████████████████████████████▊ 27%|███████████████████████████████████████████████████████████████▍ 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋| 906/907 [03:20<00:00, 2.32it/s]
✖ [1/2] Building Lean4Repl
trace: .> LEAN_PATH=././.lake/build/lib LD_LIBRARY_PATH=././.lake/build/lib /root/.elan/toolchains/leanprover--lean4---v4.9.0-rc3/bin/lean ././././Lean4Repl.lean -R ./././. -o ././.lake/build/lib/Lean4Repl.olean -i ././.lake/build/lib/Lean4Repl.ilean -c ././.lake/build/ir/Lean4Repl.c --json
info: stderr:
resource exhausted (error code: 28, no space left on device)
error: Lean exited with code 1
Some builds logged failures:
- Lean4Repl
error: build failed
2024-07-24 07:09:41.940 | WARNING | lean_dojo.data_extraction.trace:_trace:185 - Failed to build Lean4Repl. You may run into issues when interacting with the repo.
2024-07-24 07:09:41.946 | INFO | lean_dojo.utils:execute:116 -
2024-07-24 07:09:41.946 | ERROR | lean_dojo.utils:execute:117 - fatal: not a git repository (or any of the parent directories): .git
Traceback (most recent call last):
File "/home/lean4-example/scripts/test.py", line 19, in <module>
traced_repo = trace(repo)
File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 248, in trace
cached_path = get_traced_repo_path(repo, build_deps)
File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 213, in get_traced_repo_path
traced_repo = TracedRepo.from_traced_files(tmp_dir / repo.name, build_deps)
File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/traced_data.py", line 1094, in from_traced_files
repo = LeanGitRepo.from_path(root_dir)
File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/lean.py", line 494, in from_path
url, commit = get_repo_info(path)
File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 186, in get_repo_info
commit_msg, _ = execute(f"git log -n 1", capture_output=True)
File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 118, in execute
raise ex
File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 113, in execute
res = subprocess.run(cmd, shell=True, capture_output=capture_output, check=True)
File "/root/miniconda3/envs/lean/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'git log -n 1' returned non-zero exit status 128.
I'll break this PR into smaller PRs to ensure stability at each stage and make code reviews easier.
I would like to know which branch can be used to load the local git repository. My current branch situation is as follows, and I am still encountering the same error, Thank you!
2024-07-24 07:06:03.269 | WARNING | lean_dojo:<module>:33 - Running LeanDojo as the root user may cause unexpected issues. Proceed with caution. 2024-07-24 07:06:03.285 | INFO | lean_dojo.data_extraction.trace:get_traced_repo_path:209 - Tracing LeanGitRepo(url='/home/lean4-example', commit='3d3a1dd2cdaf9251de7e0575a85c702e7086db60') Build completed successfully. 19%|████████████████████████████████████████████▌ 23%|████████████████████████████████████████████████████▉ 25%|█████████████████████████████████████████████████████████▊ 27%|███████████████████████████████████████████████████████████████▍ 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋| 906/907 [03:20<00:00, 2.32it/s] ✖ [1/2] Building Lean4Repl trace: .> LEAN_PATH=././.lake/build/lib LD_LIBRARY_PATH=././.lake/build/lib /root/.elan/toolchains/leanprover--lean4---v4.9.0-rc3/bin/lean ././././Lean4Repl.lean -R ./././. -o ././.lake/build/lib/Lean4Repl.olean -i ././.lake/build/lib/Lean4Repl.ilean -c ././.lake/build/ir/Lean4Repl.c --json info: stderr: resource exhausted (error code: 28, no space left on device) error: Lean exited with code 1 Some builds logged failures: - Lean4Repl error: build failed 2024-07-24 07:09:41.940 | WARNING | lean_dojo.data_extraction.trace:_trace:185 - Failed to build Lean4Repl. You may run into issues when interacting with the repo. 2024-07-24 07:09:41.946 | INFO | lean_dojo.utils:execute:116 - 2024-07-24 07:09:41.946 | ERROR | lean_dojo.utils:execute:117 - fatal: not a git repository (or any of the parent directories): .git Traceback (most recent call last): File "/home/lean4-example/scripts/test.py", line 19, in <module> traced_repo = trace(repo) File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 248, in trace cached_path = get_traced_repo_path(repo, build_deps) File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 213, in get_traced_repo_path traced_repo = TracedRepo.from_traced_files(tmp_dir / repo.name, build_deps) File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/traced_data.py", line 1094, in from_traced_files repo = LeanGitRepo.from_path(root_dir) File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/lean.py", line 494, in from_path url, commit = get_repo_info(path) File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 186, in get_repo_info commit_msg, _ = execute(f"git log -n 1", capture_output=True) File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 118, in execute raise ex File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 113, in execute res = subprocess.run(cmd, shell=True, capture_output=capture_output, check=True) File "/root/miniconda3/envs/lean/lib/python3.10/subprocess.py", line 526, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command 'git log -n 1' returned non-zero exit status 128.
Thanks a lot @RexWzh. It works and I can trace the local repo! I've decided to temporarily use this version of the code.
git reset --hard 95714323227f79d7927b4081b1e8fb15ef864d14
@LuoKaiGSW Thanks for assisting with testing the commit.
Please note that this branch is currently under development and might be unstable until all the code has been thoroughly reviewed.
I'll fix that in later commits.
Hi @RexWzh, is this PR ready for further review and testing?
Yes, tests are underway and most of the repositories pass successfully, except for mathlib4, which fails at times due to network issues.
Got this when running one of the tests:
============================= test session starts ==============================
platform linux -- Python 3.11.9, pytest-8.3.2, pluggy-1.5.0
rootdir: /private/home/kaiyuy/LeanDojo
configfile: pyproject.toml
plugins: anyio-4.4.0
collected 48 items
tests/data_extraction/test_cache.py F
=================================== FAILURES ===================================
_______________________________ test_repo_cache ________________________________
lean4_example_url = 'https://github.com/yangky11/lean4-example'
remote_example_url = 'https://gitee.com/rexzong/lean4-example'
example_commit_hash = '3f8c5eb303a225cdef609498b8d87262e5ef344b'
def test_repo_cache(lean4_example_url, remote_example_url, example_commit_hash):
# Note: The `git.Repo` requires the local repo to be cloned in a directory
# all cached repos are stored in CACHE_DIR/repos
prefix = "repos"
repo_name = "lean4-example"
# test local repo cache
with working_directory() as tmp_dir:
repo = Repo.clone_from(lean4_example_url, repo_name)
repo.git.checkout(example_commit_hash)
local_dir = tmp_dir / repo_name
rel_cache_dir = (
prefix / Path(f"gitpython-{repo_name}-{example_commit_hash}") / repo_name
)
cache.store(local_dir, rel_cache_dir)
# get the cache
repo_cache_dir = cache.get(rel_cache_dir)
assert repo_cache_dir is not None
# test remote repo cache
with working_directory() as tmp_dir:
> repo = Repo.clone_from(remote_example_url, repo_name)
tests/data_extraction/test_cache.py:29:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../miniconda3/envs/lean-dojo/lib/python3.11/site-packages/git/repo/base.py:1525: in clone_from
return cls._clone(
../miniconda3/envs/lean-dojo/lib/python3.11/site-packages/git/repo/base.py:1396: in _clone
finalize_process(proc, stderr=stderr)
../miniconda3/envs/lean-dojo/lib/python3.11/site-packages/git/util.py:504: in finalize_process
proc.wait(**kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <git.cmd.Git.AutoInterrupt object at 0x7fa7e24b3fc0>
stderr = "Cloning into 'lean4-example'...\nfatal: unable to access 'https://gitee.com/rexzong/lean4-example/': The requested URL returned error: 502\n"
def wait(self, stderr: Union[None, str, bytes] = b"") -> int:
"""Wait for the process and return its status code.
:param stderr:
Previously read value of stderr, in case stderr is already closed.
:warn:
May deadlock if output or error pipes are used and not handled
separately.
:raise git.exc.GitCommandError:
If the return status is not 0.
"""
if stderr is None:
stderr_b = b""
stderr_b = force_bytes(data=stderr, encoding="utf-8")
status: Union[int, None]
if self.proc is not None:
status = self.proc.wait()
p_stderr = self.proc.stderr
else: # Assume the underlying proc was killed earlier or never existed.
status = self.status
p_stderr = None
def read_all_from_possibly_closed_stream(stream: Union[IO[bytes], None]) -> bytes:
if stream:
try:
return stderr_b + force_bytes(stream.read())
except (OSError, ValueError):
return stderr_b or b""
else:
return stderr_b or b""
# END status handling
if status != 0:
errstr = read_all_from_possibly_closed_stream(p_stderr)
_logger.debug("AutoInterrupt wait stderr: %r" % (errstr,))
> raise GitCommandError(remove_password_if_present(self.args), status, errstr)
E git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
E cmdline: git clone -v -- https://gitee.com/rexzong/lean4-example lean4-example
E stderr: 'Cloning into 'lean4-example'...
E fatal: unable to access 'https://gitee.com/rexzong/lean4-example/': The requested URL returned error: 502
E '
../miniconda3/envs/lean-dojo/lib/python3.11/site-packages/git/cmd.py:834: GitCommandError
=============================== warnings summary ===============================
../miniconda3/envs/lean-dojo/lib/python3.11/site-packages/jupyter_client/connect.py:22
/private/home/kaiyuy/miniconda3/envs/lean-dojo/lib/python3.11/site-packages/jupyter_client/connect.py:22: DeprecationWarning: Jupyter is migrating its paths to use standard platformdirs
given by the platformdirs library. To remove this warning and
see the appropriate new directories, set the environment variable
`JUPYTER_PLATFORM_DIRS=1` and then run `jupyter --paths`.
The use of platformdirs will be the default in `jupyter_core` v6
from jupyter_core.paths import jupyter_data_dir, jupyter_runtime_dir, secure_write
src/lean_dojo/data_extraction/trace.py:97
/private/home/kaiyuy/LeanDojo/src/lean_dojo/data_extraction/trace.py:97: DeprecationWarning: invalid escape sequence '\*'
"""Check if all \*.lean files have been processed to produce \*.ast.json and \*.dep_paths files."""
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/data_extraction/test_cache.py::test_repo_cache - git.exc.GitComm...
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
======================== 1 failed, 2 warnings in 20.04s ========================
I cannot reproduce the above error reliably (probably a network issue). Running other tests now...
All other tests were successful. I merged it into dev
and will merge into main
after some minor cleaning. Thank you so much for this valuable contribution.
Hi,
Thanks for the awesome project!
This PR enhances the
LeanGitRepo
class to support local and remote repositories, introducing arepo_type
property (github, local, remote), simplifying the_to_commit_hash
method, transitioninglean_version
to a string for clarity, and adding tests for the new repository configurations.A Quick Review
We transition from
pygithub
objects togitpython
for managing repositories.For compatibility, a new property
repo_type
is introduced in theLeanGitRepo
class, supporting three values:github
,local
, andremote
. I've adapted the APIs to accommodate these types:LeanGitRepo.repo
LeanGitRepo.url
It's important to note that while the GitHub object connects to an online repo, the Git object interacts directly with a local git directory.
Please let me know if any further adjustments are needed.