Closed aifuqiang02 closed 1 year ago
Hi,
I wasn't able to reproduce the problem, but it looks like something related to git. Can you first upgrade to the latest version of LeanDojo (1.1.2)? Then try the following commands and see if they work. If not, it may help to figure out why they don't work.
cd /root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example
git rev-parse --is-inside-work-tree
问题依然存在
[root@localhost lean-dojo-demo]# [root@localhost lean-dojo-demo]# pip list|grep lean-dojo lean-dojo 1.1.2 [root@localhost lean-dojo-demo]# [root@localhost lean-dojo-demo]# [root@localhost lean-dojo-demo]# cd /root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example [root@localhost lean-example]# [root@localhost lean-example]# [root@localhost lean-example]# git rev-parse --is-inside-work-tree fatal: detected dubious ownership in repository at '/root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example' To add an exception for this directory, call:
git config --global --add safe.directory /root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example
[root@localhost lean-example]#
[root@localhost lean-example]# ll
total 16
-r--r--r--. 1 1013 1013 42 Jun 22 22:13 leanpkg.path
-r--r--r--. 1 1013 1013 111 Jun 22 22:13 leanpkg.toml
-r--r--r--. 1 1013 1013 1067 Jun 22 22:13 LICENSE
-r--r--r--. 1 1013 1013 15 Jun 22 22:13 README.md
dr-xr-xr-x. 2 1013 1013 121 Jun 22 22:14 src
dr-xr-xr-x. 3 1013 1013 18 Jun 22 22:13 _target
[root@localhost lean-example]#
[root@localhost lean-example]#
[root@localhost lean-example]# cd -
/root/lean-dojo-demo
[root@localhost lean-dojo-demo]# python t.py
2023-07-21 12:35:52.496 | INFO | lean_dojo.data_extraction.trace:trace:164 - Loading the traced repo from /root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example
Traceback (most recent call last):
File "/root/lean-dojo-demo/t.py", line 4, in
It looks like the problem is fatal: detected dubious ownership in repository at ...
. Can you try the suggested workaround?
git config --global --add safe.directory /root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example
BTW, did you trace the repo as the root user? I suspect that might have caused the issue.
[root@localhost lean-example]# git config --global --add safe.directory /root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example [root@localhost lean-example]# [root@localhost lean-example]# [root@localhost lean-example]# git rev-parse --is-inside-work-tree true [root@localhost lean-example]#
[root@localhost lean-dojo-demo]# python t.py
2023-07-21 12:50:36.383 | INFO | lean_dojo.data_extraction.trace:trace:164 - Loading the traced repo from /root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 163/163 [10:07<00:00, 3.73s/it]
Traceback (most recent call last):
File "/root/lean-dojo-demo/t.py", line 4, in
[root@localhost lean-example]# pwd /root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example [root@localhost lean-example]# [root@localhost lean-example]# ll total 16 -r--r--r--. 1 1013 1013 42 Jun 22 22:13 leanpkg.path -r--r--r--. 1 1013 1013 111 Jun 22 22:13 leanpkg.toml -r--r--r--. 1 1013 1013 1067 Jun 22 22:13 LICENSE -r--r--r--. 1 1013 1013 15 Jun 22 22:13 README.md dr-xr-xr-x. 2 1013 1013 121 Jun 22 22:14 src dr-xr-xr-x. 3 1013 1013 18 Jun 22 22:13 _target [root@localhost lean-example]#
This is a Lean 3 repo, but it was somehow misclassified as Lean 4. Can you try the code below and see if the results are as expected?
In [2]: from lean_dojo import *
In [3]: repo = LeanGitRepo("https://github.com/yangky11/lean-example", "5a0360e
...: 49946815cb53132638ccdd46fb1859e2a")
In [4]: repo.uses_lean3
Out[4]: True
In [5]: repo.uses_lean4
Out[5]: False
Some context: We check if the repo is Lean 3 or Lean 4 by checking if the URL https://raw.githubusercontent.com/yangky11/lean-example/main/leanpkg.toml
can be accessed. Therefore, it may make mistakes if Github cannot be accessed reliably, which is common in certain areas of the world. If that becomes an issue, we'll try to provide a workaround in the next version.
I tried tracing the repo using sudo and can now reproduce your previous error. The solution is just to not use sudo.
在更换普通账户后,将docker 授权于普通账户后,再次尝试。终于好了。非常感谢您的帮助。 小小建议:能在什么相关的文档,或者报错信息里,提示使用普通用户操作。 再次表示感谢!!
[lean@localhost lean-dojo-demo]$ python t.py
2023-07-21 13:43:50.147 | INFO | lean_dojo.data_extraction.cache:get:72 - Downloading the traced repo from the remote cache. Set the environment variable DISABLE_REMOTE_CACHE
if you want to trace the repo locally.
--2023-07-21 13:43:50-- https://lean-dojo.s3.amazonaws.com/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a.tar.gz
Resolving lean-dojo.s3.amazonaws.com (lean-dojo.s3.amazonaws.com)... 52.218.250.43, 52.218.180.131, 52.218.246.91, ...
Connecting to lean-dojo.s3.amazonaws.com (lean-dojo.s3.amazonaws.com)|52.218.250.43|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 184449705 (176M) [application/x-gzip]
Saving to: ‘/home/lean/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a.tar.gz’
100%[=================================================================================================================================>] 184,449,705 22.3MB/s in 9.2s
2023-07-21 13:44:00 (19.1 MB/s) - ‘/home/lean/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a.tar.gz’ saved [184449705/184449705]
2023-07-21 13:44:31.532 | INFO | lean_dojo.data_extraction.trace:trace:164 - Loading the traced repo from /home/lean/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 163/163 [05:38<00:00, 2.08s/it] [lean@localhost lean-dojo-demo]$ [lean@localhost lean-dojo-demo]$
That's great to hear! In the next version, we'll throw a RuntimeError
when running as root. BTW, how did you solve the problem with Lean 3 vs. Lean 4?
听取您的建议,使用普通用户后,一切顺利。谢谢。
再给一点小建议, 在这个文档上面,增加一句,表示下述代码是.py 文件。 我刚开始没用看相关介绍时,总弄不明白下述代码是什么语言,我应该如何运行这段代码。
We use LeanDojo to trace the repo by specifying its URL and a commit hash:
from lean_dojo import LeanGitRepo, trace
repo = LeanGitRepo("https://github.com/yangky11/lean-example", "5a0360e49946815cb53132638ccdd46fb1859e2a") trace(repo, dst_dir="traced_lean-example")
Some context: We check if the repo is Lean 3 or Lean 4 by checking if the URL
https://raw.githubusercontent.com/yangky11/lean-example/main/leanpkg.toml
can be accessed. Therefore, it may make mistakes if Github cannot be accessed reliably, which is common in certain areas of the world. If that becomes an issue, we'll try to provide a workaround in the next version.
hi @yangky11, it is suffering to access Github reliably in China. do you have solution to avoid checking Lean 3 or Lean 4? or when will your workaround release in the next version?
@tangzhy If the only problem is checking Lean 3 vs. Lean 4, I could add a parameter in LeanGitRepo
for the user to explicitly specify whether the repo is in Lean 3 or Lean 4. However, LeanDojo still has to access GitHub when querying the dependencies of a repo. It would be helpful if you could post your use case and any error messages you had.
@yangky11 Thanks for reply. The error messages is pasted as below:
[2024-01-30 09:39:29,990] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-01-30 09:39:32.114 | INFO | __main__:main:243 - PID: 2820814
2024-01-30 09:39:32.114 | INFO | __main__:main:244 - Namespace(data_path='data/leandojo_minif2f/default/', exp_id=None, split='test', file_path=None, full_name=None, name_filter=None, num_theorems=None, ckpt_path='<path_to_my_model>', indexed_corpus_path=None, tactic=None, module=None, num_sampled_tactics=64, timeout=600, num_cpus=8, with_gpus=True, verbose=True, use_vllm=True)
2024-01-30 09:39:32.121 | DEBUG | __main__:_get_theorems_from_files:67 - LeanGitRepo url: https://github.com/facebookresearch/miniF2F
2024-01-30 09:39:32.121 | DEBUG | __main__:_get_theorems_from_files:68 - LeanGitRepo commit: 5271ddec788677c815cf818a06f368ef6498a106
Traceback (most recent call last):
File "<path_to_my_conda_env>/lib/python3.10/urllib/request.py", line 1348, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "<path_to_my_conda_env>/lib/python3.10/http/client.py", line 1283, in request
self._send_request(method, url, body, headers, encode_chunked)
File "<path_to_my_conda_env>/lib/python3.10/http/client.py", line 1329, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "<path_to_my_conda_env>/lib/python3.10/http/client.py", line 1278, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "<path_to_my_conda_env>/lib/python3.10/http/client.py", line 1038, in _send_output
self.send(msg)
File "<path_to_my_conda_env>/lib/python3.10/http/client.py", line 976, in send
self.connect()
File "<path_to_my_conda_env>/lib/python3.10/http/client.py", line 1448, in connect
super().connect()
File "<path_to_my_conda_env>/lib/python3.10/http/client.py", line 942, in connect
self.sock = self._create_connection(
File "<path_to_my_conda_env>/lib/python3.10/socket.py", line 845, in create_connection
raise err
File "<path_to_my_conda_env>/lib/python3.10/socket.py", line 833, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<path_to_my_project>/prover/evaluate.py", line 270, in <module>
main()
File "<path_to_my_project>/prover/evaluate.py", line 246, in main
pass_1 = evaluate(
File "<path_to_my_project>/prover/evaluate.py", line 123, in evaluate
repo, theorems, positions = _get_theorems(
File "<path_to_my_project>/prover/evaluate.py", line 28, in _get_theorems
repo, theorems, positions = _get_theorems_from_files(
File "<path_to_my_project>/prover/evaluate.py", line 69, in _get_theorems_from_files
repo = LeanGitRepo(t["url"], t["commit"])
File "<string>", line 5, in __init__
File "<path_to_my_conda_env>/lib/python3.10/site-packages/lean_dojo/data_extraction/lean.py", line 424, in __post_init__
uses_lean3 = self.is_lean3 or url_exists(
File "<path_to_my_conda_env>/lib/python3.10/site-packages/lean_dojo/utils.py", line 221, in url_exists
with urllib.request.urlopen(url) as _:
File "<path_to_my_conda_env>/lib/python3.10/urllib/request.py", line 216, in urlopen
return opener.open(url, data, timeout)
File "<path_to_my_conda_env>/lib/python3.10/urllib/request.py", line 519, in open
response = self._open(req, data)
File "<path_to_my_conda_env>/lib/python3.10/urllib/request.py", line 536, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "<path_to_my_conda_env>/lib/python3.10/urllib/request.py", line 496, in _call_chain
result = func(*args)
File "<path_to_my_conda_env>/lib/python3.10/urllib/request.py", line 1391, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "<path_to_my_conda_env>/lib/python3.10/urllib/request.py", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 111] Connection refused>
I understand that LeanDojo may access GitHub repo for multiple times.
I wonder if I have cloned the GitHub repo in my local server, could LeanDojo support directly accessing my local repo instead of accessing GitHub?
@yangky11 For LeanDojo accessing locally cloned repo, I even have imagined the API as follows:
Maybe we could turn the original code
repo = LeanGitRepo("https://github.com/leanprover-community/mathlib", "19c869efa56bbb8b500f2724c0b77261edbfa28c")
into
repo = LeanGitRepo("<local_path_to_mathlib>", "19c869efa56bbb8b500f2724c0b77261edbfa28c")
What do you think of it:)
@tangzhy This would be a great feature, though currently I do not have the bandwidth to implement it.
Description What happened? 按照官网教程,安装了leandojo , 在测试的时候,报错如下信息。请帮忙看看。 做过以下尝试:Extracting Data from Lean 4 , 也是相同问题。
---------------------------------------------------------t.py 内容-------------------------------------------------------------- [root@localhost lean-dojo-demo]# cat t.py from lean_dojo import LeanGitRepo, trace
repo = LeanGitRepo("https://github.com/yangky11/lean-example", "5a0360e49946815cb53132638ccdd46fb1859e2a") trace(repo, dst_dir="traced_lean-example")
---------------------------------------------------------报错信息--------------------------------------------------------------
[root@localhost lean-dojo-demo]# python t.py 2023-07-21 10:54:50.099 | INFO | lean_dojo.data_extraction.cache:get:72 - Downloading the traced repo from the remote cache. Set the environment variable
DISABLE_REMOTE_CACHE
if you want to trace the repo locally. --2023-07-21 10:54:50-- https://lean-dojo.s3.amazonaws.com/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a.tar.gz Resolving lean-dojo.s3.amazonaws.com (lean-dojo.s3.amazonaws.com)... 52.92.165.81, 52.92.229.105, 52.218.182.99, ... Connecting to lean-dojo.s3.amazonaws.com (lean-dojo.s3.amazonaws.com)|52.92.165.81|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 184449705 (176M) [application/x-gzip] Saving to: ‘/root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a.tar.gz’100%[=================================================================================================================================>] 184,449,705 20.5MB/s in 9.7s
2023-07-21 10:55:00 (18.1 MB/s) - ‘/root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a.tar.gz’ saved [184449705/184449705]
2023-07-21 10:55:20.405 | INFO | lean_dojo.data_extraction.trace:trace:164 - Loading the traced repo from /root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example Traceback (most recent call last): File "/root/lean-dojo-demo/t.py", line 4, in
trace(repo, dst_dir="traced_lean-example")
File "/usr/local/python3/lib/python3.9/site-packages/lean_dojo/data_extraction/trace.py", line 165, in trace
traced_repo = TracedRepo.load_from_disk(cached_path)
File "/usr/local/python3/lib/python3.9/site-packages/lean_dojo/data_extraction/traced_data.py", line 1269, in load_from_disk
raise RuntimeError(f"{root_dir} is not a Git repo.")
RuntimeError: /root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example is not a Git repo.
Detailed Steps to Reproduce the Behavior
Logs in Debug Mode Set the environment variable
VERBOSE=1
and paste the logs here.Platform Information