lean-dojo / LeanDojo

Tool for data extraction and interacting with Lean programmatically.
https://leandojo.org
MIT License
536 stars 80 forks source link

yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example is not a Git repo. #29

Closed aifuqiang02 closed 1 year ago

aifuqiang02 commented 1 year ago

Description What happened? 按照官网教程,安装了leandojo , 在测试的时候,报错如下信息。请帮忙看看。 做过以下尝试:Extracting Data from Lean 4 , 也是相同问题。

---------------------------------------------------------t.py 内容-------------------------------------------------------------- [root@localhost lean-dojo-demo]# cat t.py from lean_dojo import LeanGitRepo, trace

repo = LeanGitRepo("https://github.com/yangky11/lean-example", "5a0360e49946815cb53132638ccdd46fb1859e2a") trace(repo, dst_dir="traced_lean-example")

---------------------------------------------------------报错信息--------------------------------------------------------------

[root@localhost lean-dojo-demo]# python t.py 2023-07-21 10:54:50.099 | INFO | lean_dojo.data_extraction.cache:get:72 - Downloading the traced repo from the remote cache. Set the environment variable DISABLE_REMOTE_CACHE if you want to trace the repo locally. --2023-07-21 10:54:50-- https://lean-dojo.s3.amazonaws.com/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a.tar.gz Resolving lean-dojo.s3.amazonaws.com (lean-dojo.s3.amazonaws.com)... 52.92.165.81, 52.92.229.105, 52.218.182.99, ... Connecting to lean-dojo.s3.amazonaws.com (lean-dojo.s3.amazonaws.com)|52.92.165.81|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 184449705 (176M) [application/x-gzip] Saving to: ‘/root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a.tar.gz’

100%[=================================================================================================================================>] 184,449,705 20.5MB/s in 9.7s

2023-07-21 10:55:00 (18.1 MB/s) - ‘/root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a.tar.gz’ saved [184449705/184449705]

2023-07-21 10:55:20.405 | INFO | lean_dojo.data_extraction.trace:trace:164 - Loading the traced repo from /root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example Traceback (most recent call last): File "/root/lean-dojo-demo/t.py", line 4, in trace(repo, dst_dir="traced_lean-example") File "/usr/local/python3/lib/python3.9/site-packages/lean_dojo/data_extraction/trace.py", line 165, in trace traced_repo = TracedRepo.load_from_disk(cached_path) File "/usr/local/python3/lib/python3.9/site-packages/lean_dojo/data_extraction/traced_data.py", line 1269, in load_from_disk raise RuntimeError(f"{root_dir} is not a Git repo.") RuntimeError: /root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example is not a Git repo.

Detailed Steps to Reproduce the Behavior

Logs in Debug Mode Set the environment variable VERBOSE=1 and paste the logs here.

Platform Information

yangky11 commented 1 year ago

Hi,

I wasn't able to reproduce the problem, but it looks like something related to git. Can you first upgrade to the latest version of LeanDojo (1.1.2)? Then try the following commands and see if they work. If not, it may help to figure out why they don't work.

cd /root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example
git rev-parse --is-inside-work-tree
aifuqiang02 commented 1 year ago

问题依然存在

[root@localhost lean-dojo-demo]# [root@localhost lean-dojo-demo]# pip list|grep lean-dojo lean-dojo 1.1.2 [root@localhost lean-dojo-demo]# [root@localhost lean-dojo-demo]# [root@localhost lean-dojo-demo]# cd /root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example [root@localhost lean-example]# [root@localhost lean-example]# [root@localhost lean-example]# git rev-parse --is-inside-work-tree fatal: detected dubious ownership in repository at '/root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example' To add an exception for this directory, call:

    git config --global --add safe.directory /root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example

[root@localhost lean-example]# [root@localhost lean-example]# ll total 16 -r--r--r--. 1 1013 1013 42 Jun 22 22:13 leanpkg.path -r--r--r--. 1 1013 1013 111 Jun 22 22:13 leanpkg.toml -r--r--r--. 1 1013 1013 1067 Jun 22 22:13 LICENSE -r--r--r--. 1 1013 1013 15 Jun 22 22:13 README.md dr-xr-xr-x. 2 1013 1013 121 Jun 22 22:14 src dr-xr-xr-x. 3 1013 1013 18 Jun 22 22:13 _target [root@localhost lean-example]# [root@localhost lean-example]# [root@localhost lean-example]# cd - /root/lean-dojo-demo [root@localhost lean-dojo-demo]# python t.py 2023-07-21 12:35:52.496 | INFO | lean_dojo.data_extraction.trace:trace:164 - Loading the traced repo from /root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example Traceback (most recent call last): File "/root/lean-dojo-demo/t.py", line 4, in trace(repo, dst_dir="traced_lean-example") File "/usr/local/python3/lib/python3.9/site-packages/lean_dojo/data_extraction/trace.py", line 165, in trace traced_repo = TracedRepo.load_from_disk(cached_path) File "/usr/local/python3/lib/python3.9/site-packages/lean_dojo/data_extraction/traced_data.py", line 1269, in load_from_disk raise RuntimeError(f"{root_dir} is not a Git repo.") RuntimeError: /root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example is not a Git repo. [root@localhost lean-dojo-demo]# [root@localhost lean-dojo-demo]# [root@localhost lean-dojo-demo]#

yangky11 commented 1 year ago

It looks like the problem is fatal: detected dubious ownership in repository at .... Can you try the suggested workaround?

git config --global --add safe.directory /root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example

BTW, did you trace the repo as the root user? I suspect that might have caused the issue.

aifuqiang02 commented 1 year ago

[root@localhost lean-example]# git config --global --add safe.directory /root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example [root@localhost lean-example]# [root@localhost lean-example]# [root@localhost lean-example]# git rev-parse --is-inside-work-tree true [root@localhost lean-example]#

[root@localhost lean-dojo-demo]# python t.py 2023-07-21 12:50:36.383 | INFO | lean_dojo.data_extraction.trace:trace:164 - Loading the traced repo from /root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 163/163 [10:07<00:00, 3.73s/it] Traceback (most recent call last): File "/root/lean-dojo-demo/t.py", line 4, in trace(repo, dst_dir="traced_lean-example") File "/usr/local/python3/lib/python3.9/site-packages/lean_dojo/data_extraction/trace.py", line 165, in trace traced_repo = TracedRepo.load_from_disk(cached_path) File "/usr/local/python3/lib/python3.9/site-packages/lean_dojo/data_extraction/traced_data.py", line 1292, in load_from_disk dependencies = repo.get_dependencies(root_dir) File "/usr/local/python3/lib/python3.9/site-packages/lean_dojo/data_extraction/lean.py", line 410, in get_dependencies return self._get_lean4_dependencies(path) File "/usr/local/python3/lib/python3.9/site-packages/lean_dojo/data_extraction/lean.py", line 467, in _get_lean4_dependencies lakefile = (Path(path) / "lakefile.lean").open().read() File "/usr/local/python3/lib/python3.9/pathlib.py", line 1241, in open return io.open(self, mode, buffering, encoding, errors, newline, File "/usr/local/python3/lib/python3.9/pathlib.py", line 1109, in _opener return self._accessor.open(self, flags, mode) FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example/lakefile.lean' [root@localhost lean-dojo-demo]#

aifuqiang02 commented 1 year ago

[root@localhost lean-example]# pwd /root/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example [root@localhost lean-example]# [root@localhost lean-example]# ll total 16 -r--r--r--. 1 1013 1013 42 Jun 22 22:13 leanpkg.path -r--r--r--. 1 1013 1013 111 Jun 22 22:13 leanpkg.toml -r--r--r--. 1 1013 1013 1067 Jun 22 22:13 LICENSE -r--r--r--. 1 1013 1013 15 Jun 22 22:13 README.md dr-xr-xr-x. 2 1013 1013 121 Jun 22 22:14 src dr-xr-xr-x. 3 1013 1013 18 Jun 22 22:13 _target [root@localhost lean-example]#

yangky11 commented 1 year ago

This is a Lean 3 repo, but it was somehow misclassified as Lean 4. Can you try the code below and see if the results are as expected?

In [2]: from lean_dojo import *

In [3]: repo = LeanGitRepo("https://github.com/yangky11/lean-example", "5a0360e
   ...: 49946815cb53132638ccdd46fb1859e2a")

In [4]: repo.uses_lean3
Out[4]: True

In [5]: repo.uses_lean4
Out[5]: False
yangky11 commented 1 year ago

Some context: We check if the repo is Lean 3 or Lean 4 by checking if the URL https://raw.githubusercontent.com/yangky11/lean-example/main/leanpkg.toml can be accessed. Therefore, it may make mistakes if Github cannot be accessed reliably, which is common in certain areas of the world. If that becomes an issue, we'll try to provide a workaround in the next version.

yangky11 commented 1 year ago

I tried tracing the repo using sudo and can now reproduce your previous error. The solution is just to not use sudo.

aifuqiang02 commented 1 year ago

在更换普通账户后,将docker 授权于普通账户后,再次尝试。终于好了。非常感谢您的帮助。 小小建议:能在什么相关的文档,或者报错信息里,提示使用普通用户操作。 再次表示感谢!!

[lean@localhost lean-dojo-demo]$ python t.py

2023-07-21 13:43:50.147 | INFO | lean_dojo.data_extraction.cache:get:72 - Downloading the traced repo from the remote cache. Set the environment variable DISABLE_REMOTE_CACHE if you want to trace the repo locally. --2023-07-21 13:43:50-- https://lean-dojo.s3.amazonaws.com/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a.tar.gz Resolving lean-dojo.s3.amazonaws.com (lean-dojo.s3.amazonaws.com)... 52.218.250.43, 52.218.180.131, 52.218.246.91, ... Connecting to lean-dojo.s3.amazonaws.com (lean-dojo.s3.amazonaws.com)|52.218.250.43|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 184449705 (176M) [application/x-gzip] Saving to: ‘/home/lean/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a.tar.gz’

100%[=================================================================================================================================>] 184,449,705 22.3MB/s in 9.2s

2023-07-21 13:44:00 (19.1 MB/s) - ‘/home/lean/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a.tar.gz’ saved [184449705/184449705]

2023-07-21 13:44:31.532 | INFO | lean_dojo.data_extraction.trace:trace:164 - Loading the traced repo from /home/lean/.cache/lean_dojo/yangky11-lean-example-5a0360e49946815cb53132638ccdd46fb1859e2a/lean-example 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 163/163 [05:38<00:00, 2.08s/it] [lean@localhost lean-dojo-demo]$ [lean@localhost lean-dojo-demo]$

yangky11 commented 1 year ago

That's great to hear! In the next version, we'll throw a RuntimeError when running as root. BTW, how did you solve the problem with Lean 3 vs. Lean 4?

aifuqiang02 commented 1 year ago

听取您的建议,使用普通用户后,一切顺利。谢谢。

再给一点小建议, 在这个文档上面,增加一句,表示下述代码是.py 文件。 我刚开始没用看相关介绍时,总弄不明白下述代码是什么语言,我应该如何运行这段代码。

We use LeanDojo to trace the repo by specifying its URL and a commit hash:

from lean_dojo import LeanGitRepo, trace

repo = LeanGitRepo("https://github.com/yangky11/lean-example", "5a0360e49946815cb53132638ccdd46fb1859e2a") trace(repo, dst_dir="traced_lean-example")

tangzhy commented 7 months ago

Some context: We check if the repo is Lean 3 or Lean 4 by checking if the URL https://raw.githubusercontent.com/yangky11/lean-example/main/leanpkg.toml can be accessed. Therefore, it may make mistakes if Github cannot be accessed reliably, which is common in certain areas of the world. If that becomes an issue, we'll try to provide a workaround in the next version.

hi @yangky11, it is suffering to access Github reliably in China. do you have solution to avoid checking Lean 3 or Lean 4? or when will your workaround release in the next version?

yangky11 commented 7 months ago

@tangzhy If the only problem is checking Lean 3 vs. Lean 4, I could add a parameter in LeanGitRepo for the user to explicitly specify whether the repo is in Lean 3 or Lean 4. However, LeanDojo still has to access GitHub when querying the dependencies of a repo. It would be helpful if you could post your use case and any error messages you had.

tangzhy commented 7 months ago

@yangky11 Thanks for reply. The error messages is pasted as below:

[2024-01-30 09:39:29,990] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-01-30 09:39:32.114 | INFO     | __main__:main:243 - PID: 2820814
2024-01-30 09:39:32.114 | INFO     | __main__:main:244 - Namespace(data_path='data/leandojo_minif2f/default/', exp_id=None, split='test', file_path=None, full_name=None, name_filter=None, num_theorems=None, ckpt_path='<path_to_my_model>', indexed_corpus_path=None, tactic=None, module=None, num_sampled_tactics=64, timeout=600, num_cpus=8, with_gpus=True, verbose=True, use_vllm=True)
2024-01-30 09:39:32.121 | DEBUG    | __main__:_get_theorems_from_files:67 - LeanGitRepo url: https://github.com/facebookresearch/miniF2F
2024-01-30 09:39:32.121 | DEBUG    | __main__:_get_theorems_from_files:68 - LeanGitRepo commit: 5271ddec788677c815cf818a06f368ef6498a106
Traceback (most recent call last):
  File "<path_to_my_conda_env>/lib/python3.10/urllib/request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "<path_to_my_conda_env>/lib/python3.10/http/client.py", line 1283, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "<path_to_my_conda_env>/lib/python3.10/http/client.py", line 1329, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "<path_to_my_conda_env>/lib/python3.10/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "<path_to_my_conda_env>/lib/python3.10/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "<path_to_my_conda_env>/lib/python3.10/http/client.py", line 976, in send
    self.connect()
  File "<path_to_my_conda_env>/lib/python3.10/http/client.py", line 1448, in connect
    super().connect()
  File "<path_to_my_conda_env>/lib/python3.10/http/client.py", line 942, in connect
    self.sock = self._create_connection(
  File "<path_to_my_conda_env>/lib/python3.10/socket.py", line 845, in create_connection
    raise err
  File "<path_to_my_conda_env>/lib/python3.10/socket.py", line 833, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<path_to_my_project>/prover/evaluate.py", line 270, in <module>
    main()
  File "<path_to_my_project>/prover/evaluate.py", line 246, in main
    pass_1 = evaluate(
  File "<path_to_my_project>/prover/evaluate.py", line 123, in evaluate
    repo, theorems, positions = _get_theorems(
  File "<path_to_my_project>/prover/evaluate.py", line 28, in _get_theorems
    repo, theorems, positions = _get_theorems_from_files(
  File "<path_to_my_project>/prover/evaluate.py", line 69, in _get_theorems_from_files
    repo = LeanGitRepo(t["url"], t["commit"])
  File "<string>", line 5, in __init__
  File "<path_to_my_conda_env>/lib/python3.10/site-packages/lean_dojo/data_extraction/lean.py", line 424, in __post_init__
    uses_lean3 = self.is_lean3 or url_exists(
  File "<path_to_my_conda_env>/lib/python3.10/site-packages/lean_dojo/utils.py", line 221, in url_exists
    with urllib.request.urlopen(url) as _:
  File "<path_to_my_conda_env>/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "<path_to_my_conda_env>/lib/python3.10/urllib/request.py", line 519, in open
    response = self._open(req, data)
  File "<path_to_my_conda_env>/lib/python3.10/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "<path_to_my_conda_env>/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "<path_to_my_conda_env>/lib/python3.10/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "<path_to_my_conda_env>/lib/python3.10/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 111] Connection refused>

I understand that LeanDojo may access GitHub repo for multiple times.

I wonder if I have cloned the GitHub repo in my local server, could LeanDojo support directly accessing my local repo instead of accessing GitHub?

tangzhy commented 7 months ago

@yangky11 For LeanDojo accessing locally cloned repo, I even have imagined the API as follows:

Maybe we could turn the original code

repo = LeanGitRepo("https://github.com/leanprover-community/mathlib", "19c869efa56bbb8b500f2724c0b77261edbfa28c")

into

repo = LeanGitRepo("<local_path_to_mathlib>", "19c869efa56bbb8b500f2724c0b77261edbfa28c")

What do you think of it:)

yangky11 commented 6 months ago

@tangzhy This would be a great feature, though currently I do not have the bandwidth to implement it.