lean-dojo / LeanDojo

Tool for data extraction and interacting with Lean programmatically.
https://leandojo.org
MIT License
475 stars 72 forks source link

Errors tracing older repos #177

Open AG161 opened 2 days ago

AG161 commented 2 days ago

I know that LeanDojo has dropped support for older versions of Lean, but I used to be able to trace older Lean repos with older LeanDojo versions, now I'm getting errors when I do so.

Here are two examples with the following miniF2F repo: https://github.com/rah4927/lean-dojo-mew/

LeanDojo 1.3.0 logs (I removed the 1k+ lines of the repo's building):

(llemma) agittis@trurl:~/Dojo$ python
Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from lean_dojo import *
2024-07-01 18:10:33.439 | DEBUG    | lean_dojo.constants:<module>:68 - Using Git
Hub personal access token for authentication
>>> repo = LeanGitRepo("https://github.com/rah4927/lean-dojo-mew","d00c776260c77
de7e70125ef0cd119de6c0ff1de")
>>> tr = trace(repo)
2024-07-01 18:12:12.658 | INFO     | lean_dojo.data_extraction.trace:get_traced_
repo_path:142 - Tracing LeanGitRepo(url='https://github.com/rah4927/lean-dojo-me
w', commit='d00c776260c77de7e70125ef0cd119de6c0ff1de')
2024-07-01 18:12:12.663 | DEBUG    | lean_dojo.data_extraction.trace:get_traced_
repo_path:144 - Working in the temporary directory /tmp/tmpavn06hfm
2024-07-01 18:12:13.395 | DEBUG    | lean_dojo.data_extraction.lean:clone_and_ch
eckout:443 - Cloning LeanGitRepo(url='https://github.com/rah4927/lean-dojo-mew',
 commit='d00c776260c77de7e70125ef0cd119de6c0ff1de')
2024-07-01 18:12:14.205 | DEBUG    | lean_dojo.data_extraction.trace:_trace_lean
4:108 - Tracing LeanGitRepo(url='https://github.com/rah4927/lean-dojo-mew', comm
it='d00c776260c77de7e70125ef0cd119de6c0ff1de')
2024-07-01 18:12:14.211 | DEBUG    | lean_dojo.container:run:183 - NUM_PROCS=32
python3 build_lean4_repo.py lean-dojo-mew
2024-07-01 18:12:14.301 | INFO     | __main__:main:121 - Building lean-dojo-mew
info: cloning https://github.com/leanprover-community/mathlib4.git to ./lake-pac
kages/mathlib
info: cloning https://github.com/leanprover/std4 to ./lake-packages/std
info: cloning https://github.com/gebner/quote4 to ./lake-packages/Qq
info: cloning https://github.com/JLimperg/aesop to ./lake-packages/aesop
info: cloning https://github.com/mhuisi/lean4-cli.git to ./lake-packages/Cli
info: cloning https://github.com/EdAyers/ProofWidgets4 to ./lake-packages/proofw
idgets
Downloading proofwidgets/v0.0.13/linux-64.tar.gz
[0/29] Building Std.Tactic.Unreachable
[0/29] Building Std.Lean.Command
...
[1345/1347] Building MiniF2F.All
[1346/1347] Building MiniF2F
2024-07-01 18:27:55.758 | INFO     | __main__:main:154 - Tracing lean-dojo-mew
 95%|█████████████████████████████████████▏ | 1978/2077 [20:11<01:25,  1.16it/s]
2024-07-01 18:48:11.866 | DEBUG    | lean_dojo.data_extraction.traced_data:from_
traced_files:1435 - Parsing 1979 *.ast.json files in /tmp/tmpavn06hfm/lean-dojo-
mew with 31 workers
2024-07-01 18:48:13,983 INFO worker.py:1715 -- Started a local Ray instance. Vie
w the dashboard at 127.0.0.1:8265
  0%|                                                  | 0/1979 [00:00<?, ?it/s]
(pid=471633) 2024-07-01 18:48:15.445 | DEBUG    | lean_dojo.constants:<module>:6
8 - Using GitHub personal access token for authentication
100%|███████████████████████████████████████| 1979/1979 [03:39<00:00,  9.03it/s]
(pid=471655) 2024-07-01 18:48:15.545 | DEBUG    | lean_dojo.constants:<module>:6
8 - Using GitHub personal access token for authentication [repeated 30x across c
luster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log d
eduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging
.html#log-deduplication for more options.)
2024-07-01 18:51:55.738 | DEBUG    | lean_dojo.data_extraction.lean:_get_lean4_d
ependencies:524 - Querying the dependencies of LeanGitRepo(url='https://github.c
om/rah4927/lean-dojo-mew', commit='d00c776260c77de7e70125ef0cd119de6c0ff1de')
2024-07-01 18:52:14.965 | DEBUG    | lean_dojo.data_extraction.lean:_get_lean4_d
ependencies:524 - Querying the dependencies of LeanGitRepo(url='https://github.c
om/leanprover-community/mathlib4', commit='851ca531d422d962c5366983029f32c9641f3
4c3')
2024-07-01 18:52:40.154 | DEBUG    | lean_dojo.data_extraction.traced_data:save_
to_disk:1482 - Saving 1979 traced XML files to /tmp/tmpavn06hfm/lean-dojo-mew wi
th 31 workers
2024-07-01 18:52:43,115 INFO worker.py:1715 -- Started a local Ray instance. Vie
w the dashboard at 127.0.0.1:8265
  0%|                                                  | 0/1979 [00:00<?, ?it/s]
(pid=479103) 2024-07-01 18:52:44.272 | DEBUG    | lean_dojo.constants:<module>:6
8 - Using GitHub personal access token for authentication
100%|███████████████████████████████████████| 1979/1979 [01:02<00:00, 31.57it/s]
(pid=479122) 2024-07-01 18:52:44.372 | DEBUG    | lean_dojo.constants:<module>:6
8 - Using GitHub personal access token for authentication [repeated 30x across c
luster]
2024-07-01 18:53:55.819 | INFO     | lean_dojo.data_extraction.trace:trace:174 -
 Loading the traced repo from /home/agittis/.cache/lean_dojo/rah4927-lean-dojo-m
ew-d00c776260c77de7e70125ef0cd119de6c0ff1de/lean-dojo-mew
2024-07-01 18:53:56.384 | DEBUG    | lean_dojo.data_extraction.traced_data:load_
from_disk:1509 - Loading 1979 traced XML files from /home/agittis/.cache/lean_do
jo/rah4927-lean-dojo-mew-d00c776260c77de7e70125ef0cd119de6c0ff1de/lean-dojo-mew
with 31 workers
2024-07-01 18:53:59,539 INFO worker.py:1715 -- Started a local Ray instance. Vie
w the dashboard at 127.0.0.1:8265
  0%|                                                  | 0/1979 [00:00<?, ?it/s]
(pid=486553) 2024-07-01 18:54:00.656 | DEBUG    | lean_dojo.constants:<module>:6
8 - Using GitHub personal access token for authentication
100%|███████████████████████████████████████| 1979/1979 [04:52<00:00,  6.77it/s]
(pid=486559) 2024-07-01 18:54:00.764 | DEBUG    | lean_dojo.constants:<module>:6
8 - Using GitHub personal access token for authentication [repeated 30x across c
luster]
2024-07-01 18:58:54.418 | DEBUG    | lean_dojo.data_extraction.lean:_get_lean4_d
ependencies:524 - Querying the dependencies of LeanGitRepo(url='https://github.c
om/rah4927/lean-dojo-mew', commit='d00c776260c77de7e70125ef0cd119de6c0ff1de')
2024-07-01 18:59:00.491 | DEBUG    | lean_dojo.data_extraction.lean:_get_lean4_d
ependencies:524 - Querying the dependencies of LeanGitRepo(url='https://github.c
om/leanprover-community/mathlib4', commit='851ca531d422d962c5366983029f32c9641f3
4c3')
2024-07-01 18:59:26.604 | DEBUG    | lean_dojo.data_extraction.traced_data:check
_sanity:1372 - Checking the sanity of TracedRepo(repo=LeanGitRepo(url='https://g
ithub.com/rah4927/lean-dojo-mew', commit='d00c776260c77de7e70125ef0cd119de6c0ff1
de'), dependencies={'lean4': LeanGitRepo(url='https://github.com/leanprover/lean
4', commit='3b58e0649156610ce3aeed4f7b5c652340c668d4'), 'mathlib': LeanGitRepo(u
rl='https://github.com/leanprover-community/mathlib4', commit='851ca531d422d962c
5366983029f32c9641f34c3')}, root_dir=PosixPath('/home/agittis/.cache/lean_dojo/r
ah4927-lean-dojo-mew-d00c776260c77de7e70125ef0cd119de6c0ff1de/lean-dojo-mew'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/agittis/miniconda3/envs/llemma/lib/python3.10/site-packages/lean_d
ojo/data_extraction/trace.py", line 176, in trace
    traced_repo.check_sanity()
  File "/home/agittis/miniconda3/envs/llemma/lib/python3.10/site-packages/lean_d
ojo/data_extraction/traced_data.py", line 1402, in check_sanity
    tf.check_sanity()
  File "/home/agittis/miniconda3/envs/llemma/lib/python3.10/site-packages/lean_d
ojo/data_extraction/traced_data.py", line 981, in check_sanity
    for t in self.get_traced_theorems():
  File "/home/agittis/miniconda3/envs/llemma/lib/python3.10/site-packages/lean_d
ojo/data_extraction/traced_data.py", line 1056, in get_traced_theorems
    self.traverse_preorder(_callback, node_cls)
  File "/home/agittis/miniconda3/envs/llemma/lib/python3.10/site-packages/lean_d
ojo/data_extraction/traced_data.py", line 994, in traverse_preorder
    self.ast.traverse_preorder(callback, node_cls)
  File "/home/agittis/miniconda3/envs/llemma/lib/python3.10/site-packages/lean_d
ojo/data_extraction/ast/lean4/node.py", line 56, in traverse_preorder
    child.traverse_preorder(callback, node_cls, parents + [self])
  File "/home/agittis/miniconda3/envs/llemma/lib/python3.10/site-packages/lean_d
ojo/data_extraction/ast/lean4/node.py", line 56, in traverse_preorder
    child.traverse_preorder(callback, node_cls, parents + [self])
  File "/home/agittis/miniconda3/envs/llemma/lib/python3.10/site-packages/lean_d
ojo/data_extraction/ast/lean4/node.py", line 53, in traverse_preorder
    if callback(self, parents):
  File "/home/agittis/miniconda3/envs/llemma/lib/python3.10/site-packages/lean_d
ojo/data_extraction/traced_data.py", line 1046, in _callback
    repo, path = self._get_repo_and_relative_path()
  File "/home/agittis/miniconda3/envs/llemma/lib/python3.10/site-packages/lean_d
ojo/data_extraction/traced_data.py", line 1004, in _get_repo_and_relative_path
    repo = self.traced_repo.dependencies[name]
KeyError: 'aesop'

LeanDojo 1.5.1 logs (again with build lines removed):

(dj1-5) agittis@trurl:~/Dojo$ python
Python 3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from lean_dojo import *
2024-07-01 21:31:00.437 | DEBUG    | lean_dojo.constants:<module>:67 - Using Git
Hub personal access token for authentication
>>> repo = LeanGitRepo("https://github.com/rah4927/lean-dojo-mew","d00c776260c77
de7e70125ef0cd119de6c0ff1de")
2024-07-01 21:31:35.484 | DEBUG    | lean_dojo.data_extraction.lean:_to_commit_h
ash:48 - Querying the commit hash for lean4-nightly nightly-2023-08-19
>>> tr = trace(repo)
2024-07-01 21:31:43.161 | INFO     | lean_dojo.data_extraction.trace:get_traced_
repo_path:156 - Tracing LeanGitRepo(url='https://github.com/rah4927/lean-dojo-me
w', commit='d00c776260c77de7e70125ef0cd119de6c0ff1de')
2024-07-01 21:31:43.163 | DEBUG    | lean_dojo.data_extraction.trace:get_traced_
repo_path:158 - Working in the temporary directory /tmp/tmpi_m637eb
2024-07-01 21:31:43.947 | DEBUG    | lean_dojo.data_extraction.lean:clone_and_ch
eckout:491 - Cloning LeanGitRepo(url='https://github.com/rah4927/lean-dojo-mew',
 commit='d00c776260c77de7e70125ef0cd119de6c0ff1de')
2024-07-01 21:31:44.786 | DEBUG    | lean_dojo.data_extraction.trace:_trace_lean
4:110 - Tracing LeanGitRepo(url='https://github.com/rah4927/lean-dojo-mew', comm
it='d00c776260c77de7e70125ef0cd119de6c0ff1de')
2024-07-01 21:31:44.791 | DEBUG    | lean_dojo.container:run:183 - NUM_PROCS=32
python3 build_lean4_repo.py lean-dojo-mew
2024-07-01 21:31:44.879 | INFO     | __main__:main:164 - Building lean-dojo-mew
info: cloning https://github.com/leanprover-community/mathlib4.git to ./lake-pac
kages/mathlib
info: cloning https://github.com/leanprover/std4 to ./lake-packages/std
info: cloning https://github.com/gebner/quote4 to ./lake-packages/Qq
info: cloning https://github.com/JLimperg/aesop to ./lake-packages/aesop
info: cloning https://github.com/mhuisi/lean4-cli.git to ./lake-packages/Cli
info: cloning https://github.com/EdAyers/ProofWidgets4 to ./lake-packages/proofw
idgets
Downloading proofwidgets/v0.0.13/linux-64.tar.gz
[0/24] Building Std.Lean.TagAttribute
...
[1345/1347] Building MiniF2F.All
[1346/1347] Building MiniF2F
2024-07-01 21:47:25.140 | INFO     | __main__:main:187 - Tracing lean-dojo-mew
100%|██████████████████████████████████████▉| 2072/2077 [20:56<00:01,  3.65it/s]
2024-07-01 22:08:23.654 | DEBUG    | lean_dojo.data_extraction.traced_data:from_
traced_files:1417 - Parsing 2077 *.ast.json files in /tmp/tmpi_m637eb/lean-dojo-
mew with 31 workers
2024-07-01 22:08:25,975 INFO worker.py:1715 -- Started a local Ray instance. Vie
w the dashboard at 127.0.0.1:8265
  0%|                                                  | 0/2077 [00:00<?, ?it/s]
(pid=716299) 2024-07-01 22:08:27.278 | DEBUG    | lean_dojo.constants:<module>:6
7 - Using GitHub personal access token for authentication
100%|███████████████████████████████████████| 2077/2077 [03:45<00:00,  9.19it/s]
(pid=716332) 2024-07-01 22:08:27.363 | DEBUG    | lean_dojo.constants:<module>:6
7 - Using GitHub personal access token for authentication [repeated 30x across c
luster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log d
eduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging
.html#log-deduplication for more options.)
2024-07-01 22:12:14.287 | DEBUG    | lean_dojo.data_extraction.lean:get_dependen
cies:537 - Querying the dependencies of LeanGitRepo(url='https://github.com/rah4
927/lean-dojo-mew', commit='d00c776260c77de7e70125ef0cd119de6c0ff1de')
2024-07-01 22:12:16.843 | DEBUG    | lean_dojo.data_extraction.lean:_to_commit_h
ash:48 - Querying the commit hash for lean4 v4.10.0-rc1
2024-07-01 22:12:32.097 | DEBUG    | lean_dojo.data_extraction.lean:_to_commit_h
ash:48 - Querying the commit hash for lean4 v4.9.0-rc1
2024-07-01 22:12:48.047 | DEBUG    | lean_dojo.data_extraction.lean:_to_commit_h
ash:48 - Querying the commit hash for lean4 v4.7.0
2024-07-01 22:13:13.765 | DEBUG    | lean_dojo.data_extraction.traced_data:save_
to_disk:1460 - Saving 2077 traced XML files to /tmp/tmpi_m637eb/lean-dojo-mew wi
th 31 workers
2024-07-01 22:13:16,795 INFO worker.py:1715 -- Started a local Ray instance. Vie
w the dashboard at 127.0.0.1:8265
  0%|                                                  | 0/2077 [00:00<?, ?it/s]
(pid=723743) 2024-07-01 22:13:18.020 | DEBUG    | lean_dojo.constants:<module>:6
7 - Using GitHub personal access token for authentication
100%|███████████████████████████████████████| 2077/2077 [01:06<00:00, 31.23it/s]
(pid=723769) 2024-07-01 22:13:18.175 | DEBUG    | lean_dojo.constants:<module>:6
7 - Using GitHub personal access token for authentication [repeated 30x across c
luster]
2024-07-01 22:14:37.074 | INFO     | lean_dojo.data_extraction.trace:trace:193 -
 Loading the traced repo from /home/agittis/.cache/lean_dojo/rah4927-lean-dojo-m
ew-d00c776260c77de7e70125ef0cd119de6c0ff1de/lean-dojo-mew
2024-07-01 22:14:37.138 | DEBUG    | lean_dojo.data_extraction.traced_data:load_
from_disk:1489 - Loading 2077 traced XML files from /home/agittis/.cache/lean_do
jo/rah4927-lean-dojo-mew-d00c776260c77de7e70125ef0cd119de6c0ff1de/lean-dojo-mew
with 31 workers
2024-07-01 22:14:41,276 INFO worker.py:1715 -- Started a local Ray instance. Vie
w the dashboard at 127.0.0.1:8265
  0%|                                                  | 0/2077 [00:00<?, ?it/s]
(pid=731153) 2024-07-01 22:14:42.466 | DEBUG    | lean_dojo.constants:<module>:6
7 - Using GitHub personal access token for authentication
100%|███████████████████████████████████████| 2077/2077 [05:18<00:00,  6.52it/s]
(pid=731158) 2024-07-01 22:14:42.592 | DEBUG    | lean_dojo.constants:<module>:6
7 - Using GitHub personal access token for authentication [repeated 30x across c
luster]
2024-07-01 22:20:02.430 | DEBUG    | lean_dojo.data_extraction.lean:get_dependen
cies:537 - Querying the dependencies of LeanGitRepo(url='https://github.com/rah4
927/lean-dojo-mew', commit='d00c776260c77de7e70125ef0cd119de6c0ff1de')
2024-07-01 22:20:16.631 | DEBUG    | lean_dojo.data_extraction.traced_data:check
_sanity:1351 - Checking the sanity of TracedRepo(repo=LeanGitRepo(url='https://g
ithub.com/rah4927/lean-dojo-mew', commit='d00c776260c77de7e70125ef0cd119de6c0ff1
de'), dependencies={'lean4': LeanGitRepo(url='https://github.com/leanprover/lean
4', commit='3b58e0649156610ce3aeed4f7b5c652340c668d4'), 'mathlib': LeanGitRepo(u
rl='https://github.com/leanprover-community/mathlib4', commit='12473136ebd2e811c
9c929c11522af5a32d603fb'), 'batteries': LeanGitRepo(url='https://github.com/lean
prover-community/batteries', commit='bba0af6e930ebcbabfacf021b21dd881d58aaa9d'),
 'Qq': LeanGitRepo(url='https://github.com/leanprover-community/quote4', commit=
'a7bfa63f5dddbcab2d4e0569c4cac74b2585e2c6'), 'aesop': LeanGitRepo(url='https://g
ithub.com/leanprover-community/aesop', commit='a64fe24aa94e21404940e9217363a9a1e
d9a33a6'), 'proofwidgets': LeanGitRepo(url='https://github.com/leanprover-commun
ity/ProofWidgets4', commit='d1b33202c3a29a079f292de65ea438648123b635'), 'Cli': L
eanGitRepo(url='https://github.com/leanprover/lean4-cli', commit='a11566029bd9ec
4f68a65394e8c3ff1af74c1a29'), 'importGraph': LeanGitRepo(url='https://github.com
/leanprover-community/import-graph', commit='d366a602cc4a325a6f9db3a3991dfa6d6cf
409c5')}, root_dir=PosixPath('/home/agittis/.cache/lean_dojo/rah4927-lean-dojo-m
ew-d00c776260c77de7e70125ef0cd119de6c0ff1de/lean-dojo-mew'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/agittis/miniconda3/envs/dj1-5/lib/python3.10/site-packages/lean_do
jo/data_extraction/trace.py", line 195, in trace
    traced_repo.check_sanity()
  File "/home/agittis/miniconda3/envs/dj1-5/lib/python3.10/site-packages/lean_do
jo/data_extraction/traced_data.py", line 1384, in check_sanity
    tf.check_sanity()
  File "/home/agittis/miniconda3/envs/dj1-5/lib/python3.10/site-packages/lean_do
jo/data_extraction/traced_data.py", line 943, in check_sanity
    for t in self.get_traced_theorems():
  File "/home/agittis/miniconda3/envs/dj1-5/lib/python3.10/site-packages/lean_do
jo/data_extraction/traced_data.py", line 1032, in get_traced_theorems
    self.traverse_preorder(_callback, node_cls=None)
  File "/home/agittis/miniconda3/envs/dj1-5/lib/python3.10/site-packages/lean_do
jo/data_extraction/traced_data.py", line 956, in traverse_preorder
    self.ast.traverse_preorder(callback, node_cls)
  File "/home/agittis/miniconda3/envs/dj1-5/lib/python3.10/site-packages/lean_do
jo/data_extraction/ast/lean4/node.py", line 56, in traverse_preorder
    child.traverse_preorder(callback, node_cls, parents + [self])
  File "/home/agittis/miniconda3/envs/dj1-5/lib/python3.10/site-packages/lean_do
jo/data_extraction/ast/lean4/node.py", line 56, in traverse_preorder
    child.traverse_preorder(callback, node_cls, parents + [self])
  File "/home/agittis/miniconda3/envs/dj1-5/lib/python3.10/site-packages/lean_do
jo/data_extraction/ast/lean4/node.py", line 53, in traverse_preorder
    if callback(self, parents):
  File "/home/agittis/miniconda3/envs/dj1-5/lib/python3.10/site-packages/lean_do
jo/data_extraction/traced_data.py", line 1023, in _callback
    repo, path = self._get_repo_and_relative_path()
  File "/home/agittis/miniconda3/envs/dj1-5/lib/python3.10/site-packages/lean_do
jo/data_extraction/traced_data.py", line 966, in _get_repo_and_relative_path
    repo = self.traced_repo.dependencies[name]
KeyError: 'std'

I understand support for older Lean versions has been dropped. Still if possible I would appreciate some insight in case the problem is simple, because many of the machine learning benchmarks such as miniF2F are on older versions of Lean, and if I cannot trace the repos I cannot evaluate on these benchmarks. Thanks!

Platform Information

yangky11 commented 2 days ago

An alternative would be upgrading these repos to the latest version of Lean. Is that feasible?

AG161 commented 2 days ago

That would be possible in most cases but not ideal because it would not be a direct apples-to-apples comparison. For example in the miniF2F repo I linked, mathd_algebra_176 in Test.lean cannot be closed by ring and requires a more complicated proof because the exponent is a real number and not a natural number, in the current version it can be closed just by ring because the functionality of that tactic was expanded. Also I have not tried updating the Lean version of an older version of mathlib4, but it's possible that for a project of that size it would not be trivial - I'm not sure though since I don't have any experience with that.

yangky11 commented 2 days ago

You could edit the installed *.py directly to remove traced_repo.check_sanity().