Data extraction in Lean 4 succeeds for the latest version of Lean.
New bug observed during tracing
To reproduce, run
from lean_dojo import *
repo = LeanGitRepo("https://github.com/leanprover/std4", "7969a8724ba0c31716c56e551c51a048a8537725")
trace(repo)
You can expect to see building succeeds, data extraction succeeds and then tracing fails with an assertion error.
Full log on my end:
/Users/psong/anaconda3/envs/leandojo/bin/python test.py
2024-02-09 00:05:18.258 | DEBUG | lean_dojo.constants:<module>:72 - Using GitHub without authentication. Don't be surprised if you hit the API rate limit.
2024-02-09 00:05:20.997 | DEBUG | lean_dojo.data_extraction.lean:_to_commit_hash:49 - Querying the commit hash for lean4 v4.6.0-rc1
2024-02-09 00:05:24.996 | INFO | lean_dojo.data_extraction.trace:get_traced_repo_path:157 - Tracing LeanGitRepo(url='https://github.com/leanprover/std4', co
mmit='7969a8724ba0c31716c56e551c51a048a8537725') 2024-02-09 00:05:24.997 | DEBUG | lean_dojo.data_extraction.trace:get_traced_repo_path:159 - Working in the temporary directory /Users/psong/tmppp/tmp2wblci3
9 2024-02-09 00:05:25.664 | DEBUG | lean_dojo.data_extraction.lean:clone_and_checkout:492 - Cloning LeanGitRepo(url='https://github.com/leanprover/std4', commi
t='7969a8724ba0c31716c56e551c51a048a8537725') 2024-02-09 00:05:32.627 | DEBUG | lean_dojo.data_extraction.trace:_trace_lean4:111 - Tracing LeanGitRepo(url='https://github.com/leanprover/std4', commit='79
69a8724ba0c31716c56e551c51a048a8537725') 2024-02-09 00:05:32.702 | DEBUG | lean_dojo.container:run:184 - NUM_PROCS=10 python build_lean4_repo.py std4
2024-02-09 00:05:32.761 | INFO | __main__:main:165 - Building std4
[0/6] Building Std.Classes.BEq
[0/6] Building Std.Util.LibraryNote
[0/13] Building Std.Lean.Delaborator
[0/16] Building Std.Tactic.Unreachable
[0/16] Building Std.Lean.Command
[0/16] Building Std.Lean.TagAttribute
[0/16] Building Std.Classes.Dvd
[0/16] Building Std.Tactic.OpenPrivate
[0/65] Building Std.Tactic.ByCases
[0/74] Building Std.Util.TermUnsafe
[1/263] Building Std.Tactic.SeqFocus
[2/263] Building Std.Lean.Name
[3/263] Building Std.Lean.Position
[4/263] Building Std.Data.Json
[5/263] Building Std.Lean.Syntax
[6/263] Building Std.Lean.Parser
[7/263] Building Std.Lean.LocalContext
[9/263] Building Std.Util.ProofWanted
[9/263] Building Std.Lean.Tactic
[10/263] Building Std.Lean.InfoTree
[11/263] Building Std.Data.Array.Init.Basic
[12/263] Building Std.Data.Ord
[13/263] Building Std.Tactic.RCases
[14/263] Building Std.Util.ExtendedBinder
[15/263] Compiling Std.Lean.TagAttribute
[16/263] Compiling Std.Lean.Name
[17/263] Compiling Std.Util.LibraryNote
[18/263] Compiling Std.Tactic.Unreachable
[19/263] Compiling Std.Lean.Position
[20/263] Compiling Std.Lean.Syntax
[21/263] Compiling Std.Lean.Parser
[22/263] Compiling Std.Lean.LocalContext
[23/263] Compiling Std.Lean.Tactic
[24/263] Compiling Std.Lean.InfoTree
[25/263] Compiling Std.Lean.Command
[26/263] Compiling Std.Util.TermUnsafe
[27/263] Compiling Std.Data.Ord
[28/263] Compiling Std.Util.ProofWanted
[29/263] Compiling Std.Data.Array.Init.Basic
[30/263] Compiling Std.Tactic.SeqFocus
[31/263] Compiling Std.Tactic.ByCases
[32/263] Compiling Std.Tactic.OpenPrivate
[33/263] Building Std.Tactic.Change
[34/263] Building Std.Lean.Meta.Expr
[35/263] Building Std.Lean.PersistentHashMap
[36/263] Building Std.Control.ForInStep.Basic
[37/263] Building Std.Data.MLList.Basic
[38/263] Building Std.Data.List.Init.Basic
[39/263] Building Std.Tactic.HaveI
[40/263] Building Std.Data.Fin.Init.Lemmas
[42/263] Building Std.Tactic.LeftRight
[42/263] Building Std.Data.Option.Init.Lemmas
[43/263] Building Std.Tactic.Replace
[44/263] Building Std.Tactic.Omega.Config
[45/263] Building Std.Lean.Expr
[46/263] Building Std.Lean.Elab.Tactic
[47/263] Building Std.Data.DList
[48/263] Building Std.Lean.CoreM
[49/263] Building Std.Lean.Except
[50/263] Building Std.Lean.Float
[51/263] Building Std.Lean.HashMap
[52/263] Building Std.Lean.IO.Process
[53/263] Building Std.Lean.Meta.Inaccessible
[54/263] Building Std.Lean.MonadBacktrack
[55/263] Building Std.Lean.NameMap
[56/263] Building Std.Lean.NameMapAttribute
[57/263] Building Std.Lean.PersistentHashSet
[58/263] Building Std.Lean.SMap
[59/263] Building Std.Lean.Util.Path
[60/263] Building Std.Tactic.Case
[61/263] Building Std.Tactic.Classical
[63/263] Building Std.Tactic.Instances
[63/263] Building Std.Tactic.Exact
[64/263] Building Std.Tactic.LabelAttr
[65/263] Building Std.Tactic.Where
[66/263] Building Std.WF
[67/263] Compiling Std.Data.Json
[68/263] Compiling Std.Data.List.Init.Basic
[69/263] Compiling Std.Lean.Util.Path
[70/263] Compiling Std.Util.ExtendedBinder
[71/263] Building Std.Lean.AttributeExtra
[72/263] Building Std.Data.Int.Basic
[73/263] Building Std.Data.Nat.Basic
[74/263] Building Std.Tactic.PrintDependents
[75/263] Building Std.Tactic.CoeExt
[76/263] Building Std.Lean.Meta.Basic
[77/263] Building Std.Tactic.RunCmd
[78/263] Building Std.Util.Pickle
[79/263] Building Std.Tactic.Lint.Basic
[80/263] Building Std.CodeAction.Attr
[81/263] Building Std.Tactic.GuardExpr
[82/263] Compiling Std.Lean.AttributeExtra
[83/263] Building Std.Lean.Meta.Simp
[84/263] Building Std.Lean.Format
[85/263] Building Std.Tactic.NoMatch
[86/263] Building Std.Linter.UnreachableTactic
[87/263] Compiling Std.Tactic.Lint.Basic
[88/263] Compiling Std.Lean.Format
[89/263] Building Std.Control.ForInStep.Lemmas
[90/263] Building Std.Tactic.Relation.Rfl
[91/263] Building Std.Lean.Json
[92/263] Building Std.Classes.SetNotation
[93/263] Compiling Std.CodeAction.Attr
[94/263] Compiling Std.Linter.UnreachableTactic
[95/263] Building Std.Test.Internal.DummyLabelAttr
[96/263] Building Std.Tactic.NormCast.Ext
[97/263] Building Std.Tactic.Lint.Frontend
[98/263] Building Std.Linter.UnnecessarySeqFocus
[99/263] Compiling Std.Tactic.NoMatch
[100/263] Building Std.Data.MLList.Heartbeats
[101/263] Building Std.Classes.Cast
[102/263] Compiling Std.Lean.Meta.Basic
[103/263] Compiling Std.Tactic.GuardExpr
[104/263] Building Std.Tactic.Lint.TypeClass
[105/263] Building Std.Tactic.Lint.Misc
[106/263] Building Std.Control.ForInStep
[107/263] Building Std.CodeAction.Misc
[108/263] Building Std.CodeAction.Basic
[109/263] Building Std.Tactic.Lint.Simp
[110/263] Compiling Std.Linter.UnnecessarySeqFocus
[111/263] Compiling Std.Tactic.Lint.TypeClass
[112/263] Compiling Std.Classes.SetNotation
[113/263] Building Std.Tactic.TryThis
[114/263] Building Std.Tactic.Relation.Symm
[115/263] Building Std.Lean.Meta.InstantiateMVars
[116/263] Building Std.Lean.Meta.Clear
[117/263] Building Std.Lean.Meta.AssertHypotheses
[118/263] Building Std.Tactic.FalseOrByContra
[119/263] Compiling Std.Tactic.Lint.Frontend
[120/263] Compiling Std.Tactic.Lint.Simp
[121/263] Building Std.Tactic.NormCast.Lemmas
[122/263] Compiling Std.Tactic.Lint.Misc
[123/263] Building Std.Lean.Meta.SavedState
[124/263] Compiling Std.CodeAction.Basic
[125/263] Building Std.Linter
[126/263] Building Std.Lean.HashSet
[127/263] Compiling Std.Linter
[128/263] Building Std.Tactic.NormCast
[129/263] Building Std.Data.Option.Basic
[130/263] Building Std.Lean.Util.EnvSearch
[131/263] Building Std.Control.Nondet.Basic
[132/263] Building Std.CodeAction
[133/263] Building Std.Tactic.GuardMsgs
[134/263] Building Std.CodeAction.Deprecated
[135/263] Compiling Std.Tactic.TryThis
[136/263] Building Std.Data.Prod.Lex
[137/263] Compiling Std.CodeAction.Deprecated
[138/263] Building Std.Tactic.Lint
[139/263] Building Std.Tactic.PrintPrefix
[140/263] Building Std.Tactic.Simpa
[141/263] Building Std.Tactic.SimpTrace
[142/263] Building Std.Tactic.ShowTerm
[143/263] Compiling Std.Tactic.Lint
[144/263] Building Std.Tactic.Alias
[145/263] Building Std.Data.Prod
[147/263] Compiling Std.Tactic.ShowTerm
[153/263] Compiling Std.Tactic.Alias
[153/263] Building Std.Data.Bool
[154/263] Building Std.Tactic.SqueezeScope
[154/263] Compiling Std.Tactic.SimpTrace
[154/263] Building Std.Tactic.Basic
[155/263] Building Std.Classes.Order
[161/263] Compiling Std.Tactic.Basic
[161/263] Building Std.Logic
[164/263] Compiling Std.Logic
[164/263] Building Std.Classes.LawfulMonad
[164/263] Building Std.Tactic.Omega.Logic
[164/263] Building Std.Data.Sum.Basic
[164/263] Building Std.Data.Nat.Init.Lemmas
[164/263] Building Std.Data.Fin.Iterate
[164/263] Building Std.Data.RBMap.Basic
[164/263] Building Std.Data.PairingHeap
[167/263] Compiling Std.Data.Nat.Init.Lemmas
[167/263] Building Std.Data.List.Init.Lemmas
[167/263] Building Std.Data.Nat.Lemmas
[167/263] Building Std.Lean.Meta.LazyDiscrTree
[167/263] Building Std.Data.Fin.Basic
[175/263] Building Std.Data.RBMap.WF
[176/263] Building Std.Data.List.Init.Attach
[176/263] Compiling Std.Data.List.Init.Lemmas
[176/263] Building Std.Data.Array.Init.Lemmas
[178/263] Compiling Std.Data.List.Init.Attach
[178/263] Building Std.Data.Array.Basic
[180/263] Compiling Std.Data.Array.Basic
[180/263] Building runLinter
[181/263] Building Std.Data.Nat.Bitwise
[181/263] Building Std.Data.Nat.Gcd
[181/263] Building Std.Data.BinomialHeap.Basic
[181/263] Building Std.Data.Int.Lemmas
[181/263] Building Std.Data.Array.Merge
[183/263] Compiling runLinter
[184/263] Linking runLinter
[186/263] Building Std.Data.BitVec.Basic
[186/263] Building Std.Data.Nat
[187/263] Building Std.Data.Array.Match
[187/263] Building Std.Data.List.Basic
[189/263] Building Std.Lean.Meta.DiscrTree
[190/263] Building Std.Data.Int.Order
[191/263] Building Std.Data.String.Basic
[192/263] Building Std.Tactic.Ext.Attr
[192/263] Building Std.Util.Cache
[194/263] Building Std.Lean.Meta.UnusedNames
[196/263] Building Std.Data.UInt
[196/263] Building Std.Data.Char
[196/263] Building Std.Tactic.Ext
[196/263] Building Std.Data.Option.Lemmas
[200/263] Building Std.Data.Option
[202/263] Building Std.Tactic.Omega.Int
[202/263] Building Std.Tactic.Omega.MinNatAbs
[202/263] Building Std.Data.Int.DivMod
[203/263] Building Std.Data.Sum.Lemmas
[203/263] Building Std.Control.Lemmas
[203/263] Building Std.Tactic.Congr
[203/263] Building Std.Data.Fin.Lemmas
[208/263] Building Std.Data.BinomialHeap.Lemmas
[209/263] Building Std.Data.Sum
[211/263] Building Std.Data.BinomialHeap
[213/263] Building Std.Data.Fin
[215/263] Building Std.Data.RBMap.Alter
[216/263] Building Std.Tactic.PermuteGoals
[216/263] Building Std.Data.AssocList
[216/263] Building Std.Tactic.SolveByElim.Backtrack
[216/263] Building Std.Data.List.Lemmas
[217/263] Building Std.Data.Rat.Basic
[217/263] Building Std.Tactic.Omega.IntList
[217/263] Building Std.Data.Int.Gcd
[218/263] Building Std.Data.Int
[221/263] Building Std.Data.Rat.Lemmas
[221/263] Building Std.Classes.RatCast
[222/263] Building Std.Data.HashMap.Basic
[224/263] Building Std.Tactic.SolveByElim
[225/263] Building Std.Tactic.Omega.Coeffs.IntList
[226/263] Building Std.Tactic.Omega.LinearCombo
[226/263] Building Std.Tactic.Omega.Constraint
[227/263] Building Std.Data.HashMap.Lemmas
[229/263] Building Std.Data.Rat
[230/263] Building Std.Tactic.Omega.OmegaM
[231/263] Building Std.Tactic.LibrarySearch
[235/263] Building Std.Tactic.Omega.Core
[237/263] Building Std.Tactic.Omega.Frontend
[238/263] Building Std.Lean.System.IO
[238/263] Building Std.Data.List.Count
[238/263] Building Std.Data.Range.Lemmas
[238/263] Building Std.Data.RBMap.Lemmas
[238/263] Building Std.Data.String.Lemmas
[238/263] Building Std.Data.Array.Lemmas
[239/263] Building Std.Data.MLList.IO
[240/263] Building Std.Data.MLList
[241/263] Building Std.Data.List.Pairwise
[242/263] Building Std.Data.Range
[245/263] Building Std.Data.List.Perm
[246/263] Building Std.Tactic.Omega
[247/263] Building Std.Data.RBMap
[248/263] Building Std.Data.BitVec.Lemmas
[250/263] Building Std.Data.List
[251/263] Building Std.Data.Array
[251/263] Building Std.Data.HashMap.WF
[251/263] Building Std.Data.ByteArray
[254/263] Building Std.Data.BitVec.Folds
[255/263] Building Std.Data.BitVec.Bitblast
[256/263] Building Std.Data.String
[258/263] Building Std.Data.BitVec
[261/263] Building Std.Data.HashMap
[262/263] Building Std
2024-02-09 00:06:06.821 | INFO | __main__:main:188 - Tracing std4
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌| 948/951 [09:25<00:01, 2.76it/s]
2024-02-09 00:15:37.191 | WARNING | __main__:check_files:132 - Missing /Users/psong/tmppp/tmp2wblci39/workspace/std4/.lake/build/ir/runLinter.dep_paths 2024-02-09 00:15:37.191 | WARNING | __main__:check_files:132 - Missing /Users/psong/tmppp/tmp2wblci39/workspace/std4/.lake/build/ir/runLinter.ast.json
/Users/psong/anaconda3/envs/leandojo/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore
objects to clean up at shutdown warnings.warn('resource_tracker: There appear to be %d '
2024-02-09 00:15:37.355 | DEBUG | lean_dojo.data_extraction.traced_data:from_traced_files:1418 - Parsing 950 *.ast.json files in /Users/psong/tmppp/tmp2wblci
39/std4 with 9 workers 2024-02-09 00:15:39,385 INFO worker.py:1715 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
0%| | 0/950 [00:00<?, ?it/s]
(pid=40457) 2024-02-09 00:15:40.405 | DEBUG | lean_dojo.constants:<module>:72 - Using GitHub without authentication. Don't be surprised if you hit the API rate limit. 1%|█▍ | 11/950 [00:12<18:09, 1.16s/it]
(pid=40460) 2024-02-09 00:15:40.413 | DEBUG | lean_dojo.constants:<module>:72 - Using GitHub without authentication. Don't be surprised if you hit the API ra
te limit. [repeated 8x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.) Traceback (most recent call last):
File "/Users/psong/LeanDojo/test.py", line 4, in <module>
trace(repo)
File "/Users/psong/anaconda3/envs/leandojo/lib/python3.9/site-packages/lean_dojo/data_extraction/trace.py", line 193, in trace
cached_path = get_traced_repo_path(repo, build_deps)
File "/Users/psong/anaconda3/envs/leandojo/lib/python3.9/site-packages/lean_dojo/data_extraction/trace.py", line 161, in get_traced_repo_path
traced_repo = TracedRepo.from_traced_files(tmp_dir / repo.name, build_deps)
File "/Users/psong/anaconda3/envs/leandojo/lib/python3.9/site-packages/lean_dojo/data_extraction/traced_data.py", line 1429, in from_traced_files
traced_files = list(
File "/Users/psong/anaconda3/envs/leandojo/lib/python3.9/site-packages/tqdm/std.py", line 1182, in __iter__
for obj in iterable:
File "/Users/psong/anaconda3/envs/leandojo/lib/python3.9/site-packages/ray/util/actor_pool.py", line 170, in get_generator
yield self.get_next_unordered()
File "/Users/psong/anaconda3/envs/leandojo/lib/python3.9/site-packages/ray/util/actor_pool.py", line 370, in get_next_unordered
return ray.get(future)
File "/Users/psong/anaconda3/envs/leandojo/lib/python3.9/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
return fn(*args, **kwargs)
File "/Users/psong/anaconda3/envs/leandojo/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
File "/Users/psong/anaconda3/envs/leandojo/lib/python3.9/site-packages/ray/_private/worker.py", line 2624, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AssertionError): ray::_TracedRepoHelper.parse_traced_file() (pid=40455, ip=127.0.0.1, actor_id=bdfa892c4b86df28200160de01000000, rep
r=<lean_dojo.data_extraction.traced_data._TracedRepoHelper object at 0x10d74f2b0>) File "/Users/psong/anaconda3/envs/leandojo/lib/python3.9/site-packages/lean_dojo/data_extraction/traced_data.py", line 1284, in parse_traced_file
return TracedFile.from_traced_file(self.root_dir, path, self.repo)
File "/Users/psong/anaconda3/envs/leandojo/lib/python3.9/site-packages/lean_dojo/data_extraction/traced_data.py", line 655, in from_traced_file
return cls._from_lean4_traced_file(root_dir, json_path, repo)
File "/Users/psong/anaconda3/envs/leandojo/lib/python3.9/site-packages/lean_dojo/data_extraction/traced_data.py", line 705, in _from_lean4_traced_file
ast = FileNode4.from_data(data, lean_file)
File "/Users/psong/anaconda3/envs/leandojo/lib/python3.9/site-packages/lean_dojo/data_extraction/ast/lean4/node.py", line 251, in from_data
node = Node4.from_data(node_data, lean_file)
File "/Users/psong/anaconda3/envs/leandojo/lib/python3.9/site-packages/lean_dojo/data_extraction/ast/lean4/node.py", line 27, in from_data
return subcls.from_data(node_data, lean_file)
File "/Users/psong/anaconda3/envs/leandojo/lib/python3.9/site-packages/lean_dojo/data_extraction/ast/lean4/node.py", line 423, in from_data
children = _parse_children(node_data, lean_file)
File "/Users/psong/anaconda3/envs/leandojo/lib/python3.9/site-packages/lean_dojo/data_extraction/ast/lean4/node.py", line 265, in _parse_children
node = Node4.from_data(d["node"], lean_file)
File "/Users/psong/anaconda3/envs/leandojo/lib/python3.9/site-packages/lean_dojo/data_extraction/ast/lean4/node.py", line 27, in from_data
return subcls.from_data(node_data, lean_file)
File "/Users/psong/anaconda3/envs/leandojo/lib/python3.9/site-packages/lean_dojo/data_extraction/ast/lean4/node.py", line 1069, in from_data
assert isinstance(decl_val_node.children[2], NullNode4)
AssertionError
Bump & Fix
Fixed
ExtractData.lean
New bug observed during tracing
To reproduce, run
You can expect to see building succeeds, data extraction succeeds and then tracing fails with an assertion error.
Full log on my end:
FYI.