lean-dojo / LeanDojo

Tool for data extraction and interacting with Lean programmatically.
https://leandojo.org
MIT License
536 stars 80 forks source link

Missing files error when tracing mathlib4 #64

Closed vasnesterov closed 11 months ago

vasnesterov commented 11 months ago

Description I've just installed LeanDojo and trying to trace mathlib4 using this script:

import os 
os.environ["VERBOSE"] = "1"

from lean_dojo import LeanGitRepo, trace

repo = LeanGitRepo("https://github.com/leanprover-community/mathlib4", "236a4f542ecbc010962f1041a4427a3038f81f82")
trace(repo, dst_dir="traced_mathlib4")

After tracing it checks if there are some missing files and throws an error:

[3766/3775] Building Mathlib.NumberTheory.ModularForms.JacobiTheta.Manifold
[3767/3775] Building Mathlib.Analysis.SpecialFunctions.Gamma.Beta
[3769/3775] Building Mathlib.NumberTheory.ZetaFunction
[3773/3775] Building Mathlib.AlgebraicGeometry.Morphisms.FiniteType
[3774/3775] Building Mathlib
2023-09-20 20:44:31.875 | INFO     | __main__:main:154 - Tracing mathlib4
 98%|█████████▊| 4396/4504 [1:27:31<52:57, 29.42s/it]2023-09-20 22:13:03.446 | ERROR    | __main__:check_files:112 - Missing /workspace/mathlib4/build/ir/Mathlib/Data/Real/Basic.ast.json
2023-09-20 22:13:03.447 | ERROR    | __main__:check_files:112 - Missing /workspace/mathlib4/build/ir/Mathlib/Algebra/Module/LocalizedModule.dep_paths
2023-09-20 22:13:03.447 | ERROR    | __main__:check_files:112 - Missing /workspace/mathlib4/build/ir/Mathlib/Data/Real/Basic.dep_paths
2023-09-20 22:13:03.447 | ERROR    | __main__:check_files:112 - Missing /workspace/mathlib4/build/ir/Mathlib/Algebra/Module/LocalizedModule.ast.json
2023-09-20 22:13:03.447 | ERROR    | __main__:check_files:112 - Missing /workspace/mathlib4/build/ir/Mathlib/AlgebraicGeometry/Morphisms/RingHomProperties.ast.json
2023-09-20 22:13:03.447 | ERROR    | __main__:check_files:112 - Missing /workspace/mathlib4/build/ir/Mathlib/AlgebraicGeometry/Morphisms/RingHomProperties.dep_paths
Traceback (most recent call last):
  File "/workspace/build_lean4_repo.py", line 165, in <module>
    main()
  File "/workspace/build_lean4_repo.py", line 161, in main
    check_files()
  File "/workspace/build_lean4_repo.py", line 113, in check_files
    raise RuntimeError("Missing intermediate files.")
RuntimeError: Missing intermediate files.
Traceback (most recent call last):
  File "/home/vasa/leandojo/trace_mathlib.py", line 9, in <module>
    trace(repo, dst_dir="traced_mathlib4")
  File "/home/vasa/leandojo/leandojo_env/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 162, in trace
    cached_path = get_traced_repo_path(repo)
  File "/home/vasa/leandojo/leandojo_env/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 134, in get_traced_repo_path
    _trace(repo)
  File "/home/vasa/leandojo/leandojo_env/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 66, in _trace
    _trace_lean4(repo)
  File "/home/vasa/leandojo/leandojo_env/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 104, in _trace_lean4
    container.run(
  File "/home/vasa/leandojo/leandojo_env/lib/python3.10/site-packages/lean_dojo/container.py", line 315, in run
    execute(cmd, capture_output=capture_output)
  File "/home/vasa/leandojo/leandojo_env/lib/python3.10/site-packages/lean_dojo/utils.py", line 117, in execute
    raise ex
  File "/home/vasa/leandojo/leandojo_env/lib/python3.10/site-packages/lean_dojo/utils.py", line 112, in execute
    res = subprocess.run(cmd, shell=True, capture_output=capture_output, check=True)
  File "/home/vasa/.pyenv/versions/3.10.12/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'docker run --cidfile 08zqnmpe.cid --rm -u 1000 --mount type=bind,src="/tmp/tmpfv8xhxk1/mathlib4",target="/workspace/mathlib4" --mount type=bind,src="/home/vasa/leandojo/leandojo_env/lib/python3.10/site-packages/lean_dojo/data_extraction/build_lean4_repo.py",target="/workspace/build_lean4_repo.py" --mount type=bind,src="/home/vasa/leandojo/leandojo_env/lib/python3.10/site-packages/lean_dojo/data_extraction/ExtractData.lean",target="/workspace/mathlib4/ExtractData.lean" --env NUM_PROCS=32 --workdir /workspace yangky11/lean-dojo python3 build_lean4_repo.py mathlib4' returned non-zero exit status 1.

(It's only the end of the log actually, I'll send the full log if you want.)

I did it a couple of times and noticed that files in error go in random order, and some files can appear or not: for example, in previous run there was no LocalizedModule in the error message (the rest were).

Although tracing lean4-example works well.

Detailed Steps to Reproduce the Behavior

  1. Install LeanDojo:
    git clone https://github.com/lean-dojo/LeanDojo
    cd LeanDojo
    pip install .
  2. Run the script above.

Platform Information

yangky11 commented 11 months ago

This could be a result of OOM error. How much memory do you have? Tracing mathlib4 probably needs at least 32GB.

vasnesterov commented 11 months ago

I have 64GB, and checked memory consumption a few times during tracing (but not at the end), it has never been more than 40GB.

yangky11 commented 11 months ago

64GB should be enough. I'll try to reproduce this problem and let you know.

yangky11 commented 11 months ago

Hi @vasnesterov,

I tried to run the code but unfortunately wasn't able to reproduce the problem. Please let me know if this issue persists and is blocking your own project building on top of LeanDojo. I can share the traced repo with you if all you need is this specific version of mathlib4.

vasnesterov commented 11 months ago

Thank you for offering! But the version from demo is actually more than enough for me.

I've run tracing again, setting num_proc = 1, and got the same error. I've monitored the memory consumption, it was about 64GB at some moment, so I think it is really just OOM.

I think the issue can be closed now.