wiio12 / LEGO-Prover

Code for the paper LEGO-Prover: Neural Theorem Proving with Growing Libraries
MIT License
54 stars 4 forks source link

How to customize initial skill library #2

Open luan-xiaokun opened 8 months ago

luan-xiaokun commented 8 months ago

Hi, thanks for open releasing the code of LEGO-Prover. I notice that the problem vector store is initialized with miniF2F problem statements and is divided into two parts: test and validation. Does this mean if I want to use LEGO-Prover to prove things for my own project, I need to customize the problem vector store accordingly?

I tried replacing the validation set of miniF2F with my own samples and running LEGO-Prover with the miniF2F problem vector store, but the evolver complains that it cannot find the retrieved problem statement (which is a miniF2F problem of course). Can you describe how to customize the initial problem vector store? Thanks!

wiio12 commented 8 months ago

Hi @luan-xiaokun, thank you for your attention to our work! Indeed all the vector stores are pre-initialized and stored in the data/initialize_skills/skill directory. To customize the initial skill library, you may check the code in lego_prover/env/chroma_worker.py.

Here is a simple example:

chroma_worker = ChromaWorker("path/to/save/library", resume=False)  # set resume equals False will initialize a empty library
data = "Your new problem statement"
error, output = chroma_worker.valid_problem_add_text(data)

After the vector store is built, you may replace the ../../data/initialize_skills/skill in line 40 with your own path/to/save/library

luan-xiaokun commented 7 months ago

Hi @wiio12, thanks for your help! I followed your instructions, here is what I did and what I found:

  1. I commented out line 40 in lego_prover/env/chroma_worker.py to ensure that we are building an empty skill library from scratch https://github.com/wiio12/LEGO-Prover/blob/357672c7751cd0c84aff6bf72a3d1bf97614e81d/lego_prover/env/chroma_worker.py#L35-L40
  2. I followed your instruction and changed the script in lego_prover/env/chroma_worker.py to the following, where true_skill.json only contains a trivial fact that True holds.
    if __name__ == "__main__":
    ckpt_path = "my_checkpoints"
    resume = False
    chroma_worker = ChromaWorker(ckpt_path, resume)
    print("Chroma worker is ready.")
    with open("../../my_checkpoints/true_skill.json") as json_file:
        action, data = json.load(json_file)
        assert action == "problem_add_text"
    error, output = chroma_worker.valid_problem_add_text(data)
    print("error:", error)
    print("output:", output)
  3. Next, I ran the script and obtained a new (almost empty) skill library in my_checkpoints.
  4. Finally, I restored the original script, and changed the path on line 40 in lego_prover/env/chroma_worker.py to my checkpoints folder, so that the Chroma worker will use my custom skill library when proving things.

After that, I found that my custom skill library located in my_checkpoints/skill does not have a skills.json file, resulting in an index out of range error in the evolver process (skill_manager.skills is an empty dictionary since I don't have a skills.json file in my_checkpoints/skill) LEGO seems to be working despite the error, but I'm not sure if the Chroma worker is still working. https://github.com/wiio12/LEGO-Prover/blob/357672c7751cd0c84aff6bf72a3d1bf97614e81d/lego_prover/evolver.py#L321-L326

If I copy the skills.json file in data/initialize_skills/skill to my skill library, it will complain that vectordb is not synced with skill.json, so it seems like that to create my own custom skill library, I still need to craft a skills.json file, correct?

wiio12 commented 7 months ago

Hi @luan-xiaokun, what you have done is correct, and there are two ways of making this work:

  1. As you said, you need to craft a skill.json file, with some basic skills. following the format of the existing skills.json file. I am not sure if only doing this will eliminate all the errors since other things in the skill library are not initialized.

  2. A simpler way I can think of is this: since you only want to change the problem to solve with LEGO-Prover, you can only replace the valid_problem_vectordb in data/initialize_skills/skill with valid_problem_vectordb you have crafted in my_checkpoints. This way you won't need to craft the skills.json file.

I might provide a script to craft the initial skill library (hopefully by the end of this week).

luan-xiaokun commented 7 months ago

Thanks for your quick response @wiio12! Looking forward to your skill lib initialization script!