Closed — wiio12 closed this issue 2 years ago
Many theorems are a mix of tactic-mode and term-mode proofs. PACT only extracts the tactic-mode tactic states and tactics, which explains why that would be the case.
Another instance is tactic blocks, where PACT records both the full block and the inner tactic applications, which makes direct replay non-trivial.
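To make the first point concrete, here is a minimal sketch (Lean 3, a made-up example) of a proof mixing term mode and tactic mode. PACT would record only the tactic state and tactic inside the `by` block, not the surrounding term-mode structure, so the extracted steps alone cannot reconstruct or replay the whole proof:

```lean
-- Term mode on the outside, tactic mode on the inside:
example (p q : Prop) (hp : p) (hq : q) : p ∧ q :=
⟨hp, by exact hq⟩
-- PACT sees only the step `exact hq` with goal `⊢ q`,
-- not the enclosing anonymous-constructor term.
```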
Hope that helps!
Thank you for your reply @spolu! It helps a lot! I also noticed a namespace problem where lean-gym cannot find certain theorems without their namespaces. One quick example is the tactic `unfold read read'`: lean-gym cannot find `read` unless I replace it with `buffer.read`.
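A minimal sketch of why this happens (Lean 3 / mathlib; `read_eq` is a hypothetical lemma for illustration): a proof written inside `namespace buffer` can refer to `read` by its short name, but when the tactic text is replayed outside that namespace, as lean-gym does unless the declaration's open namespaces are restored, the short name no longer resolves:

```lean
namespace buffer

-- Inside the namespace, `read` resolves to `buffer.read`:
lemma read_eq {α : Type} (b : buffer α) (i : fin b.size) :
  read b i = b.read i := rfl

end buffer

-- Outside the namespace, a recorded tactic such as `unfold read`
-- fails; it must be rewritten as `unfold buffer.read`.
```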
My question is: with so many theorems that cannot be replayed directly, can I train a GPT model with the PROOFSTEP objective on this PACT data as-is? In other words, do I need a further post-processing step after obtaining the PACT data, checking with lean-gym that every theorem's tactics reach `no goals`, and only then feed the surviving data into the GPT model for training?
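For reference, the post-processing step being asked about could be sketched as follows. This is a hedged illustration, not anything from the paper: `replay` is a stand-in for a real lean-gym driver (lean-gym itself speaks a JSON REPL protocol over stdin/stdout, which is omitted here), and the theorem names and tactics below are toy data.

```python
def filter_replayable(theorems, replay):
    """Keep only theorems whose recorded tactics replay to 'no goals'.

    theorems: list of (decl_name, [tactic, ...]) pairs.
    replay(decl_name, tactics) -> final tactic state as a string.
    """
    kept = []
    for decl_name, tactics in theorems:
        try:
            final_state = replay(decl_name, tactics)
        except Exception:
            continue  # replay raised an error: drop the theorem
        if final_state.strip() == "no goals":
            kept.append((decl_name, tactics))
    return kept

# Toy stand-in for a lean-gym driver, to show the filtering behaviour:
def toy_replay(name, tactics):
    return "no goals" if name == "nat.add_comm" else "⊢ false"

data = [("nat.add_comm", ["induction a", "simp"]),
        ("broken.decl", ["refl"])]
print([n for n, _ in filter_replayable(data, toy_replay)])
# → ['nat.add_comm']
```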
Hi, have you successfully reproduced the paper? Thanks!
Hi @fzyzcjy, we failed to reproduce the paper using only PACT data and lean-gym. We only achieve around a 23% pass rate @1 with e=8, which leaves a large margin to the results reported in the paper.
Have you reproduced the paper? What were your results? Any advice?
@wiio12 Hi, happy to hear from other reproduction efforts! I am reproducing the paper https://arxiv.org/abs/2205.11491 (since it is the latest SOTA). Currently I am working on the "Equations" dummy environment (an environment discussed in that paper, used as a testbed) before moving to the real Lean environment. I have not reproduced the results yet (even on Equations), but will update later if I make progress.
Btw, how much compute did you use to get 23%? That paper uses a huge number of GPUs :/
@fzyzcjy Hi, we are also interested in reproducing HTPS, but for now we mainly focus on PACT and Thor. We chose Thor since the whole reproduction process is more transparent and everything needed for reproduction is open-sourced.
For the 23% pass rate, we mostly followed the process described in the paper. The main differences (or drawbacks), as I personally see them, are as follows:
p.s. I am greatly interested in reproducing these results, and would be happy to cooperate on these reproductions if possible.
@wiio12 That looks interesting! I agree those points may be a cause of the low pass rate.
... from gpt2-large
IMHO, GPT is a decoder-only architecture, while models like T5/BART are encoder-decoder architectures. With an encoder-decoder architecture, training and inference can be faster, making it possible to train larger models, run more epochs, and iterate on experiments more quickly.
if possible, I am welcome to cooperate on these reproductions
I would also be happy to cooperate on the reproductions if possible!
Hi @wiio12, I've recently encountered a problem detailed in this issue and I'm currently finding it challenging to resolve. I've noticed your success in reproducing results in this repository and was wondering if you might have insights or solutions regarding this particular issue. Any assistance you could provide would be greatly appreciated. Thank you for your time.
Hi, great work here! But I have some questions about reproducing your paper "Formal Mathematics Statement Curriculum Learning". In this paper you use the tactic dataset from PACT. However, when I follow the instructions from PACT and build the tactic dataset, most of the proofs cannot reach the `no goals` state when the recorded tactics are applied step by step. Only about 30% of them replay and prove successfully. I wonder if I am doing something wrong here or missing some important steps?