Closed — wiio12 closed this issue 2 years ago
Many theorems are a mix of tactic-mode and term-mode proofs. PACT only extracts the tactic-mode tactic states and tactics, which explains why that would be the case.
Another instance is tactic blocks, where PACT records both the full block and the inner tactic applications, which makes direct replay non-trivial.
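To make the first point concrete, here is a minimal sketch (Lean 3, a made-up example) of a proof mixing term mode and tactic mode. PACT would record only the tactic state and tactic inside the `by` block, not the surrounding term-mode structure, so the extracted steps alone cannot reconstruct or replay the whole proof:

```lean
-- Term mode on the outside, tactic mode on the inside:
example (p q : Prop) (hp : p) (hq : q) : p ∧ q :=
⟨hp, by exact hq⟩
-- PACT sees only the step `exact hq` with goal `⊢ q`,
-- not the enclosing anonymous-constructor term.
```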
Hope that helps!
Thank you for your reply @spolu! It helps a lot! I also noticed a namespace problem where lean-gym cannot find certain theorems without their namespaces. One quick example is the tactic `unfold read read'`: lean-gym cannot find `read` unless I replace it with `buffer.read`.
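A minimal sketch of why this happens (Lean 3 / mathlib; `read_eq` is a hypothetical lemma for illustration): a proof written inside `namespace buffer` can refer to `read` by its short name, but when the tactic text is replayed outside that namespace, as lean-gym does unless the declaration's open namespaces are restored, the short name no longer resolves:

```lean
namespace buffer

-- Inside the namespace, `read` resolves to `buffer.read`:
lemma read_eq {α : Type} (b : buffer α) (i : fin b.size) :
  read b i = b.read i := rfl

end buffer

-- Outside the namespace, a recorded tactic such as `unfold read`
-- fails; it must be rewritten as `unfold buffer.read`.
```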
My question is: with so many theorems that cannot be replayed directly, can I train a GPT model with the PROOFSTEP objective on this PACT data as-is? In other words, do I need a further post-processing step after obtaining the PACT data, checking with lean-gym that every theorem's tactics reach `no goals`, and only then feed the surviving data into the GPT model for training?
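For reference, the post-processing step being asked about could be sketched as follows. This is a hedged illustration, not anything from the paper: `replay` is a stand-in for a real lean-gym driver (lean-gym itself speaks a JSON REPL protocol over stdin/stdout, which is omitted here), and the theorem names and tactics below are toy data.

```python
def filter_replayable(theorems, replay):
    """Keep only theorems whose recorded tactics replay to 'no goals'.

    theorems: list of (decl_name, [tactic, ...]) pairs.
    replay(decl_name, tactics) -> final tactic state as a string.
    """
    kept = []
    for decl_name, tactics in theorems:
        try:
            final_state = replay(decl_name, tactics)
        except Exception:
            continue  # replay raised an error: drop the theorem
        if final_state.strip() == "no goals":
            kept.append((decl_name, tactics))
    return kept

# Toy stand-in for a lean-gym driver, to show the filtering behaviour:
def toy_replay(name, tactics):
    return "no goals" if name == "nat.add_comm" else "⊢ false"

data = [("nat.add_comm", ["induction a", "simp"]),
        ("broken.decl", ["refl"])]
print([n for n, _ in filter_replayable(data, toy_replay)])
# → ['nat.add_comm']
```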
Hi, have you successfully reproduced the paper? Thanks!
Hi @fzyzcjy, we failed to reproduce the paper using only PACT data and lean-gym. We only achieve around a 23% pass rate @1 with e=8, which leaves a large margin to the results reported in the paper.
Have you reproduced the paper? What were your results? Any advice?
@wiio12 Hi, happy to hear from other reproduction efforts! I am reproducing the paper https://arxiv.org/abs/2205.11491 (since it is the latest SOTA). Currently I am working on the "Equations" dummy environment (an environment discussed in that paper, used as a testbed) before moving to the real Lean environment. I have not reproduced the results yet (even on Equations), but will update later if I make progress.
Btw, how much compute did you use to get 23%? That paper uses a huge number of GPUs :/
@fzyzcjy Hi, we are also interested in reproducing HTPS, but for now we mainly focus on PACT and Thor. We chose Thor since the whole reproduction process is more transparent and everything needed for reproduction is open-sourced.
For the 23% pass rate, we mostly followed the process described in the paper. The main differences (or drawbacks), as I personally see them, are as follows:
p.s. I am greatly interested in reproducing these results, and would be happy to cooperate on these reproductions if possible.
@wiio12 That looks interesting! I agree those points may be a cause of the low pass rate.
... from gpt2-large
IMHO, GPT is a decoder-only architecture, while models like T5/BART are encoder-decoder architectures. With an encoder-decoder architecture, training and inference can be faster, making it possible to train larger models, run more epochs, and iterate on experiments more quickly.
if possible, I am welcome to cooperate on these reproductions
I would also be happy to cooperate on the reproductions if possible!
Hi @wiio12, I've recently encountered a problem detailed in this issue and I'm currently finding it challenging to resolve. I've noticed your success in reproducing results in this repository and was wondering if you might have insights or solutions regarding this particular issue. Any assistance you could provide would be greatly appreciated. Thank you for your time.
Hi, great work here! But I have some questions about reproducing your paper "Formal Mathematics Statement Curriculum Learning". In this paper you use the tactic dataset from PACT. However, when I follow the instructions from PACT and build the tactic dataset, most of the proofs cannot reach the `no goals` state when the recorded tactics are applied step by step. Only about 30% of them replay and prove successfully. I wonder if I am doing something wrong here or missing some important steps?