namin / llm-verified-with-monte-carlo-tree-search

LLM verified with Monte Carlo Tree Search
https://arxiv.org/abs/2402.08147
MIT License

Crl/run whole on dafny #42

Closed. ChloeL19 closed this 4 months ago.

ChloeL19 commented 4 months ago

I modified run_whole.py and run.py to work with experiments_clover.py. I also ran run_whole.py on the Clover test set and found that it solves 10 problems, while VMCTS solves 12. We would expect these numbers to be the same, but VMCTS gets a few extra tries because it stops early when there is a problem with the code; this might explain why it solves 2 more problems than straight sampling.
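
To illustrate the "extra tries" point, here is a toy sketch in Python. It is not taken from run_whole.py or run.py. It assumes that both methods share a fixed token budget, that a whole-program sample always costs a full program's worth of tokens, and that an attempt stopped early costs fewer tokens. The function name and all numbers are hypothetical and chosen only for illustration.

```python
def count_attempts(token_budget, attempt_costs):
    """Count how many attempts, taken in order, fit within token_budget."""
    used = 0
    attempts = 0
    for cost in attempt_costs:
        if used + cost > token_budget:
            break
        used += cost
        attempts += 1
    return attempts


# Hypothetical numbers, not measurements from the experiments.
TOKEN_BUDGET = 600          # assumed shared budget per problem
FULL_PROGRAM_TOKENS = 100   # assumed cost of sampling a whole program
EARLY_STOP_TOKENS = 40      # assumed cost of an attempt abandoned early

# Straight sampling pays the full cost on every attempt.
whole_attempts = count_attempts(TOKEN_BUDGET, [FULL_PROGRAM_TOKENS] * 20)

# With early stopping, a failed attempt is cut off as soon as a problem appears.
early_attempts = count_attempts(TOKEN_BUDGET, [EARLY_STOP_TOKENS] * 20)

print(whole_attempts, early_attempts)  # 6 15
```

Under these assumptions, the early-stopping method fits more attempts into the same budget, which is one way it could end up solving a couple more problems.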