Crl/run whole on dafny

I modify run_whole.py and run.py to work with experiments_clover.py. I also run run_whole.py on the clover test set and find that it solves 10 problems, while VMCTS solves 12. We would expect these numbers to be the same, but VMCTS gets a few extra tries because it stops early when there is a problem with the code. This might explain why it solves 2 more problems than straight whole-program sampling.
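
To illustrate why early stopping could translate into extra tries, here is a minimal sketch. It assumes both methods share a fixed token budget, which is only an assumption made for illustration. It does not use the code in run_whole.py or the actual VMCTS search; the function names, the budget, and the failure positions are all hypothetical numbers chosen for the example.

```python
from typing import List


def attempts_whole(budget: int, program_len: int) -> int:
    """Whole-program sampling pays for the full program on every attempt."""
    return budget // program_len


def attempts_early_stop(budget: int, costs: List[int]) -> int:
    """Early stopping pays only for the tokens generated before rejection.

    costs[i] is the (hypothetical) number of tokens spent on attempt i:
    less than the full program length if the verifier rejected it early,
    or the full length if the attempt ran to completion.
    """
    spent = 0
    attempts = 0
    for cost in costs:
        if spent + cost > budget:
            break  # not enough budget left for this attempt
        spent += cost
        attempts += 1
    return attempts


# Hypothetical setup: 1000-token budget, 200-token programs.
BUDGET = 1000
PROGRAM_LEN = 200
# Made-up per-attempt costs; values below 200 are attempts cut short.
EARLY_STOP_COSTS = [50, 120, 200, 80, 200, 150, 200]

whole_tries = attempts_whole(BUDGET, PROGRAM_LEN)
early_tries = attempts_early_stop(BUDGET, EARLY_STOP_COSTS)
print(whole_tries, early_tries)
```

Under these made-up numbers the early-stopping run fits more attempts into the same budget, which is the kind of effect that might account for the 12 vs. 10 gap. Confirming that would require comparing the actual number of attempts each method makes per problem.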