Test plan validation ability of llama3:80b

Description

The current ToT architecture relies on llama3 correctly determining whether a plan fulfils the goal state (via the validation prompt). However, llama3 does not seem able to validate plans consistently, even when they are complete.
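For reference, the question the validation prompt asks llama3 ("does this plan reach the goal state?") can be answered exactly by simulating the plan. The sketch below is a minimal symbolic checker that assumes the standard four-action blocksworld domain (pick-up, put-down, stack, unstack) and a fact encoding such as ("on", "a", "b"). Both the encoding and the function names are hypothetical, and the repository's actual problem format may differ. A checker like this could serve as ground truth when scoring llama3's verdicts.

```python
from typing import Iterable, Tuple

Fact = Tuple[str, ...]    # hypothetical encoding, e.g. ("on", "a", "b") or ("handempty",)
Action = Tuple[str, ...]  # hypothetical encoding, e.g. ("unstack", "a", "b")


def _step(state: set, action: Action) -> bool:
    """Apply one blocksworld action to `state` in place.

    Returns False (leaving `state` unchanged) if the action's preconditions fail.
    """
    name, *args = action
    if name == "pick-up":
        (x,) = args
        pre = {("clear", x), ("ontable", x), ("handempty",)}
        add = {("holding", x)}
    elif name == "put-down":
        (x,) = args
        pre = {("holding", x)}
        add = {("ontable", x), ("clear", x), ("handempty",)}
    elif name == "stack":
        x, y = args
        pre = {("holding", x), ("clear", y)}
        add = {("on", x, y), ("clear", x), ("handempty",)}
    elif name == "unstack":
        x, y = args
        pre = {("on", x, y), ("clear", x), ("handempty",)}
        add = {("holding", x), ("clear", y)}
    else:
        raise ValueError(f"unknown action: {name}")
    if not pre <= state:
        return False
    # In this domain every precondition is also deleted by the action.
    state -= pre
    state |= add
    return True


def plan_reaches_goal(init: Iterable[Fact], goal: Iterable[Fact],
                      plan: Iterable[Action]) -> bool:
    """True iff every action is applicable in turn and the final state satisfies the goal."""
    state = set(init)
    for action in plan:
        if not _step(state, action):
            return False
    return set(goal) <= state
```

For example, with block a on b and block c on the table, the two-step plan `[("unstack", "a", "b"), ("stack", "a", "c")]` reaches the goal `{("on", "a", "c")}`, while the truncated plan `[("unstack", "a", "b")]` does not.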

Therefore, to determine whether it is feasible to continue development with the current architecture, we need to test llama3's plan validation ability.

Setup

Arrange the blocksworld_3 problems in ascending order of solution length.
Using the validation prompt's format, feed each problem's valid plan to llama3 and check whether it accepts the plan as fulfilling the goal state.
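The two setup steps could be scripted roughly as follows. This is a sketch under stated assumptions, not the repository's code. The `Problem` container, the prompt template in `build_validation_prompt`, and the "answer yes or no" parsing are all hypothetical placeholders for the real validation prompt. `ollama_ask` assumes llama3 is served by a local Ollama instance and uses Ollama's `/api/generate` endpoint with streaming disabled. The model call is passed in as `ask`, so the harness can be dry-run without a model.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Problem:
    """Hypothetical container for one blocksworld_3 instance."""
    name: str
    init: str         # initial state, in the prompt's wording
    goal: str         # goal state, in the prompt's wording
    plan: List[str]   # known-valid plan, one action per entry


def build_validation_prompt(p: Problem) -> str:
    # Placeholder template: substitute the real validation prompt's format here.
    steps = "\n".join(p.plan)
    return (f"Initial state: {p.init}\nGoal: {p.goal}\nPlan:\n{steps}\n"
            "Does this plan reach the goal state? Answer yes or no.")


def ollama_ask(prompt: str, model: str, host: str = "http://localhost:11434") -> str:
    """Send one non-streaming request to a local Ollama server and return its text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def run_validation_test(problems: List[Problem],
                        ask: Callable[[str], str]) -> List[Tuple[str, int, bool]]:
    """Query the model on each valid plan, shortest plan first.

    Returns (name, plan_length, accepted) per problem. Every plan is valid,
    so accepted=False means the model rejected a complete plan.
    """
    results = []
    for p in sorted(problems, key=lambda q: len(q.plan)):
        answer = ask(build_validation_prompt(p)).strip().lower()
        results.append((p.name, len(p.plan), answer.startswith("yes")))
    return results
```

A real run would pass `ask=lambda pr: ollama_ask(pr, model=...)` with the llama3 tag under test. Sorting by plan length makes it easy to see whether rejections of valid plans start at a particular solution length.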

iggyray / llms-planning

Test plan validation ability of llama3:80b #19