The current TOT architecture relies on llama3 correctly determining whether a plan fulfils the goal state (via the validation prompt). However, llama3 does not appear to validate plans consistently, even when they are complete.
To decide whether it is feasible to continue development with the current architecture, llama3's plan validation ability needs to be tested.
Setup
Arrange blocksworld_3 problems in ascending order based on their solution length.
Using the validation prompt's format, feed each problem's valid plan to llama3.
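The two setup steps could be sketched roughly as follows. This is a minimal illustration, not the project's actual harness: the problem fields (`name`, `initial`, `goal`, `plan`), the prompt template, the `ask_model` callable, and the assumption that llama3 answers with a leading "yes" or "no" are all hypothetical placeholders. The real validation prompt should be substituted for `build_validation_prompt`.

```python
from typing import Any, Callable, Dict, List

# Hypothetical problem record; the field names are assumptions for this sketch.
Problem = Dict[str, Any]


def sort_by_solution_length(problems: List[Problem]) -> List[Problem]:
    """Order problems from the shortest known-valid plan to the longest."""
    return sorted(problems, key=lambda p: len(p["plan"]))


def build_validation_prompt(problem: Problem) -> str:
    """Placeholder template; replace with the real validation prompt format."""
    steps = "\n".join(problem["plan"])
    return (
        f"Initial state: {problem['initial']}\n"
        f"Goal state: {problem['goal']}\n"
        f"Plan:\n{steps}\n"
        "Does this plan reach the goal state? Answer yes or no."
    )


def run_validation_test(
    problems: List[Problem],
    ask_model: Callable[[str], str],
) -> List[Dict[str, Any]]:
    """Feed each known-valid plan to the model and record its verdict.

    `ask_model` is a stand-in for whatever function sends a prompt to llama3
    and returns its text reply.
    """
    results = []
    for problem in sort_by_solution_length(problems):
        reply = ask_model(build_validation_prompt(problem))
        results.append(
            {
                "name": problem["name"],
                "plan_length": len(problem["plan"]),
                # Assumes the reply starts with "yes" when the plan is accepted.
                "accepted": reply.strip().lower().startswith("yes"),
            }
        )
    return results
```

Because every plan fed in is already known to be valid, any result with `accepted` set to `False` counts as a validation failure. Keeping `plan_length` in each result makes it possible to see whether failures become more frequent as solutions get longer.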