iggyray / llms-planning

A benchmark for evaluating large language models in planning

Test plan validation ability of llama3:80b #19

Closed iggyray closed 1 month ago

iggyray commented 1 month ago

Description

The current TOT architecture relies on llama3's ability to correctly determine whether a plan fulfils the goal state (via the validation prompt). However, llama3 does not seem to validate plans consistently, even when they are complete.

Therefore, to determine whether it is feasible to continue development with the current architecture, it is necessary to test llama3's plan validation ability.

Setup

  1. Arrange blocksworld_3 problems in ascending order based on their solution length.
  2. Using the validation prompt's format, feed each problem's valid plan to llama3.
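
The two setup steps can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code. The `Problem` fields, the prompt wording in `build_validation_prompt`, and the `ask_model` callable (standing in for whatever client sends a prompt to llama3 and returns its reply) are all hypothetical. The sketch also assumes the model answers with a leading "yes" or "no", which may not match the real validation prompt.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Problem:
    """Hypothetical container for one blocksworld_3 problem."""
    name: str
    initial_state: str
    goal_state: str
    plan: List[str]  # a known-valid plan, one action per entry


def order_by_solution_length(problems: List[Problem]) -> List[Problem]:
    # Step 1: ascending order of solution (plan) length.
    return sorted(problems, key=lambda p: len(p.plan))


def build_validation_prompt(problem: Problem) -> str:
    # Placeholder wording; the real validation prompt format is not shown here.
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(problem.plan, 1))
    return (
        f"Initial state: {problem.initial_state}\n"
        f"Goal state: {problem.goal_state}\n"
        f"Plan:\n{steps}\n"
        "Does this plan reach the goal state? Answer yes or no."
    )


def run_validation_test(
    problems: List[Problem],
    ask_model: Callable[[str], str],
) -> List[Tuple[str, int, bool]]:
    # Step 2: feed each valid plan to the model and record its verdict.
    results = []
    for problem in order_by_solution_length(problems):
        reply = ask_model(build_validation_prompt(problem))
        # Assumption: a reply starting with "yes" means the plan was accepted.
        accepted = reply.strip().lower().startswith("yes")
        results.append((problem.name, len(problem.plan), accepted))
    return results
```

Because every plan fed in is known to be valid, any `False` in the results marks a case where the model wrongly rejected a complete plan. Keeping the plan length in each result makes it easy to see whether rejections become more frequent as plans get longer.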