iggyray / llms-planning

A benchmark for evaluating large language models in planning

Improve validation prompt #14

Open iggyray opened 1 month ago

iggyray commented 1 month ago

llama3:80b is not able to consistently validate a valid plan.

iggyray commented 1 month ago

2024/5/30 Update

Based on the experiment results in ./plan-bench/results/blocksworld_3/compiled_report_no_delimiters_1.json, organising the prompt with delimiters seems to have little effect on validation accuracy. If anything, accuracy decreased slightly with delimiters, particularly for shorter plans.
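
For anyone reproducing the comparison, here is a rough sketch of what "with delimiters" versus "without delimiters" could look like when the validation prompt is built. The function name, the section labels, the bracket-style delimiters and the closing question are all assumptions made for illustration; they are not the prompt format actually used in this repo.

```python
def build_validation_prompt(domain, problem, plan, use_delimiters=True):
    """Assemble a plan-validation prompt (hypothetical helper, not the repo's code).

    domain, problem: free-text descriptions of the planning task.
    plan: list of action strings, one per step.
    use_delimiters: wrap each section in [NAME] ... [/NAME] markers when True.
    """
    plan_text = "\n".join(plan)
    sections = [("DOMAIN", domain), ("PROBLEM", problem), ("PLAN", plan_text)]

    if use_delimiters:
        # Assumed delimiter style: bracketed section tags around each part.
        body = "\n\n".join(
            f"[{name}]\n{text}\n[/{name}]" for name, text in sections
        )
    else:
        # Same content in the same order, with no section markers.
        body = "\n\n".join(text for _, text in sections)

    # Assumed wording of the validation question.
    question = "Is the plan valid? Answer 'valid' or 'invalid'."
    return f"{body}\n\n{question}"


# Example inputs (made up for illustration only).
example_plan = ["unstack b from a", "put down b", "pick up a", "stack a on b"]
with_delims = build_validation_prompt(
    "Blocksworld with three blocks.", "Goal: a is on b.", example_plan
)
without_delims = build_validation_prompt(
    "Blocksworld with three blocks.", "Goal: a is on b.", example_plan,
    use_delimiters=False,
)
```

Keeping both variants behind a single flag means the two prompts differ only in the markers, so any change in accuracy can be attributed to the delimiters rather than to wording or ordering.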