iggyray / llms-planning

A benchmark for evaluating large language models in planning

Vote based on updated plan rather than next step #23

Open iggyray opened 4 weeks ago