zorazrw / trove

[ICML'24] TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks
https://arxiv.org/pdf/2401.12869.pdf
Creative Commons Attribution Share Alike 4.0 International

TROVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks :hammer_and_wrench:

Setup

Install the required packages:

pip install -r requirements.txt

Tasks and datasets are organized as follows:

├── MATH
│   ├── algebra
│   ├── counting_and_probability
│   ├── geometry
│   ├── intermediate_algebra
│   ├── number_theory
│   ├── prealgebra
│   └── precalculus
├── TableQA
│   ├── TabMWP
│   ├── WTQ
│   └── HiTab
└── VQA
    └── GQA

Running Experiments

Our Method: TroVE

python run_trove.py --task_name "math/algebra"

Note that the --task_name argument should be lowercased (e.g., "math/algebra", not "MATH/algebra").
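To run TroVE on every MATH subtask, you can loop over the seven subtask names from the directory layout above. This is a sketch, not a script shipped with the repo; the echo prints each command for inspection, so drop it to actually launch the runs:

```shell
# Sweep over all seven MATH subtasks (task names are lowercased).
# Remove the leading `echo` to execute the runs instead of printing them.
for sub in algebra counting_and_probability geometry \
           intermediate_algebra number_theory prealgebra precalculus; do
  echo python run_trove.py --task_name "math/${sub}"
done
```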

Baseline Methods: Primitive & Instance

python baseline.py --task_name "math/algebra" --suffix "primitive"  # or "instance"

Note that for the GQA dataset, the locate_objects and visual_qa functions are implemented as FastAPI endpoints. So you need to launch the server first (as below), then run the TroVE/baseline experiments.

uvicorn server.gqa:app

Evaluation

python -m utils.eval --results_path ${RESULTS_PATH}
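To score several runs in one go, you can loop the evaluation command over a list of result files. The paths below are placeholders for your own output files, not the repo's actual layout; the echo prints each command, so drop it to actually evaluate:

```shell
# Batch evaluation over multiple result files (paths are placeholders).
# Remove the leading `echo` to execute the evaluations instead of printing them.
for RESULTS_PATH in results/trove_math_algebra.json results/baseline_math_algebra.json; do
  echo python -m utils.eval --results_path "$RESULTS_PATH"
done
```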