WindyHu001 opened 2 months ago
You can follow the method proposed in our paper. The steps are:
1) use GPT-4 to interact with the environment and collect its inference trajectories;
2) run imitation fine-tuning (./finetune/run_imitation_finetune.py) on the collected GPT-4 inference trajectories;
3) train the critic model in the same environment;
4) run critic-guided policy refinement (./finetune/run_policy_refinement_data_collection.py) to optimize the LLM's control actions.
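As a rough illustration of how steps 1 and 2 connect, here is a minimal sketch. It assumes each collected GPT-4 trajectory is a list of (prompt, response) steps and flattens those steps into JSON Lines records for imitation fine-tuning. The `Step` class, the `trajectories_to_jsonl` helper and the record fields (`instruction`, `output`, `trajectory`, `step`) are hypothetical. The format that ./finetune/run_imitation_finetune.py actually expects may differ.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Step:
    """One interaction step: the prompt shown to GPT-4 and its reply (hypothetical schema)."""
    prompt: str
    response: str


def trajectories_to_jsonl(trajectories: Iterable[List[Step]], skip_empty: bool = True) -> str:
    """Flatten teacher trajectories into one JSON object per line (hypothetical SFT format)."""
    lines = []
    for traj_id, trajectory in enumerate(trajectories):
        for step_id, step in enumerate(trajectory):
            if skip_empty and not step.response.strip():
                continue  # drop steps where the teacher produced no usable answer
            record = {
                "instruction": step.prompt,   # observation/prompt given to the teacher
                "output": step.response,      # teacher's reasoning + chosen action
                "trajectory": traj_id,        # kept so records can be traced back
                "step": step_id,
            }
            lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [[Step("Which signal phase next?", "phase 1")], [Step("Which signal phase next?", "phase 3")]]
    print(trajectories_to_jsonl(demo))
```

Each teacher step becomes one supervised (input, target) pair. This is the usual behavioral-cloning setup. Empty replies are filtered so the student model is not trained to output nothing.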
The prompt design is provided in the paper.
How should I train on my own dataset? I also have questions about the real-world datasets for the three cities mentioned in the project: what kind of traffic data do they record? For example, I have real-world turning data for some intersections, and in SUMO I can use the jtcrouter tool to generate traffic flows from turning counts.
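For the SUMO side of the question, here is a minimal sketch. It assumes turning data arrives as vehicle counts per (incoming edge, outgoing edge) pair over one time interval. The sketch normalizes the counts into per-approach turning ratios and serializes them as `<edgeRelation>` elements inside an `<interval>`, which is, as far as I understand, the turn-count layout that SUMO's jtcrouter reads. The edge IDs, the interval bounds and the helper names are hypothetical. Check the attribute names and command-line options against the SUMO jtcrouter documentation before relying on this.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def turning_ratios(turn_counts):
    """Normalize raw turn counts into per-approach turning ratios.

    turn_counts maps (from_edge, to_edge) -> vehicles counted in the interval.
    Returns {from_edge: {to_edge: share}}; the shares of each approach sum to 1.
    """
    totals = defaultdict(int)
    for (src, _dst), count in turn_counts.items():
        totals[src] += count
    ratios = defaultdict(dict)
    for (src, dst), count in turn_counts.items():
        if totals[src] > 0:  # skip approaches with no observed traffic
            ratios[src][dst] = count / totals[src]
    return dict(ratios)


def turn_counts_to_xml(turn_counts, begin=0, end=3600):
    """Serialize turn counts as <edgeRelation> elements inside one <interval>.

    Assumption: mirrors the edgeRelation layout used by SUMO turn-count tools
    (e.g. jtcrouter); verify attribute names against the SUMO docs.
    """
    root = ET.Element("data")
    interval = ET.SubElement(root, "interval", id="counts", begin=str(begin), end=str(end))
    for (src, dst), count in sorted(turn_counts.items()):
        # "from" is a Python keyword, so attributes are passed as a dict
        ET.SubElement(interval, "edgeRelation", {"from": src, "to": dst, "count": str(count)})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical counts for one approach of a four-leg intersection, 1 hour
    counts = {
        ("north_in", "south_out"): 300,  # through
        ("north_in", "east_out"): 100,   # left
        ("north_in", "west_out"): 100,   # right
    }
    print(turning_ratios(counts))
    print(turn_counts_to_xml(counts))
```

Keeping the raw counts in the XML, rather than only the ratios, leaves the conversion to flows and routes to the SUMO tool itself.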