usail-hkust / LLMTSCS

Official code for article "LLMLight: Large Language Models as Traffic Signal Control Agents".

Missing indicators in result comparison #13

Closed DA21S321D closed 2 months ago

DA21S321D commented 5 months ago

[image] As shown in the attached results, the upper part is gpt-3.5-turbo run with --prompt Commonsense and the lower part is Advanced-MPLight. The test_avg_queue_len, test_avg_travel_time, and test_avg_waiting_time indicators are missing compared to Advanced-MPLight. Is that because I am using --prompt Commonsense rather than --prompt Wait Time Forecast?

DA21S321D commented 5 months ago

Also, the gpt-3.5-turbo results at the top of the picture are dramatically different from the data reported in the paper (I ran this on anon_4_4_hangzhou_real.json). Or have I run the wrong script?

Gungnir2099 commented 5 months ago

We don't collect data on test_avg_queue_len, test_avg_travel_time, or test_avg_waiting_time for GPT models. The GPT version we employed is gpt-3.5-turbo-0613, which demonstrates significantly suboptimal performance. Since our publication, there have been multiple updates to the model. But it's a great sign GPT-3.5 works now.
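For reference, the three missing indicators are standard traffic-signal-control metrics that can be recomputed from a test episode's logs. The sketch below shows how they are typically aggregated; the record structure and field names (enter_time, exit_time, waiting_time) are assumptions for illustration, not the repo's actual logging format.

```python
from statistics import mean

def summarize_episode(vehicle_records, queue_len_samples):
    """Aggregate standard TSC metrics from one test episode (illustrative only).

    vehicle_records: list of dicts with assumed keys
        'enter_time', 'exit_time', 'waiting_time' (one dict per vehicle).
    queue_len_samples: per-step total queue length over all lanes.
    """
    avg_travel_time = mean(v["exit_time"] - v["enter_time"] for v in vehicle_records)
    avg_waiting_time = mean(v["waiting_time"] for v in vehicle_records)
    avg_queue_len = mean(queue_len_samples)
    return {
        "test_avg_travel_time": avg_travel_time,
        "test_avg_waiting_time": avg_waiting_time,
        "test_avg_queue_len": avg_queue_len,
    }
```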

DA21S321D commented 5 months ago

Thanks for your explanation :)

DA21S321D commented 5 months ago

By the way, what's the difference between test_avg_queue_len and test_avg_queue_len_over?

Gungnir2099 commented 5 months ago

test_avg_queue_len is the test result recorded after each training epoch, but we only use GPT models for inference, without any training, so we do not record test_avg_queue_len.
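To make the distinction concrete, here is a minimal sketch of the two evaluation paths: an RL baseline runs a test episode after every training epoch (producing a per-epoch series such as test_avg_queue_len), while a GPT agent is run once for inference only, leaving just a single overall summary (presumably what the "_over" metric reports). The agent/env API, function names, and loop structure below are assumptions for illustration, not the repo's actual code.

```python
def evaluate(agent, env):
    """Run one test episode and return a metrics dict (assumed agent/env API)."""
    env.reset()
    done = False
    while not done:
        action = agent.act(env.observe())   # hypothetical observe/act interface
        done = env.step(action)
    return env.summary()                    # e.g. avg queue length, travel time

# RL baseline: a test episode runs after every training epoch, so a
# per-epoch series such as test_avg_queue_len is recorded.
def train_and_test(agent, env, num_epochs):
    per_epoch_metrics = []
    for _ in range(num_epochs):
        agent.train_one_epoch(env)                 # hypothetical training call
        per_epoch_metrics.append(evaluate(agent, env))
    return per_epoch_metrics

# GPT agent: inference only, no training loop, so there is no per-epoch
# series; only one final summary of the run exists.
def run_gpt_inference(agent, env):
    return evaluate(agent, env)
```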