Closed DA21S321D closed 2 months ago
Also, the gpt-3.5-turbo data at the top of the picture is dramatically different from the data in the paper (I ran this on anon_4_4_hangzhou_real.json). Or have I run the wrong script?
We don't collect data on `test_avg_queue_len`, `test_avg_travel_time`, or `test_avg_waiting_time` for GPT models. The GPT version we employed is gpt-3.5-turbo-0613, which demonstrates significantly suboptimal performance. Since our publication, there have been multiple updates to the model. But it's a great sign that GPT-3.5 works now.
Thanks for your explanation :)
By the way, what's the difference between `test_avg_queue_len` and `test_avg_queue_len_over`?
`test_avg_queue_len` is the test result after each training epoch, while we only use GPT models for inference without training. So we do not record `test_avg_queue_len`.
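To illustrate the point above, here is a minimal sketch (not the repository's actual code; the function and metric-log structure are invented for illustration) of why per-epoch test metrics never appear for an inference-only agent: they are recorded inside the training loop, which a GPT agent never enters.

```python
# Hypothetical sketch: test metrics such as test_avg_queue_len are logged once
# per training epoch, so an inference-only run (e.g. a GPT agent) never
# produces them. Names and values are made up for illustration.

def run_experiment(agent_type: str, num_epochs: int = 2) -> dict:
    """Return a toy metric log for one run."""
    log: dict = {}
    if agent_type != "gpt":  # RL agents such as Advanced-MPLight train
        for _ in range(num_epochs):
            # After each training epoch, a test pass records these metrics.
            log.setdefault("test_avg_queue_len", []).append(0.0)
            log.setdefault("test_avg_travel_time", []).append(0.0)
            log.setdefault("test_avg_waiting_time", []).append(0.0)
    else:
        # GPT agents are inference-only: one evaluation, no per-epoch metrics.
        log["inference_result"] = 0.0
    return log

print(sorted(run_experiment("mplight")))  # per-epoch test_avg_* keys present
print(sorted(run_experiment("gpt")))      # no test_avg_* keys at all
```

So the absence of `test_avg_*` entries is expected behavior for GPT runs, not a sign of a wrong script.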
As shown in the picture, the upper side is gpt-3.5-turbo, which was run with `--prompt Commonsense`, and the lower side is Advanced-MPLight. The `test_avg_queue_len`, `test_avg_travel_time`, and `test_avg_waiting_time` metrics are missing compared to Advanced-MPLight. Is that because I am using `--prompt Commonsense` rather than `--prompt Wait Time Forecast`?