usail-hkust / LLMTSCS

Official code for article "LLMLight: Large Language Models as Traffic Signal Control Agents".

Missing arguments in aft_rank_loss_utils.py #17

Open DA21S321D opened 2 months ago

DA21S321D commented 2 months ago

[screenshot] The `neg_detach` and `boundary` arguments are not preset. How should I set them, True or False?

Gungnir2099 commented 2 months ago

Sorry for the confusion. We do not use `neg_detach`, but we do use `boundary`. I have fixed this bug; you can pull the latest code and run it again.
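For reference, here is a minimal sketch of what a margin-based ranking loss with these two flags might look like. The function name, the `margin` parameter, and the exact semantics of `boundary` and `neg_detach` are assumptions for illustration, not the repository's actual implementation in aft_rank_loss_utils.py.

```python
import torch
import torch.nn.functional as F

def rank_loss(pos_scores, neg_scores, margin=1.0, boundary=True, neg_detach=False):
    """Hypothetical margin-based ranking loss.

    boundary:   if True, enforce a fixed margin between positive and
                negative scores (hinge-style); otherwise use a soft
                log-sigmoid objective on the score gap.
    neg_detach: if True, stop gradients from flowing through the
                negative scores.
    """
    if neg_detach:
        neg_scores = neg_scores.detach()
    if boundary:
        # Hinge loss: penalize whenever pos does not exceed neg by `margin`.
        return F.relu(margin - (pos_scores - neg_scores)).mean()
    # Soft alternative: maximize the log-sigmoid of the score gap.
    return -F.logsigmoid(pos_scores - neg_scores).mean()
```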

DA21S321D commented 2 months ago

[screenshot] Yesterday I set them both to True and ran the pipeline. I used the Step 2 fine-tuned Llama-2-7B (fine-tuned and merged, as shown in the picture) to generate the policy refinement data. Then I ran ./finetune/run_policy_refinement.py and merged the LoRA weights. But after finishing Step 3, I ran the Step 3 fine-tuned Llama-2-7B with run_open_LLM.py to compare it against the other models, and it performed far worse than the Step 2 fine-tuned Llama-2-7B. Could these two arguments be the cause?
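For context, the LoRA merge step I ran roughly follows the standard peft workflow below; the paths are placeholders and this is not the exact script from the repository.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholder paths: the Step 2 merged base model and the Step 3 LoRA adapter.
base_path = "path/to/step2_finetuned_llama2_7b"
adapter_path = "path/to/step3_lora_adapter"

base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_path)

# Fold the LoRA weights into the base model so it can be loaded standalone
# by run_open_LLM.py.
merged = model.merge_and_unload()
merged.save_pretrained("path/to/step3_merged_model")

tokenizer = AutoTokenizer.from_pretrained(base_path)
tokenizer.save_pretrained("path/to/step3_merged_model")
```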

Gungnir2099 commented 2 months ago

I think you should follow the settings in my code. You may just pull the project and run it again with llama-2-13b-chat-hf.

DA21S321D commented 2 months ago

I found another missing argument in run_policy_refinement.py at line 112: the "REWARD": in_args.reward entry is not preset. It seems this argument is never actually used in the code; I just noticed it in passing.
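In case anyone else hits this before pulling the fix, a minimal workaround is to make sure the flag always has a value so the dict entry can be built; the flag name and the default value below are placeholders, not the repository's exact settings.

```python
import argparse

parser = argparse.ArgumentParser()
# Giving the flag a default means "REWARD" is always populated,
# even when --reward is omitted on the command line.
# "queue_length" is only a placeholder value here.
parser.add_argument("--reward", type=str, default="queue_length")
in_args = parser.parse_args()

dict_args = {
    "REWARD": in_args.reward,
}
```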

Gungnir2099 commented 2 months ago

Thank you for your feedback. This argument is not used in the policy refinement procedure. I have fixed this bug.