opendilab / LMDrive

[CVPR 2024] LMDrive: Closed-Loop End-to-End Driving with Large Language Models
Apache License 2.0

Bad evaluation performance on CARLA #56

Open TLDBN opened 3 weeks ago

TLDBN commented 3 weeks ago

Thanks for your fantastic work! I'm trying to reproduce your results following the official instructions and the checkpoints posted on the repo, but I get bad results. I used the official code and model weights and ran the evaluation on Town05 Long with the following config.

run_evaluation.sh:

```bash
export LEADERBOARD_ROOT=leaderboard
export CHALLENGE_TRACK_CODENAME=SENSORS
export PORT=$PT # same as the carla server port
export TM_PORT=$(($PT+500)) # port for traffic manager, required when spawning multiple servers/clients
export DEBUG_CHALLENGE=0
export REPETITIONS=1 # multiple evaluation runs
export ROUTES=langauto/benchmark_long.xml
export TEAM_AGENT=leaderboard/team_code/lmdriver_agent.py # agent
export TEAM_CONFIG=leaderboard/team_code/lmdriver_config.py # model checkpoint, not required for expert
export CHECKPOINT_ENDPOINT=results/lmdrive_recurrent_train_fintune_2024_6_2_result_long.json # results file
export SCENARIOS=leaderboard/data/official/all_towns_traffic_scenarios_public.json
export SAVE_PATH=data/eval # path for saving episodes while evaluating
export RESUME=True
```
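For reference, here is roughly how those two ports are consumed on the client side (a minimal sketch using the stock CARLA Python API, not the leaderboard's exact plumbing): PORT must match the port the CARLA server was launched with, while TM_PORT only needs to be a free port for the traffic manager.

```python
# Minimal sketch, assuming the stock CARLA Python API and a server already
# listening on PORT; the leaderboard's own setup code is more involved.
import carla

PORT = 2000            # must match the port the CARLA server is listening on
TM_PORT = PORT + 500   # traffic-manager port, mirrors run_evaluation.sh

client = carla.Client('localhost', PORT)
client.set_timeout(60.0)
traffic_manager = client.get_trafficmanager(TM_PORT)  # spawned on TM_PORT
print(client.get_server_version())  # quick check that the server is reachable
```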

lmdriver_config.py:

```python
class GlobalConfig:
    """base architecture configurations"""

    # Controller
    turn_KP = 1.25
    turn_KI = 0.75
    turn_KD = 0.3
    turn_n = 40  # buffer size

    speed_KP = 5.0
    speed_KI = 0.5
    speed_KD = 1.0
    speed_n = 40  # buffer size

    max_throttle = 0.75  # upper limit on throttle signal value in dataset
    brake_speed = 0.1  # desired speed below which brake is triggered
    brake_ratio = 1.1  # ratio of speed to desired speed at which brake is triggered
    clip_delta = 0.35  # maximum change in speed input to longitudinal controller

    # llm_model = '/data/llava-v1.5-7b'
    # preception_model = 'memfuser_baseline_e1d3_return_feature'
    # preception_model_ckpt = 'sensor_pretrain.pth.tar.r50'
    # lmdrive_ckpt = 'lmdrive_llava.pth'

    llm_model = 'checkpoints/llava-v1.5-7b'
    preception_model = 'memfuser_baseline_e1d3_return_feature'
    preception_model_ckpt = 'checkpoints/vision-encoder-r50.pth.tar'
    lmdrive_ckpt = 'checkpoints/llava-v1.5-checkpoint.pth'

    agent_use_notice = False
    sample_rate = 2

    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)
```
Are these configs and parameters right? I only adjusted the paths to the checkpoint files and did not change anything else.
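For context, the turn_* and speed_* gains above drive PID controllers for steering and speed, each with an error buffer of size n. A minimal sketch of that controller shape (a hypothetical reimplementation for illustration, not necessarily the repo's exact class):

```python
from collections import deque

class PIDController:
    # Hypothetical reimplementation for illustration; the repo's actual
    # controller may differ in detail.
    def __init__(self, K_P=1.0, K_I=0.0, K_D=0.0, n=20):
        self.K_P, self.K_I, self.K_D = K_P, K_I, K_D
        self.window = deque([0.0] * n, maxlen=n)  # error history, size n

    def step(self, error):
        self.window.append(error)
        integral = sum(self.window) / len(self.window)   # windowed mean of error
        derivative = self.window[-1] - self.window[-2]   # last error difference
        return self.K_P * error + self.K_I * integral + self.K_D * derivative

# Gains from GlobalConfig: steering and longitudinal controllers.
turn_controller = PIDController(K_P=1.25, K_I=0.75, K_D=0.3, n=40)
speed_controller = PIDController(K_P=5.0, K_I=0.5, K_D=1.0, n=40)
```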

And the final result:

```json
"values": [ "16.305", "23.782", "0.874", "0.000", "0.218", "0.373", "1.000", "0.000", "0.075", "3.122", "0.075", "0.088" ]
```

I also reproduced the whole training pipeline following the official instructions on 8 A100 80G GPUs (nothing different from the official code and config), but got even worse results (I have evaluated several times; this is the best one):

```json
"values": [ "10.350", "18.069", "0.595", "0.000", "4.003", "9.696", "0.706", "0.000", "5.724", "4.635", "0.047", "6.457" ]
```

Could you please give me some concrete advice or point out what I'm doing wrong? Thank you very much!
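For readers interpreting the numbers, a hedged sketch pairing the 12-entry "values" array with metric labels. The ordering below is my assumption based on the standard CARLA leaderboard output; verify it against the "labels" field written in the same results JSON.

```python
# Assumed label order for the 12-entry "values" array; check the "labels"
# field of your own results file to confirm.
labels = [
    "Avg. driving score", "Avg. route completion", "Avg. infraction penalty",
    "Collisions with pedestrians", "Collisions with vehicles",
    "Collisions with layout", "Red lights infractions", "Stop sign infractions",
    "Off-road infractions", "Route deviations", "Route timeouts", "Agent blocked",
]
values = ["16.305", "23.782", "0.874", "0.000", "0.218", "0.373",
          "1.000", "0.000", "0.075", "3.122", "0.075", "0.088"]
for label, value in zip(labels, values):
    print(f"{label:30s} {value}")
```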