woven-planet / l5kit

L5Kit - https://woven.toyota
https://woven-planet.github.io/l5kit

Training loss of UrbanDriver diverges easily. Are there any tricks for stabilizing the training process? #383

Open shubaozhang opened 2 years ago

shubaozhang commented 2 years ago

Are there any training tricks for stabilizing the training process of UrbanDriver?

  1. Optimizer params: learning rate, batch_size, etc. (see the sketch after this list).
  2. Since UrbanDriver is an offline RL method, do any tricks exist to constrain the simulation?
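
For illustration, a minimal sketch of the sort of optimizer-side stabilization meant in point 1 (modest learning rate, warmup schedule, gradient clipping). The tiny model, random data, and all values below are placeholders, not the actual UrbanDriver setup:

```python
import torch
from torch import nn

# Minimal sketch of optimizer-side stabilization: modest learning rate, warmup
# schedule, gradient clipping. The tiny model and random data stand in for the
# UrbanDriver model and the L5Kit dataloader; all values are placeholders.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 2))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)
total_steps = 1_000
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-4, total_steps=total_steps, pct_start=0.05
)

for step in range(total_steps):
    features = torch.randn(32, 64)   # stand-in for a vectorized input batch
    targets = torch.randn(32, 2)     # stand-in for future ego positions
    loss = nn.functional.mse_loss(model(features), targets)
    optimizer.zero_grad()
    loss.backward()
    # Clip gradients so occasional hard batches cannot blow up a single update.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
```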
perone commented 2 years ago

Hi @shubaozhang, none of the authors of this work are working at Level-5 anymore; you can try reaching out to them directly. With that said, my personal take is that it doesn't seem to be an offline RL method, for several reasons (e.g. you are not optimizing for an expected reward but for an imitation loss, you still have differentiable loss minimization through simulation, there is no exploration, etc.), so it is quite different from an RL setting or the setting the policy gradient theorem is derived from.
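
To make the distinction concrete, here is a toy sketch (not L5Kit code; the one-dimensional dynamics and expert trajectory are made up) of what differentiable loss minimization through simulation looks like: the policy is unrolled, an L2 imitation loss is accumulated against a logged trajectory, and gradients flow back through the whole rollout, with no reward signal or exploration involved:

```python
import torch
from torch import nn

# Toy 1-D policy, dynamics and expert trajectory, made up for illustration only.
policy = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

expert = torch.linspace(0.0, 1.0, steps=10)      # logged expert positions
state = torch.zeros(1)                           # ego starts at the origin

optimizer.zero_grad()
loss = torch.zeros(())
for t in range(10):
    action = policy(state)                       # differentiable action
    state = state + action                       # differentiable "simulator" step
    loss = loss + (state - expert[t]).pow(2).sum()   # imitation loss vs. the log
loss.backward()                                  # gradients flow through the rollout (BPTT)
optimizer.step()
```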

shubaozhang commented 2 years ago

> Hi @shubaozhang, none of the authors of this work are working at Level-5 anymore; you can try reaching out to them directly. With that said, my personal take is that it doesn't seem to be an offline RL method, for several reasons (e.g. you are not optimizing for an expected reward but for an imitation loss, you still have differentiable loss minimization through simulation, there is no exploration, etc.), so it is quite different from an RL setting or the setting the policy gradient theorem is derived from.

Thanks for your reply

jeffreywu13579 commented 2 years ago

Hi @shubaozhang, were you ever able to figure out the issues with training? I'm also having difficulty getting my trained UrbanDriver model (specifically the open loop with history) to match the performance of the pretrained models provided.

shubaozhang commented 2 years ago

> Hi @shubaozhang, were you ever able to figure out the issues with training? I'm also having difficulty getting my trained UrbanDriver model (specifically the open loop with history) to match the performance of the pretrained models provided.

Parameters such as history_num_frames_ego and future_num_frames affect the results a lot. I used the following parameters, and with them the training loss converges. [image: screenshot of the training configuration]
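
For readers following along, these keys sit under model_params in the L5Kit training config; the snippet below only sketches where they live, and the values are placeholders rather than the ones from the screenshot above:

```python
# Sketch of the relevant part of an L5Kit training config, written as a Python
# dict. The values are placeholders; the ones actually used are only in the
# screenshot above.
cfg = {
    "model_params": {
        "history_num_frames_ego": 4,      # past ego frames fed to the model
        "history_num_frames_agents": 4,   # past frames for surrounding agents
        "future_num_frames": 12,          # prediction horizon, in frames
        "step_time": 0.1,                 # seconds between consecutive frames
    },
    "train_data_loader": {
        "batch_size": 32,
        "shuffle": True,
        "num_workers": 8,
    },
}
```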

jeffreywu13579 commented 2 years ago

Hi @shubaozhang, thanks for these configurations! Have you tried evaluating your trained model in closed_loop_test.ipynb and visualizing the scenes? My trained UrbanDriver model (with the configs above as well as the default configs), after 150k iterations on train_full.zarr, still seems to converge to a degenerate solution (such as just driving straight ahead regardless of the map), whereas the pretrained BPTT.pt provided does not have this issue. Were there any additional changes you had to make (such as to closed_loop_model.py or open_loop_model.py)?

Also, by chance, have you tried loading the state dict of the pretrained BPTT.pt model (as opposed to using the JIT model directly)? The provided configs do not seem to work when loading the state dict. I had to change d_local from 256 to 128 in open_loop_model.py to get the pretrained state dict to load into my model, and there seem to be other mismatches as well (the shape of result['positions'] differs between the BPTT.pt JIT model and my model with the state dict of BPTT.pt loaded in).
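
In case it helps others hitting the same mismatch, a small diagnostic sketch: load the provided TorchScript checkpoint, dump its parameter shapes, and compare them against a model built from the config. BPTT.pt here is the pretrained file discussed above; the eager-model construction is left as a comment because its constructor arguments depend on your checkout:

```python
import torch

# Load the pretrained TorchScript checkpoint and dump its parameter shapes, to
# compare against a model built from the config (e.g. to spot the d_local
# 256-vs-128 mismatch). BPTT.pt is the pretrained file discussed above.
jit_model = torch.jit.load("BPTT.pt", map_location="cpu")
jit_state = jit_model.state_dict()

for name, tensor in jit_state.items():
    print(name, tuple(tensor.shape))

# To reuse the weights in an eager model, build it exactly as in the training
# notebook and then load the extracted state dict:
#   missing, unexpected = my_model.load_state_dict(jit_state, strict=False)
# strict=False reports missing/unexpected keys, but genuine shape mismatches
# (like the d_local one) will still raise an error.
```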