Closed: mh0797 closed this issue 2 years ago.
Hi @mh0797, apologies for the delayed response, and thank you for the investigation. We are looking into this and making sure empty agent features are being handled correctly.
Hi @mh0797, sorry for the late reply. Could you try again with the latest version? Could you also share some information about your environment? Thanks
The error occurred with the current devkit-0.6 version. I recreated the environment after updating to this version.
I ran into the same error.
Hi @mh0797,
Are you still facing the same issue since the v1.0 release?
Hi @patk-motional, sorry for taking so long - I had to set up a fresh environment, cache a new dataset, and run a training, which is very time-consuming. I was able to train the model for an entire epoch on the nuplan-mini dataset without errors, so I guess we can close this issue. For anybody interested, here is the training command:
python ~/nuplan/nuplan-devkit/nuplan/planning/script/run_training.py \
+training=training_vector_model \
py_func=train \
cache.cache_path=/path/to/cache \
data_loader.params.batch_size=2 \
lightning.trainer.params.max_epochs=10 \
optimizer.lr=5e-5 \
experiment_name=vector_model \
lightning.trainer.params.max_time=null \
scenario_filter.remove_invalid_goals=true
Describe the bug
Training of the vector model crashes. I think this is the same error as in #82; however, caching the features prior to training no longer solves the issue. Additionally, this issue in torch 1.9.0 causes the error to be reported not as a shape mismatch for the linear layer, but as a shape mismatch for the gradient. The root cause (as stated in #82) is probably the validity condition of the agents feature, which produces an invalid feature when there are no agents in the scene.
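For context, here is a minimal sketch of the suspected failure mode. The layer sizes, tensor shapes, and the sanitize_agents helper are illustrative assumptions, not the actual nuplan-devkit model code:

import torch
import torch.nn as nn

# Assumed per-agent state size and encoder width, for illustration only.
num_agent_features = 8
encoder = nn.Linear(num_agent_features, 32)

# Normal scenario: at least one agent, feature shape (num_agents, 8).
agents_ok = torch.randn(5, num_agent_features)
print(encoder(agents_ok).shape)  # torch.Size([5, 32])

# Scenario with no agents: the feature can collapse to an empty tensor whose
# shape no longer matches the encoder's weight matrix, so training crashes.
agents_empty = torch.empty(0)
try:
    encoder(agents_empty)
except RuntimeError as err:
    print(f"shape mismatch: {err}")

# One possible guard (sketch): validate the agents feature and substitute a
# dummy agent before it reaches the encoder.
def sanitize_agents(agents: torch.Tensor) -> torch.Tensor:
    if agents.ndim != 2 or agents.shape[0] == 0:
        return torch.zeros(1, num_agent_features)
    return agents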
Caching results in:
Completed dataset caching! Failed features and targets: 41 out of 1533645
Setup
Steps To Reproduce
Steps to reproduce the behavior:
Stack Trace