umautobots / bidireaction-trajectory-prediction

The code for Bi-directional Trajectory Prediction (BiTraP).

Errors when Training on NuScenes #14

Closed: tcheung99 closed this issue 2 years ago

tcheung99 commented 2 years ago

Hi @MoonBlvd, I am running into problems when training on NuScenes. Training breaks at this line: https://github.com/umautobots/bidireaction-trajectory-prediction/blob/296a50126cd50a1d4a0395696a0567575c4d4df8/bitrap/modeling/bitrap_gmm.py#L450

I noticed that rsample() is called with no arguments, which means that component_cat_samples in the following line samples the distribution with the default sample shape, sample_shape=torch.Size().

https://github.com/umautobots/bidireaction-trajectory-prediction/blob/296a50126cd50a1d4a0395696a0567575c4d4df8/bitrap/modeling/gmm2d.py#L105

I think that this may be the cause of the errors I am running into:

[Screenshot of the error output, taken 2022-01-22 at 3:04:10 PM]

Would you have suggestions on what I may be doing incorrectly? Thanks in advance!
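On the sample_shape question: in torch.distributions, sample() and rsample() return a tensor of shape sample_shape + batch_shape + event_shape, so a call with the default empty torch.Size() is valid and simply draws one sample per batch element; it is not an error by itself. The sketch below models that shape rule with plain tuples. The helper name and the sizes (8 agents, 3 mixture components) are hypothetical, and this thread does not say whether gmm2d.py uses Categorical or OneHotCategorical, so both cases are shown.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def sample_result_shape(sample_shape: Shape, batch_shape: Shape, event_shape: Shape) -> Shape:
    """Plain-Python model of the torch.distributions rule:
    result shape = sample_shape + batch_shape + event_shape.
    Hypothetical helper for illustration, not PyTorch code."""
    return tuple(sample_shape) + tuple(batch_shape) + tuple(event_shape)


# Hypothetical sizes: a batch of 8 agents, 3 mixture components.
batch_shape = (8,)

# Categorical-style draw (empty event_shape), default sample_shape=torch.Size():
print(sample_result_shape((), batch_shape, ()))     # (8,)
# Same draw with an explicit sample_shape of 20 samples per agent:
print(sample_result_shape((20,), batch_shape, ()))  # (20, 8)
# OneHotCategorical-style draw (event_shape is the number of components):
print(sample_result_shape((), batch_shape, (3,)))   # (8, 3)
```

Because the default shape is well defined, a failure at that sampling call is more likely to come from invalid distribution parameters than from the missing argument.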

tcheung99 commented 2 years ago

We figured out the issue, thanks!

MoonBlvd commented 2 years ago

Sounds good, thanks @tcheung99! Do you mind posting what the cause was and how you solved it?

tcheung99 commented 2 years ago

For sure! We found that the distribution the code was sampling from contained NaNs. After tracing back, we realized that the dataloader did not include our NuScenes config in what appears to be a standardization step: https://github.com/umautobots/bidireaction-trajectory-prediction/blob/296a50126cd50a1d4a0395696a0567575c4d4df8/bitrap/engine/trainer.py#L36-L40

The same check also appears in lines 93-97, 157-161, and 256-260 of trainer.py.
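Since the same dataset check is repeated in four places, it is easy to add a new dataset to one branch and miss the others. A minimal sketch of two safeguards, under stated assumptions: define the list of datasets that use the standardized input once, and check distribution parameters for NaNs before sampling so the failure surfaces close to its source. The dataset set, the batch keys input_x and input_x_st, the parameter name log_pis, and all function names are hypothetical, not BiTraP's actual code. With PyTorch tensors the NaN check would typically be torch.isnan(t).any().

```python
import math
from typing import Dict, List, Sequence

# Hypothetical: datasets whose batches should use the standardized input.
# Defining this once avoids several copies drifting apart.
# Extend it with the datasets your trainer already standardizes.
STANDARDIZED_DATASETS = frozenset({"nuscenes"})


def select_input(dataset_name: str, batch: Dict[str, List[float]]) -> List[float]:
    """Pick the standardized input for listed datasets, else the raw input.

    "input_x" and "input_x_st" are illustrative placeholder keys."""
    if dataset_name.lower() in STANDARDIZED_DATASETS:
        return batch["input_x_st"]
    return batch["input_x"]


def nan_indices(values: Sequence[float]) -> List[int]:
    """Return the positions of NaN entries in a flat sequence of floats."""
    return [i for i, v in enumerate(values) if math.isnan(v)]


def check_params(name: str, values: Sequence[float]) -> None:
    """Raise early with a readable message if a parameter contains NaNs."""
    bad = nan_indices(values)
    if bad:
        raise ValueError(f"{name} contains NaN at positions {bad}")


# Made-up values for illustration only.
batch = {"input_x": [512.0, 903.5], "input_x_st": [0.4, -1.2]}
print(select_input("NuScenes", batch))  # [0.4, -1.2]
print(select_input("other", batch))     # [512.0, 903.5]

log_pis = [0.1, float("nan"), -0.3]  # hypothetical mixture log-weights
try:
    check_params("log_pis", log_pis)
except ValueError as err:
    print(err)  # log_pis contains NaN at positions [1]
```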