zhangpur / SR-LSTM

States Refinement LSTM
MIT License

The inputs in training and validating aren't the same #1

Closed yanshihao closed 4 years ago

yanshihao commented 4 years ago

Thank you for sharing the code!

There are some things I don't understand. Could you help me with them? Thank you very much! To predict future human trajectories, I have read some papers, such as Social-LSTM. I wanted to run and test the code, so I configured the environment as you describe and ran the "SR-LSTM" program successfully.

But when I train the network (SR-LSTM or vanilla LSTM), I get results like the following:

`----epoch 62, train_loss=0.00379, valid_error=0.569, valid_final=1.349, test_error=0.000, valid_final=0.000`

The train_loss is small, but valid_error is large and no longer declining. Is this overfitting?

I read the code and found that the input during training is not the same as during validation. During training, the network receives the data from time 0 to time n-1, and the output is from time 1 to time n, as shown in fig. 1.

During validation, the network receives the data from time 0 to time obs (the observation length), and the output is from time obs+1 to time n, as shown in fig. 2.

Maybe this is why valid_error is large while train_loss is small. (I'm not sure, since I'm new to this: why not train the same way as validation, i.e. feed only the data from time 0 to time obs and predict the data from time obs+1 to time n?)
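To make the difference concrete, here is a minimal, self-contained sketch of the two schemes (a toy LSTM for illustration only, not this repo's SR-LSTM code): training scores every one-step prediction with ground truth fed in at each step, while validation rolls the model's own predictions forward from time obs, so errors can compound.

```python
import torch
import torch.nn as nn

# Toy next-step predictor over 2-D positions; shapes follow PyTorch's
# default (time, batch, features) layout for nn.LSTM.
class ToyTrajLSTM(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(2, hidden)
        self.head = nn.Linear(hidden, 2)

    def forward(self, x, state=None):
        h, state = self.lstm(x, state)
        return self.head(h), state

n, obs, peds = 20, 8, 5
traj = torch.randn(n, peds, 2)                # dummy trajectories
model = ToyTrajLSTM()

# Training, single-step: feed frames 0..n-2 and score each one-step
# prediction against frames 1..n-1 (ground truth fed at every step).
pred, _ = model(traj[:-1])
train_loss = torch.mean((pred - traj[1:]) ** 2)

# Validation, multi-step: feed only frames 0..obs-1, then roll the model's
# own predictions forward for the remaining frames (errors accumulate).
with torch.no_grad():
    pred, state = model(traj[:obs])
    cur = pred[-1:]                           # predicted frame `obs`
    future = [cur]
    for _ in range(n - obs - 1):
        cur, state = model(cur, state)        # feed prediction back in
        future.append(cur)
    rollout = torch.cat(future, dim=0)        # predicted frames obs..n-1
    valid_error = torch.norm(rollout - traj[obs:], dim=2).mean()  # ADE-like
```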

zhangpur commented 4 years ago

Hi! Valid_error is calculated in the certain form of the test criterion (ade), which is referred as 'multi-step inference mode' in Figure 4 of the paper. Trainloss is calculated in the single-step model. If you want to observe the valid loss, just copy the code of calculating the train loss to the validation process. `outputs, , _ ,look= self.net.forward(inputs_fw,iftest=False) loss_o=torch.sum(self.criterion(outputs, batch_norm[1:,:,:2]),dim=2) val_loss += torch.sum(loss_o*lossmask)/num`