una-dinosauria / human-motion-prediction

Simple baselines and RNNs for predicting human motion in TensorFlow. Presented at CVPR'17.

Test time prediction inputs #18

Open · Seleucia opened this issue 6 years ago

Seleucia commented 6 years ago

Hello Julieta,

I'm a bit confused by this line: here we feed encoder_inputs, decoder_inputs, and decoder_outputs to the network, and the model makes its prediction from these data. As I understand it, decoder_inputs contains data one step behind decoder_outputs. In this configuration, doesn't the model only forecast one step ahead? PS: if I understood correctly, the prediction is made at this line, and as far as I can see it uses both encoder_inputs and decoder_outputs.
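To make the one-step shift concrete, here is how I picture the slicing (toy lengths and variable names are mine, not the repo's):

```python
import numpy as np

# Toy sequence of 10 frames; source_len/target_len are made-up values.
seq = np.arange(10)
source_len, target_len = 5, 4

encoder_inputs  = seq[0:source_len - 1]                            # frames 0..3
decoder_inputs  = seq[source_len - 1:source_len - 1 + target_len]  # frames 4..7
decoder_outputs = seq[source_len:source_len + target_len]          # frames 5..8

# decoder_inputs[t] == decoder_outputs[t] - 1: the ground truth sits one
# step behind, which is why it looks like one-step-ahead forecasting.
```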

YongyiTang92 commented 6 years ago

I think @Seleucia is right: at test time the decoder_inputs, which contain a large portion of ground-truth data, are used for prediction.

I also find that in https://github.com/una-dinosauria/human-motion-prediction/blob/1b39a7300192511b7387e8f3f393daf0aca9c0c4/src/translate.py#L225, when the predictions are all the same, the std must be 0, which makes mean_errors become 0. Am I right?
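To illustrate the concern with made-up data (this is a paraphrase of how I read that line, not the repository's exact code):

```python
import numpy as np

seq_len, n_dims = 25, 54
gt_euler = np.random.RandomState(0).randn(seq_len, n_dims)  # toy ground truth
pred_euler = np.tile(gt_euler[0], (seq_len, 1))             # constant predictions

# If the dimensions to score were selected by the std of the predictions,
# constant predictions would leave nothing to score...
idx_to_use = np.where(np.std(pred_euler, axis=0) > 1e-4)[0]
print(idx_to_use.size)   # 0

# ...and the per-frame Euclidean error would silently collapse to 0.
euc_error = np.sqrt(np.sum((gt_euler[:, idx_to_use] -
                            pred_euler[:, idx_to_use]) ** 2, axis=1))
print(euc_error.mean())  # 0.0
```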

una-dinosauria commented 6 years ago

@YongyiTang92 Regarding the error in the selected dimensions, @Seleucia was kind enough to submit a PR that fixes this issue. If this is a different problem with the code, could you please open a separate issue? Thank you.

una-dinosauria commented 6 years ago

@Seleucia Thanks for finding this. After going through my code again I believe that this is indeed a bug. However, the reason and its impact are a little more subtle than what you mentioned. I'll post a more detailed reply soon.

una-dinosauria commented 6 years ago

Heads up: Very jetlagged me writing this post.

Remember, in the paper we proposed using a "sampling-based" loss, which we found produces (relatively) realistic motion in the long term and, unlike the noise scheduling used in previous work, does not require hyper-parameter tuning. The alternative is to use a standard supervised loss, where the ground truth is fed to the decoder at each time-step.

The way we implement the sampling-based loss is by passing a loop function, lf, to the decoder, i.e. https://github.com/una-dinosauria/human-motion-prediction/blob/c9a2774fe59bb78c6ec6d04e15f2cb7243c4348c/src/seq2seq_model.py#L122-L127

If the loop function is None, then whatever we pass to decoder_inputs is used in the decoder. Otherwise, the output of the previous timestep is used as input. Now, this is a problem when we want to use the "supervised" loss, because the ground-truth decoder_inputs are still being used at test time. Therefore, the test error that is reported after each epoch of training is wrong. I believe this is the bug you are reporting, correct?
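For concreteness, here is a simplified, self-contained sketch of that mechanism using TF 1.x's legacy seq2seq decoder (not our exact code; the sizes are arbitrary):

```python
import tensorflow as tf  # TF 1.x, where tf.contrib.legacy_seq2seq lives

batch_size, input_size, target_len = 16, 54, 10
cell = tf.contrib.rnn.GRUCell(input_size)  # output size == input size here
decoder_inputs = [tf.placeholder(tf.float32, [batch_size, input_size])
                  for _ in range(target_len)]
initial_state = cell.zero_state(batch_size, tf.float32)

# loop_function(prev, i) decides what the decoder sees at step i.
# Returning the previous output implements the sampling-based loss.
def lf(prev, i):
    return prev

# With loop_function=lf, decoder_inputs is only read at step 0; every later
# step is fed the decoder's own prediction. With loop_function=None, the
# (ground-truth) decoder_inputs are fed at every step -- the supervised loss.
outputs, state = tf.contrib.legacy_seq2seq.rnn_decoder(
    decoder_inputs, initial_state, cell, loop_function=lf)
```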

However, if we are using the "sampling-based" loss, this problem disappears, because the ground truth for the decoder is ignored at both training and test time. Moreover, if we pass the --sample flag, the correct "sampling-based" behaviour (i.e., passing the previous output as input) is used no matter how the model was trained.

https://github.com/una-dinosauria/human-motion-prediction/blob/c9a2774fe59bb78c6ec6d04e15f2cb7243c4348c/src/translate.py#L546-L547

So, if one relies on the --sample flag, both the error and the samples are correct. Therefore, the bug only affects the validation error that is printed during training, right? I think this is what I put in the paper, and I should definitely correct it :S. Please let me know if what I said makes sense.
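PS: to put the --sample behaviour differently, the rollout is fully autoregressive. A conceptual sketch (step_fn is a hypothetical stand-in for one decoder step, not the repo code):

```python
import numpy as np

def rollout(step_fn, last_seen_frame, target_len):
    # step_fn: one decoder step (hypothetical name), mapping frame -> frame.
    frame, frames = last_seen_frame, []
    for _ in range(target_len):
        frame = step_fn(frame)  # feed our own output back in
        frames.append(frame)
    return np.stack(frames)     # ground truth never enters the loop
```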

YongyiTang92 commented 6 years ago

It makes sense to me. Thanks. It seems that you did not report the validation error, so what should you correct in your paper? It looks fine to me now :)

ajdroid commented 6 years ago

Hi, does this mean that the results using the supervised/unsupervised loss in the paper are inaccurate? Were those numbers obtained by feeding the ground-truth inputs to the decoder?

ajdroid commented 6 years ago

It looks like with --sample set to true we get a sample 100 frames long. Will setting model.target_seq_len to 10 (400 ms) with sampling set to True give the right numbers?

una-dinosauria commented 6 years ago

^ The length of the sampled sequence in an RNN does not affect the results (other than showing you more samples).

una-dinosauria commented 6 years ago

^^ I believe so. If you wish to compare against this paper, compare against the output using --sample, and against the surprisingly-hard-to-beat zero-velocity baseline.
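For reference, the zero-velocity baseline just repeats the last observed frame; a minimal sketch:

```python
import numpy as np

def zero_velocity_baseline(observed, target_len):
    """Predict no motion: tile the last observed frame target_len times."""
    return np.tile(observed[-1], (target_len, 1))

# e.g. observed: (source_len, n_dims) -> prediction: (target_len, n_dims)
```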