pieterwolfert / co-speech-humanoids

Co-Speech Gesture Generation for Humanoid Robots

Input sequence and target sequence mismatch during training? #10

Closed · wubowen416 closed this issue 4 years ago

wubowen416 commented 4 years ago

Hi, I'm grateful that you shared this code; I have learned a lot from it.

However, some parts of the code confused me, and I hope you can help me figure them out.

In the function pad_and_sort_batch in seq2seq.py, my understanding is that a batch of input sequences and a batch of target sequences are first padded, and then split according to the n and m parameters from the paper to create new batched data.

The point is that, because the sequences are padded, there are many zeros in both the input and the target sequences, so the splitting process will occasionally produce windows that are all zeros. As a result, some training batches contain inputs and targets that are entirely zeros. I noticed there is a line that sets the length to 1 if the input tensor is all zeros. That is fine, but in this part of the code (lines 215 and 216),

if len(input_var) == window_size:
    mini_batches.append([input_var, input_lengths, target_var, len(y)])

because of the way the splitting works, len(y) is a constant, even when the sequence is all zeros. Therefore, during training, the network is sometimes trained to generate an all-zero sequence from all-zero input, which slows down training. In my view, this should be avoided.
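
For illustration, here is a minimal sketch of the effect (hypothetical lengths and names, not the repo's actual code): padding followed by fixed-size window splitting yields windows that are pure padding.

import torch
from torch.nn.utils.rnn import pad_sequence

window_size = 4

# Two sequences of unequal length; pad_sequence zero-pads the shorter one.
seq_a = torch.ones(5)   # real length 5
seq_b = torch.ones(12)  # real length 12
padded = pad_sequence([seq_a, seq_b], batch_first=True)  # shape (2, 12)

# Split each padded sequence into consecutive fixed-size windows.
for seq in padded:
    for start in range(0, seq.size(0), window_size):
        window = seq[start:start + window_size]
        # The last window of seq_a is [0, 0, 0, 0]: pure padding.
        print(window.tolist(), "<- all zeros" if window.sum() == 0 else "")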

Additionally, a more serious problem may arise because of this.

Because the input and target lengths do not match (the input is text and the target is motion), the splitting can pair an all-zero input window with a non-zero target window, and vice versa, which means the training data is incorrect!
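
As a hypothetical illustration (toy numbers, not the repo's code): the input is split into windows of n words and the target into windows of m frames, but padding is applied per modality, so the k-th input window can be pure padding while the k-th target window still holds real motion frames.

import torch

n, m = 2, 3  # words per input window, frames per target window (the paper's n and m)

words  = torch.tensor([3, 7, 0, 0, 0, 0])  # 2 real word ids, zero-padded to 6
motion = torch.ones(9)                     # 9 real motion frames, no padding needed

for k in range(3):
    x = words[k * n:(k + 1) * n]   # k-th input window
    y = motion[k * m:(k + 1) * m]  # k-th target window
    # For k = 1 and k = 2, x is all padding but y is real motion.
    print(k, x.tolist(), y.tolist())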

If I'm wrong, please let me know where I made a mistake!

Svito-zar commented 4 years ago

@wubowen416, did you figure out what the issue was?