astariul opened this issue 5 years ago
This is happening because your `target_mapping` has a first dimension of size 126.
Looking at the docs for `transformer_xl`, we can see that `target_mapping` has the following description:
```
target_mapping: float32 Tensor in shape [num_predict, len, bsz].
    If target_mapping[i, j, k] = 1, the i-th predict in batch k is
    on the j-th token.
    Only used during pretraining for partial prediction.
    Set to None during finetuning.
```
Since the size of `target_mapping` is `(126, 640, 1)`, the size of the output is `(126, 1, 1024)`.
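To see why the first output dimension tracks `num_predict` rather than `len`, here is a minimal numpy sketch of how a one-hot `target_mapping` contracts the token axis of the hidden states. This is only an illustration of the shape arithmetic under that assumption, not the actual XLNet implementation; the variable names are mine.

```python
import numpy as np

num_predict, seq_len, bsz, d_model = 126, 640, 1, 1024

# Hidden states over the full sequence: [len, bsz, d_model]
hidden = np.random.rand(seq_len, bsz, d_model).astype(np.float32)

# target_mapping: [num_predict, len, bsz], one-hot over the token axis.
# target_mapping[i, j, k] = 1 means "the i-th prediction in batch k
# sits on the j-th token" (matching the docstring above).
target_mapping = np.zeros((num_predict, seq_len, bsz), dtype=np.float32)
positions = np.random.choice(seq_len, size=num_predict, replace=False)
for i, j in enumerate(positions):
    target_mapping[i, j, 0] = 1.0

# Contracting the len axis selects one hidden vector per prediction slot,
# so the result is [num_predict, bsz, d_model] = (126, 1, 1024).
selected = np.einsum('ijk,jkd->ikd', target_mapping, hidden)
print(selected.shape)
```

Each row of `selected` is just the hidden state at the token position that the corresponding one-hot row of `target_mapping` points to, which is why `len` disappears from the output shape.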
Thanks for your input @lukemelas!
However, I still don't understand. From the comments, `target_mapping` is supposed to be `[num_predict, len, bsz]`. In my case it is `[126, 640, 1]`, so we have `num_predict = 126`, `len = 640`, and `bsz = 1`.
`get_sequence_output` is supposed to return `[len, bsz, d_model]`, so in my case it should be `[640, 1, 1024]`. But it is `[126, 1, 1024]`.
Is the comment of `get_sequence_output` wrong? Should the real output size be `[num_predict, bsz, d_model]` rather than `[len, bsz, d_model]`?
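The behavior being asked about can be summed up in a small shape check. This is a sketch of my reading of the docstring above, not the library's actual code: when `target_mapping` is `None` (finetuning) the token axis survives, and when it is provided (pretraining with partial prediction) the prediction axis replaces it.

```python
import numpy as np

def sequence_output_shape(seq_len, bsz, d_model, target_mapping=None):
    """Hypothetical helper: output shape under the reading discussed above."""
    if target_mapping is None:
        # Finetuning: no partial prediction, output keeps the token axis.
        return (seq_len, bsz, d_model)
    # Pretraining: one output row per prediction slot, not per token.
    num_predict = target_mapping.shape[0]
    return (num_predict, bsz, d_model)

mapping = np.zeros((126, 640, 1), dtype=np.float32)
print(sequence_output_shape(640, 1, 1024))           # finetuning case
print(sequence_output_shape(640, 1, 1024, mapping))  # pretraining case
```

Under this reading, the docstring's `[len, bsz, d_model]` only describes the `target_mapping=None` case, and the observed `[126, 1, 1024]` is the expected pretraining shape.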
I'm running this code (only the relevant part):

And the output of this code is (on TPU):
According to the comments of the class `XLNetModel`, my input shapes are correct. So why does my output have this shape? According to the comments of `get_sequence_output()`, the output shape should be `[640, 1, 1024]`, not `[126, 1, 1024]`.

Any guidance is welcome :)