I don't remember the details anymore. `limit.is_defined` only makes sense for static dims, right? And if we have `limit.returnn_naming_entry.is_size_value`, we don't need that (we don't need `limit = limit.numpy()`).

I'm not sure if we have `limit.returnn_naming_entry.is_size_value` here though. Effectively, `limit` is also not the size itself but the max of the seq lengths. Which should be the same though. We probably need some extra logic to set this correctly then, e.g. when you do `torch.max(lengths)`.
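A minimal sketch of the distinction discussed here, assuming the `returnn_naming_entry.is_size_value` attribute mentioned above; the helper name and control flow are purely illustrative, not the actual converter code:

```python
# Illustrative only: prefer the symbolic size marker over materializing the value.
def resolve_arange_limit(limit):
    entry = getattr(limit, "returnn_naming_entry", None)  # assumed attribute, as referenced above
    if entry is not None and entry.is_size_value is not None:
        # limit carries (the max of) dynamic seq lengths; keep it symbolic,
        # no need to call limit.numpy()
        return entry.is_size_value
    if limit.is_defined:  # only meaningful for static dims
        return int(limit.numpy())
    raise NotImplementedError("dynamic limit without is_size_value")
```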
> I don't remember the details anymore. `limit.is_defined` only makes sense for static dims, right? And if we have `limit.returnn_naming_entry.is_size_value`, we don't need that (we don't need `limit = limit.numpy()`).

Yes, I agree.

> I'm not sure if we have `limit.returnn_naming_entry.is_size_value` here though. Effectively, `limit` is also not the size itself but the max of the seq lengths. Which should be the same though. We probably need some extra logic to set this correctly then, e.g. when you do `torch.max(lengths)`.

Right, it's `None` here. I'll look into `torch.max`.
> I'm not sure if we have `limit.returnn_naming_entry.is_size_value` here though. Effectively, `limit` is also not the size itself but the max of the seq lengths. Which should be the same though. We probably need some extra logic to set this correctly then, e.g. when you do `torch.max(lengths)`.

So for `Reduce` with `mode="max"`, we would need to check if we can already know what that is?
> I'm not sure if we have `limit.returnn_naming_entry.is_size_value` here though. Effectively, `limit` is also not the size itself but the max of the seq lengths. Which should be the same though. We probably need some extra logic to set this correctly then, e.g. when you do `torch.max(lengths)`.
>
> So for `Reduce` with `mode="max"`, we would need to check if we can already know what that is?

What fails now with your change is e.g. `test_arange_from_lengths`. In there, we have this code:

```python
batch_dim, time_dim = inputs.shape
lengths = torch.full((batch_dim,), time_dim)
arange = torch.arange(torch.max(lengths))
```

In practice, it is a question where `lengths` comes from. I think so far we never really attached the sequence lengths to the original tensor in any way, or did we?

In this synthetic, unrealistic example, we could infer that `torch.max(lengths)` refers to `time_dim`, which has the correct `is_size_value`. It is actually the same, so maybe we can even return the same tensor object.

We should fix this example such that the tests pass, which probably should not be too difficult. Or otherwise the tests somehow need to be rewritten.

The more relevant question, however, is: How will it look in practice, i.e. where do the `lengths` come from, and how are they connected to `inputs`?

Maybe you could just also set `is_size_value` of `lengths` here in this example (and in general, however you would get `lengths`). This is not just a shape dim (scalar) but the actual lengths, but we could treat it in the same way. That would then be easy to detect in `torch.max`. Not sure if this is a good solution or whether this causes other problems elsewhere.
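A rough sketch of that idea, continuing the test snippet above and reusing the attribute names from this thread (`returnn_naming_entry.is_size_value`); this is illustrative only, not the actual implementation:

```python
# Hypothetical illustration: mark the lengths tensor as carrying size values,
# so that a later torch.max(lengths) can be recognized as the (max) seq length.
batch_dim, time_dim = inputs.shape            # time_dim is a SizeValue here
lengths = torch.full((batch_dim,), time_dim)
# Assumed: the wrapped full() could propagate the SizeValue fill value like this.
lengths.returnn_naming_entry.is_size_value = time_dim
arange = torch.arange(torch.max(lengths))     # max could now map back to time_dim
```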
I have pushed a fix, but it's very specific. In `Reduce` with `mode in ["max", "min", "mean"]`, it checks if the input comes from `FullStatic` and sets `is_size_value` for the tensor entry if the `fill_value` is a `SizeValue`.
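Roughly, that check could look like the following sketch; the variable names (`input_entry`, `output_entry`, `producer_module`) and the lookup of the producing module are assumptions for illustration, not the code of the actual commit:

```python
# Illustrative sketch of the described check inside the Reduce conversion.
if mode in ("max", "min", "mean"):
    producer_module = input_entry.output_from_module  # assumed lookup of the producing module
    if isinstance(producer_module, FullStatic) and isinstance(producer_module.fill_value, SizeValue):
        # All entries of the input are the same SizeValue, so the reduced scalar is that size.
        output_entry.is_size_value = producer_module.fill_value
```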
> The more relevant question, however, is: How will it look in practice, i.e. where do the `lengths` come from, and how are they connected to `inputs`?

I don't remember the details, but I think the motivation for the test was `test_packed_sequence_2` or, more generally, `torch.nn.utils.rnn.pack_sequence`.
> I have pushed a fix, but it's very specific. In `Reduce` with `mode in ["max", "min", "mean"]`, it checks if the input comes from `FullStatic` and sets `is_size_value` for the tensor entry if the `fill_value` is a `SizeValue`.

But it only makes sense for `max`, or not?

> The more relevant question, however, is: How will it look in practice, i.e. where do the `lengths` come from, and how are they connected to `inputs`?
>
> I don't remember the details, but I think the motivation for the test was `test_packed_sequence_2` or, more generally, `torch.nn.utils.rnn.pack_sequence`.

But also there, the same question: Where do the `lengths` come from?
> I have pushed a fix, but it's very specific. In `Reduce` with `mode in ["max", "min", "mean"]`, it checks if the input comes from `FullStatic` and sets `is_size_value` for the tensor entry if the `fill_value` is a `SizeValue`.
>
> But it only makes sense for `max`, or not?

If it is `FullStatic`, then all values in the tensor are the same, so max = min = mean, or not?

> The more relevant question, however, is: How will it look in practice, i.e. where do the `lengths` come from, and how are they connected to `inputs`?
>
> I don't remember the details, but I think the motivation for the test was `test_packed_sequence_2` or, more generally, `torch.nn.utils.rnn.pack_sequence`.
>
> But also there, the same question: Where do the `lengths` come from?
In `pack_sequence` we have

```python
batch_dim = sequences.shape[torch_axis_from_returnn_axis[tensor_entry.returnn_data.batch_dim_axis]]
time_dim = sequences.shape[torch_axis_from_returnn_axis[tensor_entry.returnn_data.time_dim_axis]]
lengths = full((batch_dim,), time_dim)
# batch_first is always True because we assume batch-time-major tensor, see above
return pack_padded_sequence(sequences, lengths, enforce_sorted=enforce_sorted, batch_first=True)
```

so `lengths` is created by `full`. Then in `pack_padded_sequence`, the `lengths` are converted to `batch_sizes` via `_batch_sizes_from_lengths`, which uses `torch.arange(torch.max(lengths))`. So this is pretty much the same as what we test in `test_arange_from_lengths`.
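For context, a conceptual sketch of what such a lengths-to-batch-sizes conversion computes (not necessarily the actual `_batch_sizes_from_lengths` implementation): `batch_sizes[t]` is the number of sequences still active at time step `t`.

```python
import torch

def batch_sizes_from_lengths(lengths: torch.Tensor) -> torch.Tensor:
    # e.g. lengths = [3, 2, 2] -> batch_sizes = [3, 3, 1]
    time_range = torch.arange(torch.max(lengths))  # the call discussed above
    return (time_range.unsqueeze(1) < lengths.unsqueeze(0)).sum(dim=1)
```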
> I have pushed a fix, but it's very specific. In `Reduce` with `mode in ["max", "min", "mean"]`, it checks if the input comes from `FullStatic` and sets `is_size_value` for the tensor entry if the `fill_value` is a `SizeValue`.
>
> But it only makes sense for `max`, or not?
>
> If it is `FullStatic`, then all values in the tensor are the same, so max = min = mean, or not?

I don't understand why this is relevant. Normally you would never use `FullStatic` in practice for the seq lengths. The seq lengths are usually never all the same. So in the general case, only `max` is correct.

> In `pack_sequence` we have
>
> ```python
> batch_dim = sequences.shape[torch_axis_from_returnn_axis[tensor_entry.returnn_data.batch_dim_axis]]
> time_dim = sequences.shape[torch_axis_from_returnn_axis[tensor_entry.returnn_data.time_dim_axis]]
> lengths = full((batch_dim,), time_dim)
> # batch_first is always True because we assume batch-time-major tensor, see above
> return pack_padded_sequence(sequences, lengths, enforce_sorted=enforce_sorted, batch_first=True)
> ```
>
> so `lengths` is created by `full`. Then in `pack_padded_sequence`, the `lengths` are converted to `batch_sizes` via `_batch_sizes_from_lengths`, which uses `torch.arange(torch.max(lengths))`. So this is pretty much the same as what we test in `test_arange_from_lengths`.

But why is this relevant? This code is obviously wrong, and you would not use it in practice. You do not have the same seq lengths in practice.
> Maybe you could just also set `is_size_value` of `lengths` here in this example (and in general, however you would get `lengths`). This is not just a shape dim (scalar) but the actual lengths, but we could treat it in the same way. That would then be easy to detect in `torch.max`. Not sure if this is a good solution or whether this causes other problems elsewhere.

Did you try this? I.e. this would be a small extra logic in `FullStatic`. And in `Reduce`, you would just check for `is_size_value` in case of `max` (and only `max`).
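A hedged sketch of that more general variant; the attribute and variable names (`fill_value`, `input_entry`, `output_entry`) are assumptions taken from this thread, not the actual code:

```python
# In FullStatic: if the fill value is a SizeValue (e.g. lengths derived from a dim),
# mark the output tensor entry as carrying size values.
if isinstance(fill_value, SizeValue):
    output_entry.is_size_value = fill_value

# In Reduce: only the max of seq lengths is again a valid size value in general
# (min/mean would be wrong once the lengths differ).
if mode == "max" and getattr(input_entry, "is_size_value", None) is not None:
    output_entry.is_size_value = input_entry.is_size_value
```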
> Maybe you could just also set `is_size_value` of `lengths` here in this example (and in general, however you would get `lengths`). This is not just a shape dim (scalar) but the actual lengths, but we could treat it in the same way. That would then be easy to detect in `torch.max`. Not sure if this is a good solution or whether this causes other problems elsewhere.
>
> Did you try this? I.e. this would be a small extra logic in `FullStatic`. And in `Reduce`, you would just check for `is_size_value` in case of `max` (and only `max`).

I pushed something, is that what you mean?
> I pushed something, is that what you mean?

Yes, looks good.
Fix #114