I don't remember the details anymore. `limit.is_defined` only makes sense for static dims, right? And if we have `limit.returnn_naming_entry.is_size_value`, we don't need that (we don't need `limit = limit.numpy()`).

I'm not sure if we have `limit.returnn_naming_entry.is_size_value` here though. Effectively, `limit` is also not the size itself but the max of the seq lengths. Which should be the same though. We probably need some extra logic to set this correctly then, e.g. when you do `torch.max(lengths)`.
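A minimal sketch of the distinction discussed here, assuming the `returnn_naming_entry.is_size_value` attribute mentioned above; the helper name and control flow are purely illustrative, not the actual converter code:

```python
# Illustrative only: prefer the symbolic size marker over materializing the value.
def resolve_arange_limit(limit):
    entry = getattr(limit, "returnn_naming_entry", None)  # assumed attribute, as referenced above
    if entry is not None and entry.is_size_value is not None:
        # limit carries (the max of) dynamic seq lengths; keep it symbolic,
        # no need to call limit.numpy()
        return entry.is_size_value
    if limit.is_defined:  # only meaningful for static dims
        return int(limit.numpy())
    raise NotImplementedError("dynamic limit without is_size_value")
```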
> I don't remember the details anymore. `limit.is_defined` only makes sense for static dims, right? And if we have `limit.returnn_naming_entry.is_size_value`, we don't need that (we don't need `limit = limit.numpy()`).

Yes, I agree.

> I'm not sure if we have `limit.returnn_naming_entry.is_size_value` here though. Effectively, `limit` is also not the size itself but the max of the seq lengths. Which should be the same though. We probably need some extra logic to set this correctly then, e.g. when you do `torch.max(lengths)`.

Right, it's `None` here. I'll look into `torch.max`.
> I'm not sure if we have `limit.returnn_naming_entry.is_size_value` here though. Effectively, `limit` is also not the size itself but the max of the seq lengths. Which should be the same though. We probably need some extra logic to set this correctly then, e.g. when you do `torch.max(lengths)`.

So for `Reduce` with `mode="max"`, we would need to check if we can already know what that is?
> I'm not sure if we have `limit.returnn_naming_entry.is_size_value` here though. Effectively, `limit` is also not the size itself but the max of the seq lengths. Which should be the same though. We probably need some extra logic to set this correctly then, e.g. when you do `torch.max(lengths)`.
>
> So for `Reduce` with `mode="max"`, we would need to check if we can already know what that is?

What fails now with your change is e.g. `test_arange_from_lengths`. In there, we have this code:

```python
batch_dim, time_dim = inputs.shape
lengths = torch.full((batch_dim,), time_dim)
arange = torch.arange(torch.max(lengths))
```

In practice, it is a question where `lengths` comes from. I think so far we never really attached the sequence lengths to the original tensor in any way, or did we?

In this synthetic, unrealistic example, we could infer that `torch.max(lengths)` refers to `time_dim`, which has the correct `is_size_value`. It is actually the same, so maybe we can even return the same tensor object.

We should fix this example such that the tests pass, which probably should not be too difficult. Or otherwise the tests somehow need to be rewritten.

The more relevant question, however, is: How will it look in practice, i.e. where do the `lengths` come from, and how are they connected to `inputs`?

Maybe you could just also set `is_size_value` of `lengths` here in this example (and in general, however you would get `lengths`). This is not just a shape dim (scalar) but the actual lengths, but we could treat it in the same way. That would then be easy to detect in `torch.max`. Not sure if this is a good solution or whether this causes other problems elsewhere.
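A rough sketch of that idea, continuing the test snippet above and reusing the attribute names from this thread (`returnn_naming_entry.is_size_value`); this is illustrative only, not the actual implementation:

```python
# Hypothetical illustration: mark the lengths tensor as carrying size values,
# so that a later torch.max(lengths) can be recognized as the (max) seq length.
batch_dim, time_dim = inputs.shape            # time_dim is a SizeValue here
lengths = torch.full((batch_dim,), time_dim)
# Assumed: the wrapped full() could propagate the SizeValue fill value like this.
lengths.returnn_naming_entry.is_size_value = time_dim
arange = torch.arange(torch.max(lengths))     # max could now map back to time_dim
```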
I have pushed a fix, but it's very specific. In `Reduce` with `mode in ["max", "min", "mean"]`, it checks if the input comes from `FullStatic` and sets `is_size_value` for the tensor entry if the `fill_value` is a `SizeValue`.
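Roughly, that check could look like the following sketch; the variable names (`input_entry`, `output_entry`, `producer_module`) and the lookup of the producing module are assumptions for illustration, not the code of the actual commit:

```python
# Illustrative sketch of the described check inside the Reduce conversion.
if mode in ("max", "min", "mean"):
    producer_module = input_entry.output_from_module  # assumed lookup of the producing module
    if isinstance(producer_module, FullStatic) and isinstance(producer_module.fill_value, SizeValue):
        # All entries of the input are the same SizeValue, so the reduced scalar is that size.
        output_entry.is_size_value = producer_module.fill_value
```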
> The more relevant question, however, is: How will it look in practice, i.e. where do the `lengths` come from, and how are they connected to `inputs`?

I don't remember the details, but I think the motivation for the test was `test_packed_sequence_2` or, more generally, `torch.nn.utils.rnn.pack_sequence`.
> I have pushed a fix, but it's very specific. In `Reduce` with `mode in ["max", "min", "mean"]`, it checks if the input comes from `FullStatic` and sets `is_size_value` for the tensor entry if the `fill_value` is a `SizeValue`.

But it only makes sense for `max`, or not?

> The more relevant question, however, is: How will it look in practice, i.e. where do the `lengths` come from, and how are they connected to `inputs`?
>
> I don't remember the details, but I think the motivation for the test was `test_packed_sequence_2` or, more generally, `torch.nn.utils.rnn.pack_sequence`.

But also there, the same question: Where do the `lengths` come from?
> I have pushed a fix, but it's very specific. In `Reduce` with `mode in ["max", "min", "mean"]`, it checks if the input comes from `FullStatic` and sets `is_size_value` for the tensor entry if the `fill_value` is a `SizeValue`.
>
> But it only makes sense for `max`, or not?

If it is `FullStatic`, then all values in the tensor are the same, so max = min = mean, or not?

> The more relevant question, however, is: How will it look in practice, i.e. where do the `lengths` come from, and how are they connected to `inputs`?
>
> I don't remember the details, but I think the motivation for the test was `test_packed_sequence_2` or, more generally, `torch.nn.utils.rnn.pack_sequence`.
>
> But also there, the same question: Where do the `lengths` come from?
In `pack_sequence` we have

```python
batch_dim = sequences.shape[torch_axis_from_returnn_axis[tensor_entry.returnn_data.batch_dim_axis]]
time_dim = sequences.shape[torch_axis_from_returnn_axis[tensor_entry.returnn_data.time_dim_axis]]
lengths = full((batch_dim,), time_dim)
# batch_first is always True because we assume batch-time-major tensor, see above
return pack_padded_sequence(sequences, lengths, enforce_sorted=enforce_sorted, batch_first=True)
```

so `lengths` is created by `full`. Then in `pack_padded_sequence`, the `lengths` are converted to `batch_sizes` via `_batch_sizes_from_lengths`, which uses `torch.arange(torch.max(lengths))`. So this is pretty much the same as what we test in `test_arange_from_lengths`.
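For context, a conceptual sketch of what such a lengths-to-batch-sizes conversion computes (not necessarily the actual `_batch_sizes_from_lengths` implementation): `batch_sizes[t]` is the number of sequences still active at time step `t`.

```python
import torch

def batch_sizes_from_lengths(lengths: torch.Tensor) -> torch.Tensor:
    # e.g. lengths = [3, 2, 2] -> batch_sizes = [3, 3, 1]
    time_range = torch.arange(torch.max(lengths))  # the call discussed above
    return (time_range.unsqueeze(1) < lengths.unsqueeze(0)).sum(dim=1)
```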
> I have pushed a fix, but it's very specific. In `Reduce` with `mode in ["max", "min", "mean"]`, it checks if the input comes from `FullStatic` and sets `is_size_value` for the tensor entry if the `fill_value` is a `SizeValue`.
>
> But it only makes sense for `max`, or not?
>
> If it is `FullStatic`, then all values in the tensor are the same, so max = min = mean, or not?

I don't understand why this is relevant. Normally you would never use `FullStatic` in practice for the seq lengths. The seq lengths are usually never all the same. So in the general case, only `max` is correct.

> In `pack_sequence` we have
>
> ```python
> batch_dim = sequences.shape[torch_axis_from_returnn_axis[tensor_entry.returnn_data.batch_dim_axis]]
> time_dim = sequences.shape[torch_axis_from_returnn_axis[tensor_entry.returnn_data.time_dim_axis]]
> lengths = full((batch_dim,), time_dim)
> # batch_first is always True because we assume batch-time-major tensor, see above
> return pack_padded_sequence(sequences, lengths, enforce_sorted=enforce_sorted, batch_first=True)
> ```
>
> so `lengths` is created by `full`. Then in `pack_padded_sequence`, the `lengths` are converted to `batch_sizes` via `_batch_sizes_from_lengths`, which uses `torch.arange(torch.max(lengths))`. So this is pretty much the same as what we test in `test_arange_from_lengths`.

But why is this relevant? This code is obviously wrong, and you would not use it in practice. You do not have the same seq lengths in practice.
> Maybe you could just also set `is_size_value` of `lengths` here in this example (and in general, however you would get `lengths`). This is not just a shape dim (scalar) but the actual lengths, but we could treat it in the same way. That would then be easy to detect in `torch.max`. Not sure if this is a good solution or whether this causes other problems elsewhere.

Did you try this? I.e. this would be a small extra logic in `FullStatic`. And in `Reduce`, you would just check for `is_size_value` in case of `max` (and only `max`).
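A hedged sketch of that more general variant; the attribute and variable names (`fill_value`, `input_entry`, `output_entry`) are assumptions taken from this thread, not the actual code:

```python
# In FullStatic: if the fill value is a SizeValue (e.g. lengths derived from a dim),
# mark the output tensor entry as carrying size values.
if isinstance(fill_value, SizeValue):
    output_entry.is_size_value = fill_value

# In Reduce: only the max of seq lengths is again a valid size value in general
# (min/mean would be wrong once the lengths differ).
if mode == "max" and getattr(input_entry, "is_size_value", None) is not None:
    output_entry.is_size_value = input_entry.is_size_value
```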
> Maybe you could just also set `is_size_value` of `lengths` here in this example (and in general, however you would get `lengths`). This is not just a shape dim (scalar) but the actual lengths, but we could treat it in the same way. That would then be easy to detect in `torch.max`. Not sure if this is a good solution or whether this causes other problems elsewhere.
>
> Did you try this? I.e. this would be a small extra logic in `FullStatic`. And in `Reduce`, you would just check for `is_size_value` in case of `max` (and only `max`).

I pushed something, is that what you mean?
> I pushed something, is that what you mean?

Yes, looks good.
Fix #114