Closed Rocketknight1 closed 1 year ago
Really interesting. Is this layer required to get strong performance in vision transformer models?
I see no issue in including it, but I want to be sure it fits a use case. Let me know if you have more information as to "why" it is used.
I suspect that the layer is never really necessary when you're designing a model from scratch - you could always just use normal pooling layers and just choose the strides and widths appropriately. The need for it in TF mostly arises when someone has trained a model in PyTorch and you want to reimplement their model and load their weights - then if you write a slightly different pooling layer you'll probably break compatibility.
It's commonly used in pyramid pooling modules (paper, >8000 citations), e.g. in BeiT (>200 citations, code sample) and Data2Vec. There is also a tensorflow-addons port of that module, but as it depends on the TFA implementation of adaptive pooling, results do not match Torch pyramid pooling modules.
That said, it's totally okay if you want to leave it out for now - I linked it in the gist above, so feel free to just close this for now and add the layer later if/when you find out you need it for a model.
I think this sounds like a good addition, I'm mainly just curious if the only reason anyone uses it is backwards compatibility. Do people continue to use it a lot in the pytorch world today?
Do I understand correctly that this layer at train/inference always uses the same size strides/width, but that this layer is a way of auto-selecting these values?
Yes, that's correct. However, it doesn't exactly have a 'stride' in the usual sense. It basically splats (potentially overlapping) pooling windows all across the input so as to get the desired output shape, but the spacing between these windows is usually not constant, unless the input size is an integer multiple of the output size. The width of the windows can also vary in different locations.
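To make that concrete, here's a small sketch of the window-placement rule (the floor/ceil formula PyTorch's adaptive pooling uses, as described in the StackOverflow post linked below) showing how the windows can overlap and the spacing between them varies:

```python
import math

def adaptive_window(i, in_size, out_size):
    # Torch-style window boundaries for output index i:
    #   start = floor(i * in / out), end = ceil((i + 1) * in / out)
    start = math.floor(i * in_size / out_size)
    end = math.ceil((i + 1) * in_size / out_size)
    return start, end

# Pooling 10 inputs down to 4 outputs: the window starts are spaced
# 2, 3, 2 apart (not constant), and adjacent windows overlap.
windows = [adaptive_window(i, 10, 4) for i in range(4)]
print(windows)  # [(0, 3), (2, 5), (5, 8), (7, 10)]
```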
I see. Can you paste a snippet of the API we would want to use? Do you specify the output shape?
Sure, the API is just:
layer = AdaptiveAvgPool2D(output_dims=(128, 128))
# Can also support NCHW, but we use NHWC here
inputs = tf.ones((8, 192, 192, 3), dtype=tf.float32)
outputs = layer(inputs)
# outputs.shape is (8, 128, 128, 3)
In other words, you specify the output shape at init, and then whatever Tensor you pass in gets pooled down to the desired output shape, with the required pooling windows being calculated in the call(). The same layer can handle multiple different input shapes without needing to be rebuilt.
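For reference, the semantics can be sketched in plain NumPy (a hypothetical illustration of the window logic, not the actual TF implementation; the function name is made up):

```python
import math

import numpy as np

def adaptive_avg_pool_2d(x, output_dims):
    """NumPy sketch of Torch-style adaptive average pooling.

    `x` is NHWC; `output_dims` is the target (height, width). Window
    boundaries are computed per output position from the input shape,
    so the same function handles any input size.
    """
    n, h, w, c = x.shape
    out_h, out_w = output_dims
    out = np.empty((n, out_h, out_w, c), dtype=x.dtype)
    for i in range(out_h):
        h0 = math.floor(i * h / out_h)
        h1 = math.ceil((i + 1) * h / out_h)
        for j in range(out_w):
            w0 = math.floor(j * w / out_w)
            w1 = math.ceil((j + 1) * w / out_w)
            out[:, i, j, :] = x[:, h0:h1, w0:w1, :].mean(axis=(1, 2))
    return out

inputs = np.ones((8, 192, 192, 3), dtype=np.float32)
outputs = adaptive_avg_pool_2d(inputs, (128, 128))
print(outputs.shape)  # (8, 128, 128, 3)
```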
Gotcha, that is a pretty interesting feature.
One last quick Q, does it handle upscaling too?
layer = AdaptiveAvgPool2D(output_dims=(128, 128))
# Can also support NCHW, but we use NHWC here
inputs = tf.ones((8, 192, 192, 3), dtype=tf.float32)
outputs = layer(inputs)
# outputs.shape is (8, 128, 128, 3)
upscaled = AdaptiveAvgPool2D(output_dims=(256, 256))(outputs)
# upscaled.shape is (8, 256, 256, 3)
In Torch it might (though I don't think this is a common/intended use case), but because I implemented it using normal pooling layers in TF I don't think it would work in my implementation. If needed, I could add a different code path for upscaling, but I haven't seen any code in the wild where people use it for that.
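For context on what the window formula does when you ask for more outputs than inputs: the windows shrink to single elements and repeat, so "upscaling" would just duplicate inputs rather than interpolate. A quick sketch:

```python
import math

# Torch-style windows for 2 inputs "pooled" up to 4 outputs: each
# window covers a single element, and each input appears twice.
in_size, out_size = 2, 4
windows = [(math.floor(i * in_size / out_size),
            math.ceil((i + 1) * in_size / out_size))
           for i in range(out_size)]
print(windows)  # [(0, 1), (0, 1), (1, 2), (1, 2)]
```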
for sure, thanks.
My only real concern here is that this isn't idiomatic to Keras. In pytorch you specify output features all the time, but in Keras that is computed for you. So it is a little strange to have this in Keras, BUT I will say for compatibility purposes it could be valuable.
Yeah, I'd say it's mostly (entirely?) useful for PyTorch model compatibility, so I get that it might feel out of place. But still, let us know if you want it anyway, and we'll make a PR!
thanks for the offer!
I'd like to hear @fchollet's and @tanzhenyu's opinions on compatibility layers like this. Our long-term goal is to not port weights, but perhaps this is low enough cost that the benefit outweighs the cost.
@Rocketknight1 Would you also implement 1D and 3D versions of this layer?
Hi @innat, I could, yes! If you look at the gist I linked above, the pseudo_1d_pool function is basically just a 1D AdaptivePool, so that would be very easy to implement as a separate layer. To do a 3D pool I would just do a 1D pool on each of the 3 dimensions.
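That composition works because the window along each axis depends only on that axis's output index, so averaging axis by axis gives the same result as averaging over the full box. A NumPy sketch of the idea (hypothetical function names, not the gist's code):

```python
import math

import numpy as np

def adaptive_avg_pool_1d(x, out_size, axis):
    """Adaptive average pool over a single axis of an array."""
    in_size = x.shape[axis]
    slices = []
    for i in range(out_size):
        start = math.floor(i * in_size / out_size)
        end = math.ceil((i + 1) * in_size / out_size)
        window = np.take(x, range(start, end), axis=axis)
        slices.append(window.mean(axis=axis, keepdims=True))
    return np.concatenate(slices, axis=axis)

def adaptive_avg_pool_3d(x, output_dims):
    # NDHWC input: pool depth, height, and width in turn with the 1D pool.
    for axis, size in zip((1, 2, 3), output_dims):
        x = adaptive_avg_pool_1d(x, size, axis)
    return x

x = np.ones((2, 9, 10, 11, 4), dtype=np.float32)
print(adaptive_avg_pool_3d(x, (4, 5, 6)).shape)  # (2, 4, 5, 6, 4)
```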
Closing this until we have a strong use case!
AdaptiveAveragePoolingxD (1D, 2D, 3D) used to exist in TensorFlow Addons. However, with TensorFlow Addons no longer being supported, it would be beneficial to include it in KerasCV. For a strong use case, I can point to its application in tri-plane representations, which are increasingly used in the 3D generative AI literature. One example is the recent work by Wu et al., Learning to Generate 3D Shapes from a Single Example.
Short Description
At Hugging Face we've seen a few PyTorch vision transformer models using AdaptiveAvgPool2D. In a lot of cases these are just resizing to (1,) or (1, 1), in which case they're just a strange way to compute torch.mean(), but in some cases they're actually using this layer's full functionality.

The problem with AdaptiveAvgPool2D is that it computes the pooling windows in a unique way, and the size of the windows can be variable. This makes it impossible to implement with a standard pooling layer, and very annoying to port to TF, especially if you want to load weights from a model that was trained with the Torch layer. There is an implementation in tensorflow-addons, but it uses fixed-size windows, and so does not match the output of the Torch layer unless the input size is an integer multiple of the output size.

We made a reasonably performant TF version that does correctly match the Torch layer in all cases - do you need this in keras-cv? We'd be happy to add it as a PR if it's useful.

Existing Implementations
Torch layer

Other information
See this StackOverflow post for a good description of how the Torch layer works internally.
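To illustrate the mismatch numerically (a sketch of the two window strategies, not the actual tensorflow-addons code): pooling a length-10 input down to 3 outputs, Torch-style variable windows and naive fixed-size windows give different results:

```python
import math

import numpy as np

x = np.arange(10, dtype=np.float32)

# Torch-style adaptive windows: start = floor(i*in/out), end = ceil((i+1)*in/out).
# For in=10, out=3 these are (0, 4), (3, 7), (6, 10) -- overlapping, width 4.
adaptive = [x[math.floor(i * 10 / 3):math.ceil((i + 1) * 10 / 3)].mean()
            for i in range(3)]

# Naive fixed windows of size in // out with stride in // out, the kind of
# approximation a fixed-window implementation makes; it drops index 9 entirely.
fixed = [x[i * 3:i * 3 + 3].mean() for i in range(3)]

print(adaptive)  # [1.5, 4.5, 7.5]
print(fixed)     # [1.0, 4.0, 7.0]
```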