MobileViT model loading issues

veb-101 commented 5 months ago

There's an issue loading MobileViT with input sizes whose downsampled feature maps are not divisible by 2. The model works with input_shape (256, 256, 3) but fails with (None, None, 3) or (224, 224, 3).

ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concatenation axis. Received: input_shape=[(None, 7, 7, 80), (None, 8, 8, 80)]
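
A rough sketch of the arithmetic behind this error. It assumes a total downsampling factor of 32 at the failing block and a 2x2 patch size; both values are assumptions chosen to match the 7 and 8 in the message above and are not taken from the library code. The helper names are hypothetical.

```python
import math

PATCH_SIZE = 2  # assumed MobileViT patch size (not confirmed by the thread)
STRIDE = 32     # assumed total downsampling factor at the failing block


def feature_size(input_size, stride=STRIDE):
    """Spatial size of the feature map after downsampling by `stride`."""
    return math.ceil(input_size / stride)


def patch_aligned_size(size, patch=PATCH_SIZE):
    """Smallest multiple of `patch` that is >= `size`."""
    return math.ceil(size / patch) * patch


def shapes_match(input_size):
    """True if the residual branch and the patch-aligned branch agree."""
    residual = feature_size(input_size)
    aligned = patch_aligned_size(residual)
    return residual == aligned


for size in (224, 256):
    fm = feature_size(size)
    print(size, fm, patch_aligned_size(fm), shapes_match(size))
```

Under these assumptions a 224 input gives a 7x7 map that has to grow to 8x8 before it can be split into 2x2 patches, while the skip connection stays at 7x7, so the two inputs to Concatenate disagree. A 256 input gives 8x8 on both branches.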

I faced the same issue. It seems like the resize method used in the original repo is really needed.

To support (None, None, 3) inputs, I had to override the compute_output_shape(...) method.
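
The thread does not show the override itself, so the following is only a guess at the kind of shape rule such a method might encode. It is written as a plain function so it runs without Keras; in a Keras layer the same logic would live in a compute_output_shape(self, input_shape) method. The stride and channels defaults are hypothetical values, picked to match the error message above.

```python
import math


def compute_output_shape(input_shape, stride=32, channels=80):
    """Shape rule for a block that downsamples (batch, H, W, C) by `stride`.

    Unknown dimensions (None) are passed through as None instead of being
    used in arithmetic, which is what makes (None, None, 3) inputs workable.
    `stride` and `channels` are illustrative assumptions.
    """
    batch, height, width, _ = input_shape

    def downsample(size):
        # Keep dynamic dimensions dynamic; only compute on known sizes.
        return None if size is None else math.ceil(size / stride)

    return (batch, downsample(height), downsample(width), channels)


print(compute_output_shape((None, None, None, 3)))
print(compute_output_shape((None, 224, 224, 3)))
```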

However, this led to another issue: the layer parameters weren't being initialized. To resolve this, I had to pass dummy data through the model (like this) before returning it.

This is not a dig at your repository. I am sharing that I faced the issues and the solutions to fix them since you and I are working toward the same goal.

james77777778 commented 5 months ago

Hi @veb-101

Thanks for reporting this issue. I believe it is related to some transformer-based models and might be solvable by interpolating the feature map or the weights.

I will look into this and report back.

james77777778 commented 5 months ago

I have fixed it and pushed a new release. You can refer to https://github.com/james77777778/keras-image-models/releases/tag/0.2.1 and install it with pip install -U kimm

The details:

MobileViT*: Add keras.ops.resize to the feature map
VisionTransformer*: Add mechanism to interpolate the weights of PositionEmbedding during loading
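
Both fixes come down to resizing a spatial grid. The sketch below illustrates the idea in NumPy only and is not the code from the release: resize_bilinear stands in for the Keras resize op, the choice of bilinear interpolation is an assumption, and the names resize_to_patch_multiple, interpolate_position_embedding and num_prefix_tokens are hypothetical.

```python
import math

import numpy as np


def resize_bilinear(x, new_h, new_w):
    """Bilinear resize of an (H, W, C) array using half-pixel centers."""
    h, w, _ = x.shape
    # Source coordinate of each output pixel center, clamped to the grid.
    ys = np.clip((np.arange(new_h) + 0.5) * h / new_h - 0.5, 0, h - 1)
    xs = np.clip((np.arange(new_w) + 0.5) * w / new_w - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]  # vertical blend weights
    wx = (xs - x0)[None, :, None]  # horizontal blend weights
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bottom = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def resize_to_patch_multiple(x, patch=2):
    """Resize (H, W, C) so that H and W are multiples of `patch`."""
    h, w, _ = x.shape
    new_h = math.ceil(h / patch) * patch
    new_w = math.ceil(w / patch) * patch
    if (new_h, new_w) == (h, w):
        return x
    return resize_bilinear(x, new_h, new_w)


def interpolate_position_embedding(pos_embed, old_grid, new_grid,
                                   num_prefix_tokens=1):
    """Resize a (1, prefix + gh * gw, dim) embedding to a new grid size.

    Prefix tokens (for example a class token) are copied unchanged; only
    the spatial part is interpolated.
    """
    prefix = pos_embed[:, :num_prefix_tokens]
    grid = pos_embed[0, num_prefix_tokens:]
    dim = grid.shape[-1]
    grid = grid.reshape(old_grid[0], old_grid[1], dim)
    grid = resize_bilinear(grid, new_grid[0], new_grid[1])
    grid = grid.reshape(1, new_grid[0] * new_grid[1], dim)
    return np.concatenate([prefix, grid], axis=1)


# MobileViT-style round trip: align to the patch size, then restore.
fmap = np.ones((7, 7, 80))
padded = resize_to_patch_multiple(fmap)    # shape (8, 8, 80)
restored = resize_bilinear(padded, 7, 7)   # shape (7, 7, 80)

# ViT-style position embedding: 14x14 grid resized to 16x16.
pos = np.random.default_rng(0).normal(size=(1, 1 + 14 * 14, 8))
new_pos = interpolate_position_embedding(pos, (14, 14), (16, 16))
```

Restoring the feature map to its original size after the patch-wise step keeps it compatible with the skip connection, which is what the Concatenate error was complaining about.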

I'm closing this issue. Feel free to reopen it if it is not resolved.

james77777778 / keras-image-models

MobileViT model loading issues #45