keras-team / keras-hub

Pretrained model hub for Keras 3

Add `FeaturePyramidBackbone` and port weights from `timm` for `ResNetBackbone` #1769

Closed · james77777778 closed 3 months ago

james77777778 commented 3 months ago

This PR introduces `FeaturePyramidBackbone`, a wrapper for `Backbone` that adds a `pyramid_outputs` property. If a vision backbone supports feature pyramids, it should subclass `FeaturePyramidBackbone`.

I modified `ResNetBackbone` to subclass `FeaturePyramidBackbone` so that it exposes feature pyramid information.
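
A minimal sketch of how the `pyramid_outputs` property might be consumed downstream (the preset name and the `"P2"`-style level keys are assumptions for illustration, not part of this PR):

```python
import keras
from keras_hub.models import ResNetBackbone

# Hypothetical preset name, for illustration only.
backbone = ResNetBackbone.from_preset("resnet_18_imagenet")

# `pyramid_outputs` maps pyramid level names (e.g. "P2".."P5") to
# symbolic feature tensors, so a downstream model such as an FPN or
# detection head can tap intermediate features directly.
feature_extractor = keras.Model(
    inputs=backbone.inputs,
    outputs=backbone.pyramid_outputs,
)
```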

Also, a `head_dtype` argument has been added to `ResNetImageClassifier`.
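
Roughly, usage would look like this (preset name is again hypothetical); the idea is to let the classification head run in a different dtype than the backbone:

```python
from keras_hub.models import ResNetBackbone, ResNetImageClassifier

backbone = ResNetBackbone.from_preset("resnet_18_imagenet")  # hypothetical preset
classifier = ResNetImageClassifier(
    backbone=backbone,
    num_classes=1000,
    # Keep the head in float32 even if the backbone uses mixed precision.
    head_dtype="float32",
)
```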

@divyashreepathihalli @mattdangerw @SamanehSaadat

EDITED: See https://github.com/keras-team/keras-nlp/pull/1769#issuecomment-2287967827 for updates

mattdangerw commented 3 months ago

@james77777778 I was thinking we would make `FeaturePyramidBackbone` a simple subclass of `Backbone` for most CV backbones.

So...

The goal is to keep the `Backbone` base class as clean as we can, now that we are venturing into multimodal models with a lot of different overall patterns. `dir(bert_backbone)` shouldn't have feature pyramid stuff in it. If we could move the `token_embedding` off the backbone and into a `TextBackbone` or similar without breaking compat, we probably would too.

Most CV models, like ResNet, DenseNet, EfficientNet, etc., will be `FeaturePyramidBackbone`s. But something like ViT can just subclass `Backbone` directly without needing the feature pyramid outputs. A rough sketch of the hierarchy (not the exact implementation):
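
```python
from keras_hub.models import Backbone

class FeaturePyramidBackbone(Backbone):
    """Adds feature pyramid outputs on top of the plain `Backbone`."""

    @property
    def pyramid_outputs(self):
        # Dict mapping level names (e.g. "P2") to symbolic feature tensors.
        return self._pyramid_outputs

    @pyramid_outputs.setter
    def pyramid_outputs(self, value):
        self._pyramid_outputs = value

# Feature-pyramid CV models subclass `FeaturePyramidBackbone`...
class ResNetBackbone(FeaturePyramidBackbone):
    ...

# ...while models without pyramid features subclass `Backbone` directly.
class ViTBackbone(Backbone):
    ...
```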

WDYT?

james77777778 commented 3 months ago

@mattdangerw @divyashreepathihalli Got it. That makes sense. Updated!

I have updated the PR to make it compatible with `timm`. Additionally, conversion logic has been added, similar to how it's done for `transformers` checkpoints.
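
With the converter in place, loading `timm` weights could look roughly like this (the exact Hugging Face Hub handle is illustrative):

```python
from keras_hub.models import ResNetBackbone

# Load timm weights from the Hugging Face Hub; the converter remaps
# them to KerasHub's layer names on the fly.
backbone = ResNetBackbone.from_preset("hf://timm/resnet18.a1_in1k")
```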

Please refer to this colab for the numerical check: https://colab.research.google.com/drive/1QnmNDiFYd56fsYoaUM46QRT4gF9G06fH?usp=sharing

Supported:

divyashreepathihalli commented 3 months ago

Looks really good! It's awesome to have built-in `timm` conversion! LGTM!

mattdangerw commented 3 months ago

Thanks!