keras-team / keras-hub

Pretrained model hub for Keras 3
Apache License 2.0

[Semantic Segmentation] - Add SegFormer Architecture, Weight Conversion Script and Presets #1883

Closed by DavidLandup0 1 month ago

DavidLandup0 commented 1 month ago

This PR adds the SegFormer architecture, a weight conversion script, and presets.

Basic Usage

import numpy as np
import keras_hub

preprocessor = keras_hub.models.ImageSegmenterPreprocessor.from_preset("segformer_b0_ade20k_512")
segmenter = keras_hub.models.SegFormerImageSegmenter.from_preset("segformer_b0_ade20k_512")
segmenter(np.random.rand(1, 512, 512, 3))

End-to-end example with preprocessor:

import urllib.request
from PIL import Image
import numpy as np
import keras_hub

preprocessor = keras_hub.models.ImageSegmenterPreprocessor.from_preset("segformer_b0_ade20k_512")
segmenter = keras_hub.models.SegFormerImageSegmenter.from_preset("segformer_b0_ade20k_512")

# Download a sample image (the URL points to a JPEG).
img_url = "https://www.vanorohotel.com/wp-content/uploads/2021/07/drz-vanoro_6737.jpg"
urllib.request.urlretrieve(img_url, "image.jpg")

# Force three RGB channels, resize to 512x512, and add a batch dimension.
img = np.array(Image.open("image.jpg").convert("RGB").resize((512, 512)))
img = np.expand_dims(img, 0)
inputs = preprocessor(img)
outs = segmenter(inputs)

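To turn the segmenter output into a per-pixel class map, one option is an argmax over the class axis. The sketch below is a rough illustration, not part of the PR: it uses a random array as a stand-in for `outs`, and it assumes the output is (or has been converted to) a NumPy array in `(batch, height, width, num_classes)` layout with the 150 ADE20K classes. The spatial size used here is arbitrary.

```python
import numpy as np

# Stand-in for `outs`; the channels-last layout and the 150 classes
# are assumptions made for this sketch, and 128x128 is arbitrary.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1, 128, 128, 150)).astype("float32")

# Pick the highest-scoring class at every pixel.
mask = np.argmax(logits, axis=-1)  # shape: (1, 128, 128)

# Count how many pixels were assigned to each predicted class.
class_ids, pixel_counts = np.unique(mask, return_counts=True)
```

With real outputs, `mask[0]` could then be passed to a plotting library with a categorical colormap to inspect the prediction.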

With Image Converter

converter = keras_hub.layers.ImageConverter(image_size=(512, 512))
preprocessor = keras_hub.models.ImageSegmenterPreprocessor.from_preset("segformer_b0_ade20k_512", image_converter=converter)
segmenter = keras_hub.models.SegFormerImageSegmenter.from_preset("segformer_b0_ade20k_512")

Training Pipeline Example

A few examples in the notebook below:

https://colab.research.google.com/drive/1EBNg6nPKx_KzyRuQQtHZ_PG_Nsf2pAg2#scrollTo=V9Ub4NHKCx9e

After a few minutes of training from scratch (both encoder and segmenter):

[4 images: results after a few minutes of training]

divyashreepathihalli commented 1 month ago

Is this PR ready for review?

DavidLandup0 commented 1 month ago

@divyashreepathihalli In essence, yes, but there's some noise when running predictions with the converted weights (the image below compares Hugging Face outputs with ours). I'm looking into the numerics again, but the rest of the PR is ready for review :)

[image: Hugging Face outputs vs. our outputs]
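A numerics check of this kind can be sketched as follows. This is a minimal illustration rather than the conversion script's actual check: `compare_outputs` is a hypothetical helper, the tolerance is arbitrary, and both arrays are synthetic stand-ins for logits from the reference model and the converted model, assumed to share a `(batch, height, width, num_classes)` shape.

```python
import numpy as np

def compare_outputs(reference, candidate, atol=1e-4):
    """Summarize how closely two logit arrays of identical shape agree."""
    reference = np.asarray(reference, dtype="float64")
    candidate = np.asarray(candidate, dtype="float64")
    abs_diff = np.abs(reference - candidate)
    return {
        "max_abs_diff": float(abs_diff.max()),
        "mean_abs_diff": float(abs_diff.mean()),
        "allclose": bool(np.allclose(reference, candidate, atol=atol)),
        # Fraction of pixels where both arrays predict the same class.
        "class_agreement": float(
            np.mean(reference.argmax(-1) == candidate.argmax(-1))
        ),
    }

# Synthetic stand-ins: a "reference" output and a slightly noisy copy.
rng = np.random.default_rng(0)
reference = rng.normal(size=(1, 32, 32, 150))
noisy = reference + rng.normal(scale=0.05, size=reference.shape)
report = compare_outputs(reference, noisy)
```

A large `max_abs_diff` alongside a high `class_agreement` would suggest small numerical drift, while low agreement points to a structural mismatch in the converted model.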

DavidLandup0 commented 1 month ago

Found the issue: a transpose call in the encoder was incorrectly shuffling the order of a latent. I'll get the presets up on Kaggle now.

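For readers unfamiliar with this class of bug, here is a small NumPy illustration. It is not the actual code from the PR; the shapes and variable names are made up. It shows how an extra transpose before a reshape leaves the tensor shape unchanged while scrambling the spatial layout, which is why such errors tend to show up as noisy predictions rather than as shape errors.

```python
import numpy as np

batch, height, width, channels = 1, 4, 4, 3

# A channels-last feature map, flattened into a token sequence of
# shape (batch, height * width, channels).
feature_map = np.arange(batch * height * width * channels).reshape(
    batch, height, width, channels
)
tokens = feature_map.reshape(batch, height * width, channels)

# Correct: tokens are already ordered row by row, so a plain reshape
# restores the spatial layout.
restored = tokens.reshape(batch, height, width, channels)

# Buggy: an extra transpose (as one might use for a channels-first
# layout) before the same reshape keeps the shape but moves the values.
scrambled = tokens.transpose(0, 2, 1).reshape(batch, height, width, channels)
```

Both results have shape `(1, 4, 4, 3)` and contain the same values, but only `restored` matches `feature_map` element by element.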