
keras-vision-transformer

This repository contains the tensorflow.keras implementation of the Swin Transformer (Liu et al., 2021) and its applications to benchmark datasets.

Notebooks

Note: the Swin-UNET implementation is experimental.

Dependencies
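
The notebooks assume a TensorFlow 2.x environment where Keras is available as `tensorflow.keras`; exact tested versions are not pinned here. As a minimal sketch, a quick sanity check of the installed stack might look like the following (NumPy is an assumed companion dependency, not documented by this repository):

```python
# Hedged environment check: assumes a TensorFlow 2.x stack with bundled
# Keras; NumPy is an assumed (not repository-documented) dependency.
import tensorflow as tf
import numpy as np

print("TensorFlow:", tf.__version__)
print("Keras     :", tf.keras.__version__)
print("NumPy     :", np.__version__)
```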

Overview

Swin Transformers are Transformer-based computer vision models that compute self-attention within shifted windows. Whereas other vision transformer variants attend over all embedded patches (tokens) globally, the Swin Transformer restricts attention to subsets of tokens inside non-overlapping local windows, which are alternately shifted between consecutive Transformer blocks so that information can flow across window boundaries. Because the attention cost scales with the window size rather than the full image size, this mechanism makes Swin Transformers better suited to high-resolution images. Swin Transformers have proven effective for image classification, object detection, and semantic segmentation.
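
To make the shifted-window idea concrete, here is a minimal, self-contained TensorFlow sketch of window partitioning with a cyclic shift (via `tf.roll`). The helper name `window_partition` and all sizes are illustrative assumptions for this sketch, not code taken from this repository:

```python
import tensorflow as tf

def window_partition(x, window_size):
    # Split a (batch, height, width, channels) feature map into
    # non-overlapping windows of shape (window_size, window_size).
    B, H, W, C = x.shape
    x = tf.reshape(x, (B, H // window_size, window_size,
                       W // window_size, window_size, C))
    x = tf.transpose(x, (0, 1, 3, 2, 4, 5))
    return tf.reshape(x, (-1, window_size, window_size, C))

# Toy feature map: one image, an 8x8 grid of tokens, 4 channels.
x = tf.random.normal((1, 8, 8, 4))

# Regular windows: self-attention would run inside each 4x4 window.
windows = window_partition(x, window_size=4)  # shape (4, 4, 4, 4)

# Shifted windows: cyclically roll the grid by half a window first, so the
# next block's windows straddle the previous block's window boundaries.
shift = 4 // 2
x_shifted = tf.roll(x, shift=(-shift, -shift), axis=(1, 2))
shifted_windows = window_partition(x_shifted, window_size=4)

print(windows.shape, shifted_windows.shape)  # (4, 4, 4, 4) (4, 4, 4, 4)
```

Alternating between the regular and the shifted partition is what lets tokens in different windows eventually exchange information while keeping each attention computation local.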

Contact

Yingkai (Kyle) Sha <yingkai@eoas.ubc.ca> <yingkaisha@gmail.com>

This work has benefited from:

License

MIT License