zhiqwang / sightseq

Computer vision tools for fairseq, containing PyTorch implementation of text recognition and object detection

MIT License


attention crnn ctc densenet faster-rcnn image-captioning mobilenet object-detection ocr pytorch scene-texts text-recognition transformer


🔭sightseq

Now, let's go sightseeing around the deep learning world with multimodal vision and sequence-language models.

What's New:

July 30, 2019: Added Faster R-CNN models. I also renamed this repo from image-captioning to sightseq. This is the last time I rename this repo, I promise.
June 11, 2019: I rewrote the text recognition part based on fairseq. For a stable version, refer to the crnn branch, which provides pre-trained model checkpoints. The current branch is a work in progress. Suggestions and cooperation on the fairseq text recognition project are very welcome.

Features:

sightseq provides reference implementations of various deep learning tasks, including:

Text Recognition
- Shi et al. (2015), CRNN: An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
Object Detection
- (New) Ren et al. (2015), Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
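
CRNN is commonly trained with a CTC objective, and its per-frame predictions have to be collapsed into a label sequence at inference time. The snippet below is a minimal sketch of greedy (best-path) CTC decoding in plain Python. It is not sightseq's own decoder. The function name `ctc_greedy_decode`, the toy alphabet, and the choice of index 0 as the blank symbol are all assumptions made for illustration.

```python
from itertools import groupby


def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse runs of repeated labels, then drop blanks (CTC best path)."""
    # groupby merges consecutive duplicates: [2, 2, 0, 1] -> [2, 0, 1]
    collapsed = [label for label, _ in groupby(frame_labels)]
    # Blanks only separate repeated characters; they carry no output symbol.
    return [label for label in collapsed if label != blank]


# Hypothetical per-frame argmax output for an image of the word "hello".
# Index 0 ("-") is assumed to be the blank symbol.
alphabet = ["-", "e", "h", "l", "o"]
frames = [2, 2, 0, 1, 1, 3, 3, 0, 3, 4, 4, 0]

decoded = ctc_greedy_decode(frames)
text = "".join(alphabet[i] for i in decoded)
print(text)
```

The blank between the two runs of `3` is what lets the decoder keep the double "l". Without it, the repeated labels would be merged into one.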

Additionally:

All features of fairseq
Flexible configuration of the convolutional and recurrent layers in CRNN
Positional Encoding of images
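
This README does not say which positional encoding scheme is applied to images. One common choice is a 2D sinusoidal encoding, where half of the channels encode the row index and the other half encode the column index. The following is a sketch of that idea in plain Python, under that assumption. The function names `sinusoid` and `positional_encoding_2d` are hypothetical and do not come from sightseq or fairseq.

```python
import math


def sinusoid(position, dim):
    """1D sinusoidal code of length `dim` (must be even) for one position."""
    code = []
    for i in range(0, dim, 2):
        # Wavelengths grow geometrically with the channel index.
        angle = position / (10000 ** (i / dim))
        code.extend([math.sin(angle), math.cos(angle)])
    return code


def positional_encoding_2d(height, width, channels):
    """Return a height x width grid of vectors, each `channels` long.

    The first half of each vector encodes the row, the second half the column.
    """
    if channels % 4 != 0:
        raise ValueError("channels must be divisible by 4")
    half = channels // 2
    return [
        [sinusoid(y, half) + sinusoid(x, half) for x in range(width)]
        for y in range(height)
    ]


# Encoding for a tiny 2 x 3 feature map with 8 channels.
pe = positional_encoding_2d(height=2, width=3, channels=8)
```

In a PyTorch model, a grid like this would typically be built once as a tensor and added to the convolutional feature map before the sequence layers.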

General Requirements and Installation

PyTorch (there is a bug in nn.CTCLoss that is fixed in the nightly version)
Python version >= 3.5
Fairseq version >= 0.7.1
torchvision version >= 0.3.0
For training new models, you'll also need an NVIDIA GPU and NCCL
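
The minimum versions listed above can be checked with a small script before installing or training. This is a rough sketch using only the standard library, not something shipped with sightseq. The helpers `parse_version`, `meets_minimum`, and `check_requirements` are hypothetical names, and the version parsing is deliberately simplified: it compares only the leading numeric components and ignores pre-release ordering. Note that `importlib.metadata` requires Python 3.8 or later, which is newer than the minimum listed here.

```python
import sys
from importlib import metadata  # standard library, Python 3.8+

# Minimum versions taken from the requirements list in this README.
MINIMUMS = {"fairseq": "0.7.1", "torchvision": "0.3.0"}


def parse_version(text):
    """Turn '0.7.1' or '1.2.0a0+abc' into a tuple of leading integers."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'post3'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def check_requirements(minimums=MINIMUMS):
    """Map each package name to a short status string."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = meets_minimum(installed, minimum)
        report[name] = f"{installed} ({'ok' if ok else 'needs >= ' + minimum})"
    return report


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for package, status in check_requirements().items():
        print(package, status)
```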

Pre-trained models and examples

License

sightseq is MIT-licensed. The license applies to the pre-trained models as well.