🔭sightseq
Let's go sightseeing around the deep learning world with multimodal vision and sequence-language models.
What's New:
- July 30, 2019: Added Faster R-CNN models. I also renamed this repo from image-captioning to sightseq; this is the last time I rename it, I promise.
- June 11, 2019: Rewrote the text recognition part on top of fairseq. For a stable version, see the crnn branch, which provides pre-trained model checkpoints. The current branch is a work in progress. Suggestions and collaboration on the fairseq text recognition project are very welcome.
Features:
sightseq provides reference implementations of various deep learning tasks, including:
- Text Recognition
- Object Detection
Additionally:
- All features of fairseq
- Flexible enabling of the convolutional and recurrent layers in CRNN
- Positional Encoding of images
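CRNN-style text recognizers are usually trained with a CTC loss (the requirements below mention nn.CTCLoss), and their per-frame predictions then have to be collapsed into a label sequence. The following is a minimal, illustrative sketch of greedy CTC decoding in plain Python. It is not taken from the sightseq code base: the function name `ctc_greedy_decode` and the choice of index 0 as the blank label are assumptions made for this example.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse per-frame argmax labels into an output sequence.

    Standard CTC collapsing: merge consecutive repeats, then drop blanks.
    `blank=0` is an assumption for this sketch, not a sightseq setting.
    """
    decoded = []
    previous = None  # label seen at the previous frame
    for label in frame_labels:
        # Emit a label only when it is not blank and differs from the
        # immediately preceding frame (a blank in between resets this).
        if label != blank and label != previous:
            decoded.append(label)
        previous = label
    return decoded


if __name__ == "__main__":
    # Hypothetical frame-wise predictions for a short text line.
    frames = [0, 3, 3, 0, 3, 5, 5, 0]
    print(ctc_greedy_decode(frames))
```

Note that a repeated character survives only when a blank separates the two runs, which is why the blank label is needed at all.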
General Requirements and Installation
- PyTorch (nn.CTCLoss has a bug that is fixed in the nightly build)
- Python version >= 3.5
- Fairseq version >= 0.7.1
- torchvision version >= 0.3.0
- For training new models, you'll also need an NVIDIA GPU and NCCL
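The version requirements above can be checked before installing or training. The sketch below is one possible way to do that and is not part of sightseq: the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, and it assumes the packages are published under the distribution names `fairseq` and `torchvision`. It relies on `importlib.metadata`, so the script itself needs Python 3.8 or later even though sightseq lists 3.5 as its minimum.

```python
import re
import sys
from importlib import metadata  # standard library since Python 3.8

# Minimum versions taken from the requirements list above.
# PyTorch is omitted because no minimum version is stated for it.
REQUIREMENTS = {"fairseq": "0.7.1", "torchvision": "0.3.0"}


def parse_version(text):
    """Turn '0.7.1' or '1.2.0.dev20190730' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev...'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version string is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    if sys.version_info < (3, 5):
        problems.append("Python >= 3.5 is required")
    for name, minimum in REQUIREMENTS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

The version comparison is deliberately simple and ignores pre-release ordering; a nightly build such as `1.2.0.dev20190730` is treated as `1.2.0`.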
Pre-trained models and examples
License
sightseq is MIT-licensed.
The license applies to the pre-trained models as well.