This project uses models and weights from self-critical.pytorch. Bottom-up attention embeddings are generated using py-bottom-up-attention, which is a PyTorch implementation of bottom-up-attention.
Install detectron with:
python3 setup.py build develop
You also need to download the weights for the ResNet and bottom-up attention models.
Alternatively, you can install everything and download the models with the download.sh script.
The installation commands I use:
git clone https://github.com/grazder/Image-Captioning-Inference.git
cd Image-Captioning-Inference
pip install -r requirements.txt
bash download.sh
You don't need to download every model, only the ones you will use (for example, bottom-up attention + transformer). Everything else in download.sh can be commented out.
Many more models from self-critical.pytorch are available in its MODEL_ZOO.
import os

from Captions import Captions

# Load the FC captioning model on top of ResNet features.
model_fc_resnet = Captions(
    model_path='data/fc-resnet-weights/model.pth',
    infos_path='data/fc-resnet-weights/infos.pkl',
    model_type='resnet',
    resnet_model_path='data/imagenet_weights/resnet101.pth',
    bottom_up_model_path='data/bottom-up/faster_rcnn_from_caffe.pkl',
    bottom_up_config_path='data/bottom-up/faster_rcnn_R_101_C4_caffe.yaml',
    bottom_up_vocab='data/vocab/objects_vocab.txt',
    device='cpu'
)

# Collect the example images and caption them in one call.
images = os.listdir('example_images/')
paths = [os.path.join('example_images', x) for x in images]

preds = model_fc_resnet.get_prediction(paths)

# Print each file name next to its predicted caption.
for i, pred in enumerate(preds):
    print(f'{images[i]}: {pred}')
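Since the constructor above takes several file paths, it can help to check that the downloaded files are in place before loading a model. The following is a minimal sketch, not part of this repository: `find_missing` and `REQUIRED_FILES` are hypothetical names, the paths are copied from the example above, and which of them a given `model_type` actually needs is an assumption.

```python
from pathlib import Path

# Paths copied from the usage example; adjust them to your own layout.
# Assumption: all six files are needed; a given model type may need fewer.
REQUIRED_FILES = {
    "model_path": "data/fc-resnet-weights/model.pth",
    "infos_path": "data/fc-resnet-weights/infos.pkl",
    "resnet_model_path": "data/imagenet_weights/resnet101.pth",
    "bottom_up_model_path": "data/bottom-up/faster_rcnn_from_caffe.pkl",
    "bottom_up_config_path": "data/bottom-up/faster_rcnn_R_101_C4_caffe.yaml",
    "bottom_up_vocab": "data/vocab/objects_vocab.txt",
}


def find_missing(files, root="."):
    """Return the keys of `files` whose path is not an existing file under `root`."""
    return [name for name, rel in files.items() if not (Path(root) / rel).is_file()]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_FILES)
    if missing:
        print("Missing files for:", ", ".join(missing))
    else:
        print("All required files are present.")
```

If something is reported missing, re-running the relevant part of download.sh is the likely fix.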
Scores and models are taken from MODEL_ZOO. Inference times were measured in Google Colab.
Collection: link
Name | CIDEr | SPICE | Download | Time per image |
---|---|---|---|---|
FC | 0.953 | 0.1787 | model&metrics | 4.1 s |
FC +self_critical | 1.045 | 0.1838 | model&metrics | 4.2 s |
FC +new_self_critical | 1.053 | 0.1857 | model&metrics | 4.7 s |
Collection: link
Name | CIDEr | SPICE | Download | Time per image |
---|---|---|---|---|
Att2in | 1.089 | 0.1982 | model&metrics | 19.5 s |
Att2in +self_critical | 1.173 | 0.2046 | model&metrics | 19.7 s |
Att2in +new_self_critical | 1.195 | 0.2066 | model&metrics | 19.7 s |
UpDown | 1.099 | 0.1999 | model&metrics | 20.1 s |
UpDown +self_critical | 1.227 | 0.2145 | model&metrics | 19.8 s |
UpDown +new_self_critical | 1.239 | 0.2154 | model&metrics | 19.9 s |
UpDown +Schedule long +new_self_critical | 1.280 | 0.2200 | model&metrics | 20 s |
Transformer | 1.1259 | 0.2063 | model&metrics | 20.3 s |
Transformer(warmup+step decay) | 1.1496 | 0.2093 | model&metrics | 20.2 s |
Transformer +self_critical | 1.277 | 0.2249 | model&metrics | 20.4 s |
Transformer +new_self_critical | 1.303 | 0.2289 | model&metrics | 20.2 s |
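
The per-image times in the tables above can be reproduced roughly with a small timing helper. This is a sketch under assumptions: `time_per_image` is a hypothetical helper, not part of this repository, and it measures wall-clock time only, so results depend on the machine and on whether the model is already loaded.

```python
import time


def time_per_image(predict, paths, repeats=1):
    """Average wall-clock seconds per image, calling `predict` on one path at a time."""
    if not paths:
        raise ValueError("paths must not be empty")
    start = time.perf_counter()
    for _ in range(repeats):
        for path in paths:
            # `predict` is expected to accept a list of image paths.
            predict([path])
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(paths))


# Example with the model from the usage section (assumes it is already constructed):
# seconds = time_per_image(model_fc_resnet.get_prediction, paths)
# print(f'{seconds:.1f} s per image')
```

Passing one path per call keeps the measurement comparable to a "one image" figure; batching several paths in a single call may give different numbers.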
Name | street.jpg | man.jpeg | statue.jpeg | tv_man.jpeg |
---|---|---|---|---|
FC | a group of people walking down a street | a man in a suit and tie holding a cell phone | a man in a hat and a hat holding a frisbee | a man is brushing his teeth with a tooth brush |
FC + self-critical | a group of people riding a bike down a street | a man wearing a suit and a tie | a man standing next to a man with a baseball bat | a man taking a picture in a bathroom with a mirror |
FC + new-self-critical | a group of people riding bikes down a city street | a man wearing a suit and tie talking on a cell phone | a man is holding a frisbee in a street | a man brushing his teeth in a bathroom with a mirror |
Att2in | a group of people riding bikes down a city street | a man in a suit and tie is wearing a suit | a man and a woman are standing in a park | a man in a blue shirt playing a video game |
Att2in + self-critical | a group of people riding a bike down a city street | a man wearing a suit and tie and a table | a man and a woman sitting on a bench with a book | a man playing a video game in a wii |
Att2in + new self-critical | a group of people riding bikes down a city street | a man in a suit and tie standing in front of a table | a man and a woman sitting on a bench with a book | a man is playing a video game with a wii |
Updown | a group of people riding bikes down a street | a man in a suit and tie is holding a microphone | a man and a woman are standing in front of a tree | a man is playing a video game on a television |
Updown + self-critical | a group of people riding bikes down a city street | a man in a suit and tie sitting on a table | a man and a woman sitting on a bench with a book | a man is holding a video game on a television |
Updown + new self-critical | a group of people riding bikes down a city street | a man in a suit and tie in a UNK | a man and a woman holding a book | a man is playing a video game on a tv |
UpDown+Schedule long+new_self_critical | a group of people riding on a city street | a man in a suit and tie sitting in a table | a man and a woman standing in front of a tree | a man playing a video game with a wii |
Transformer | a group of people are riding bikes on the sidewalk | a man in a suit and tie sitting in a chair | a man and woman standing in front of a statue | a man in a room playing a video game |
Transformer(warmup+step decay) | a group of people riding bikes down a city street | a man in a suit sitting in a chair | a man and woman standing next to each other | a man is playing a video game on a large screen |
Transformer + self-critical | a group of people riding bikes down a city street | a man in a suit and tie sitting in a room | a man and a woman standing in front of a tree | a man playing a video game in a room |
Transformer + new self-critical | a group of people riding bikes down a city street | a man in a suit and tie sitting in a room | a man and a woman standing next to a tree | a man sitting in front of a television |
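
A comparison table like the one above can be assembled from the caption lists that several models return for the same images. The sketch below is illustrative only: `captions_table` is a hypothetical helper, and it assumes each model's captions are given in the same order as the image names.

```python
def captions_table(image_names, captions_by_model):
    """Build a Markdown table with one row per model and one column per image."""
    header = "Name | " + " | ".join(image_names) + " |"
    separator = "|".join(["---"] * (len(image_names) + 1)) + "|"
    rows = []
    for name, captions in captions_by_model.items():
        # Each model must provide exactly one caption per image.
        if len(captions) != len(image_names):
            raise ValueError(
                f"{name}: expected {len(image_names)} captions, got {len(captions)}"
            )
        rows.append(f"{name} | " + " | ".join(captions) + " |")
    return "\n".join([header, separator, *rows])


# Example (assumes `images`, `paths` and a loaded model from the usage section):
# print(captions_table(images, {'FC': model_fc_resnet.get_prediction(paths)}))
```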