ruotianluo / ImageCaptioning.pytorch

I decided to sync up this repo with self-critical.pytorch. (The old master is archived in the old master branch.)
MIT License
1.45k stars, 416 forks

How to extract ATT features from one's own dataset #183

Open Huanyu2019 opened 1 year ago

Huanyu2019 commented 1 year ago

Professor Luo, I am a beginner and I want to reproduce these models on my own dataset. My dataset is a simple image captioning dataset, and I want to extract attention (att) features for further training. How can I implement the feature extraction? Should I directly use features extracted by a pretrained Faster R-CNN, or do I need to retrain it on my own dataset (even though my dataset is not an object detection dataset)?

ruotianluo commented 1 year ago

Hi. First of all, if you want to extract features, your dataset does not have to be an object detection dataset. Second, if you would like to train your model from scratch, you can use any feature extractor; at this point my suggestion would be CLIP. You just need to convert the features into a format similar to the one used in the codebase, and then you can run the training. Third, if you would like to use the pretrained models I provide, I suggest using the 12-in-1 features (see data/README.md), because they are extracted with PyTorch. The bottom-up features use Caffe, and I don't know whether that is still easy to run.
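To illustrate the "convert the features" step, here is a minimal sketch. It assumes the per-image layout that the repo's preprocessing scripts appear to produce: a directory of `<image_id>.npy` files, each holding a pooled fc vector, and a directory of `<image_id>.npz` files whose `feat` array holds the attention features (the folders normally passed via `--input_fc_dir` / `--input_att_dir`, e.g. `data/cocotalk_fc` and `data/cocotalk_att`). Please verify this against the prepro scripts and the dataloader in your checkout. The feature extractor itself (CLIP or anything else) is left out. `save_caption_features` and the `mydata_fc` / `mydata_att` folder names are hypothetical. Mean-pooling the att features to get the fc vector is an assumption, chosen because it mirrors how grid-CNN fc features are commonly derived.

```python
import os
import tempfile

import numpy as np


def save_caption_features(image_id, att_feats, out_dir, prefix="mydata"):
    """Write one image's features in an (assumed) fc/att per-image layout.

    att_feats: array of shape (num_regions, dim) or (h, w, dim), produced by
    any feature extractor (e.g. a CLIP grid or a CNN feature map).
    The directory names below are hypothetical; point the training options
    at whatever directories you actually write.
    """
    att = np.asarray(att_feats, dtype=np.float32)
    if att.ndim == 3:
        # Flatten a spatial grid (h, w, dim) into (h*w, dim) region vectors.
        att = att.reshape(-1, att.shape[-1])
    if att.ndim != 2:
        raise ValueError(f"expected 2-D or 3-D features, got shape {att.shape}")

    # Assumption: use the average of the region vectors as the fc feature.
    fc = att.mean(axis=0)

    fc_dir = os.path.join(out_dir, f"{prefix}_fc")
    att_dir = os.path.join(out_dir, f"{prefix}_att")
    os.makedirs(fc_dir, exist_ok=True)
    os.makedirs(att_dir, exist_ok=True)

    # One .npy per image for fc, one compressed .npz (key "feat") for att.
    np.save(os.path.join(fc_dir, f"{image_id}.npy"), fc)
    np.savez_compressed(os.path.join(att_dir, f"{image_id}.npz"), feat=att)
    return fc_dir, att_dir


if __name__ == "__main__":
    # Random stand-in for real extractor output: a 7x7 grid of 512-d vectors.
    feats = np.random.rand(7, 7, 512).astype(np.float32)
    with tempfile.TemporaryDirectory() as d:
        fc_dir, att_dir = save_caption_features(42, feats, d)
        loaded = np.load(os.path.join(att_dir, "42.npz"))["feat"]
        print(loaded.shape)  # (49, 512)
```

In practice you would loop over your images, run the extractor, and call the function once per image ID, using the same IDs that appear in your caption annotation file.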