ruotianluo / ImageCaptioning.pytorch

I decided to sync up this repo with self-critical.pytorch. (The old master is archived in the old master branch.)
MIT License
1.43k stars 409 forks

How to extract ATT features from one's own dataset #183

Open Huanyu2019 opened 11 months ago

Huanyu2019 commented 11 months ago

Professor Luo, I am a beginner and I want to reproduce these models on my own dataset. My dataset is a simple image captioning dataset, and I want to extract attention (att) features for further training. How can I implement feature extraction? Should I directly use the features extracted by Faster R-CNN, or do I need to retrain it on my own dataset (even though my dataset is not an object detection dataset)?

ruotianluo commented 11 months ago

Hi, first of all, if you want to extract features, the dataset does not have to be an object detection dataset. Second, if you would like to train your model from scratch, you can use any feature extractor. At this point my suggestion would be CLIP. You just need to convert the features into a format similar to the one used in the codebase, and then you can run the training. Third, if you would like to use the pretrained models I provide, I suggest you use the 12-in-1 one (see data/README.md) because it uses PyTorch to extract features. The bottom-up features use Caffe, and I don't know if it is still easy to run that.
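
As a rough illustration of the "convert the features" step, here is a minimal sketch in Python. It assumes (this is not confirmed in the thread) that the codebase reads one pooled vector per image from a `.npy` file and one per-region array per image from a `.npz` file under the key `feat`; please check that against data/README.md and the repository's preprocessing scripts before relying on it. The function name `save_features` and the `fc`/`att` directory names are hypothetical, and the extractor itself (CLIP or anything else) is not shown: the sketch starts from an array of shape `(num_regions, feat_dim)` that you have already computed.

```python
import os
import tempfile

import numpy as np


def save_features(image_id, region_feats, out_dir):
    """Write one image's features in an ASSUMED fc/att layout.

    region_feats: array of shape (num_regions, feat_dim), for example
    patch or grid features from whichever extractor you chose.
    Returns the paths of the two files that were written.
    """
    region_feats = np.asarray(region_feats, dtype=np.float32)
    if region_feats.ndim != 2:
        raise ValueError("expected an array of shape (num_regions, feat_dim)")

    # Hypothetical directory names; point your training config at them.
    fc_dir = os.path.join(out_dir, "fc")
    att_dir = os.path.join(out_dir, "att")
    os.makedirs(fc_dir, exist_ok=True)
    os.makedirs(att_dir, exist_ok=True)

    fc_path = os.path.join(fc_dir, f"{image_id}.npy")
    att_path = os.path.join(att_dir, f"{image_id}.npz")

    # Pooled ("fc") feature: mean over regions (an assumption, other
    # pooling choices are possible).
    np.save(fc_path, region_feats.mean(axis=0))
    # Attention ("att") features: the full per-region array, stored
    # under the assumed key "feat".
    np.savez_compressed(att_path, feat=region_feats)
    return fc_path, att_path


if __name__ == "__main__":
    # Stand-in for real extractor output: 49 regions, 512 dimensions.
    fake_feats = np.random.rand(49, 512)
    with tempfile.TemporaryDirectory() as tmp:
        fc_path, att_path = save_features("example_image", fake_feats, tmp)
        print(np.load(fc_path).shape, np.load(att_path)["feat"].shape)
```

If your extractor produces a different number of regions or a different feature dimension than the defaults, the model's input feature size would likely need to be adjusted in the training options as well.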