Huanyu2019 opened this issue 1 year ago
Hi. First of all, if you just want to extract features, your dataset does not have to be an object detection dataset. Second, if you would like to train your model from scratch, you can use any feature extractor; at this point I would suggest CLIP. You just need to convert the features into the same format the codebase uses, and then you can run the training. Third, if you would like to use the pretrained models I provide, I suggest the 12-in-1 one (see data/README.md), because its features are extracted with PyTorch. The bottom-up one uses Caffe, and I don't know whether that is still easy to run.
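The conversion step can be sketched roughly as follows. This is a minimal sketch, not the codebase's own script: it assumes you already have one grid feature map of shape (C, H, W) per image from some backbone (for example a ResNet-style encoder, which gives 2048×7×7 at 224×224 input), and it treats each grid cell as one attention "region". The on-disk layout (an `<image_id>.npy` fc vector plus an `<image_id>.npz` att array stored under the key `feat`) is an assumption; check data/README.md for the exact format your version expects. The function and directory names are hypothetical.

```python
import os
import tempfile
from typing import Tuple

import numpy as np


def grid_to_caption_feats(grid: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Turn a (C, H, W) feature map into (fc, att) arrays.

    fc:  (C,)      global vector, mean-pooled over all spatial positions
    att: (H*W, C)  one C-dim vector per grid cell, used as attention "regions"
    """
    if grid.ndim != 3:
        raise ValueError(f"expected (C, H, W), got shape {grid.shape}")
    c, h, w = grid.shape
    att = grid.reshape(c, h * w).T  # (H*W, C)
    fc = att.mean(axis=0)           # (C,)
    return fc.astype(np.float32), att.astype(np.float32)


def save_caption_feats(image_id, grid: np.ndarray, fc_dir: str, att_dir: str) -> None:
    """Write features using an ASSUMED per-image layout:
    <fc_dir>/<image_id>.npy and <att_dir>/<image_id>.npz (array under key 'feat')."""
    fc, att = grid_to_caption_feats(grid)
    os.makedirs(fc_dir, exist_ok=True)
    os.makedirs(att_dir, exist_ok=True)
    # np.save / np.savez_compressed append ".npy" / ".npz" to the path automatically.
    np.save(os.path.join(fc_dir, str(image_id)), fc)
    np.savez_compressed(os.path.join(att_dir, str(image_id)), feat=att)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-in for a real backbone output; replace with your extractor's features.
    fake_grid = rng.standard_normal((2048, 7, 7)).astype(np.float32)
    with tempfile.TemporaryDirectory() as root:
        fc_dir = os.path.join(root, "mydata_fc")    # hypothetical directory names
        att_dir = os.path.join(root, "mydata_att")
        save_caption_feats(123, fake_grid, fc_dir, att_dir)
        att = np.load(os.path.join(att_dir, "123.npz"))["feat"]
        print(att.shape)  # (49, 2048)
```

With bottom-up features each att row is a detected box; with a plain grid each row is a spatial cell, which is why no detection dataset or detector retraining is needed for this route.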
Professor Luo, I am a beginner and I want to reproduce these models on my own dataset. It is a simple image captioning dataset, and I want to extract attention (att) features for training. How can I implement the feature extraction? Should I directly use the features extracted by a pretrained Faster R-CNN, or do I need to retrain it on my own dataset, even though my dataset is not an object detection dataset?