MatthiasRMS opened 7 years ago
By definition, object detection needs bounding boxes (or segmentation masks, or something similar). If you don't need to know where in the image the object is, it is an image classification task, and a plain VGG16 or ResNet is enough. That said, I think object detection should still perform better for you, since you want to detect multiple objects in one image, so it might be worth the effort to create some box annotations.
Thanks for your answer. I do have annotations (the classes present in each image), but not the boxes. You're right that it's more of a multi-label classification task, but I assumed it would perform better with the localisation.
Can I do multi-label classification with VGG16?
Thanks a lot for your help
Well, I do think it will do better with the localisation, and you could possibly adapt this model to work without the bounding boxes. But I think the RPN (region proposal network) will prune any boxes that have too little overlap with the ground-truth box, so you would need to bypass that somehow.
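A minimal sketch of why the RPN cannot be trained without boxes, assuming a Faster R-CNN-style RPN: each anchor is labelled by its best IoU with the ground-truth boxes (above 0.7 is an object, below 0.3 is background, anything in between is ignored, following the Faster R-CNN paper). The anchor and box coordinates are made up, and the extra rule that also marks each ground-truth box's best-matching anchor as positive is left out. With an empty ground-truth list every anchor falls to background, so there is nothing positive for the RPN to learn from.

```python
def iou(a, b):
    # Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    # 1 = object, 0 = background, -1 = ignored during RPN training.
    labels = []
    for anchor in anchors:
        # With no ground-truth boxes the best IoU defaults to 0, i.e. background.
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best > pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    return labels


# Hypothetical anchors and a single hypothetical ground-truth box.
anchors = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
gt_boxes = [(0, 0, 10, 10)]

print(label_anchors(anchors, gt_boxes))  # [1, -1, 0]
print(label_anchors(anchors, []))        # [0, 0, 0]: no boxes, no positives
```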
You should be able to do multi-label classification with any network by replacing the final softmax with a sigmoid activation and using binary_crossentropy as the loss.
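A small NumPy sketch of why the softmax-to-sigmoid swap matters when several classes can be present at once. The logits and the four-class setup are made up for illustration, and the 0.5 decision threshold is an assumption. In Keras the equivalent change would be a final `Dense(num_classes, activation="sigmoid")` layer compiled with `loss="binary_crossentropy"`.

```python
import numpy as np


def sigmoid(z):
    # Element-wise logistic function: each class gets an independent probability.
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    # Shift by the row max for numerical stability; each row sums to 1.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Mean over classes of the per-class binary log loss.
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob), axis=-1)


# Hypothetical last-layer logits for one image and 4 classes.
logits = np.array([[3.0, 2.5, -2.0, -3.0]])
# Multi-hot target: classes 0 and 1 are both present in the image.
y_true = np.array([[1.0, 1.0, 0.0, 0.0]])

p_softmax = softmax(logits)   # classes compete for a total of 1
p_sigmoid = sigmoid(logits)   # classes are scored independently

# Threshold at 0.5 (an assumed cut-off) to get the predicted label set.
pred_softmax = (p_softmax > 0.5).astype(int)  # only class 0 survives
pred_sigmoid = (p_sigmoid > 0.5).astype(int)  # both present classes survive
loss = binary_crossentropy(y_true, p_sigmoid)
print(pred_softmax, pred_sigmoid, loss)
```

With softmax the second present class is pushed below the threshold because the probabilities must share a total of 1; with sigmoid each class is judged on its own, which is what a multi-label target needs.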
Hi!
I'm working on retraining this model on a new dataset to extract attributes from clothes (e.g. sleeve type, neck type, etc.). I'm only interested in the attribute prediction, not the area prediction.
I have a set of images with the attributes for each image, but I don't have the bounding boxes of each attribute in the image, as the PASCAL VOC dataset does.
Do I need the bounding boxes to retrain it?
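If you go the multi-label classification route instead of detection, the per-image attribute lists you already have are enough: they can be turned into multi-hot target vectors (one column per attribute) for a sigmoid output with binary cross-entropy. A minimal sketch; the attribute names and file names are hypothetical placeholders.

```python
def multi_hot(labels_per_image, vocabulary):
    # Map each attribute name to a fixed column index.
    index = {name: i for i, name in enumerate(vocabulary)}
    targets = []
    for labels in labels_per_image:
        row = [0] * len(vocabulary)
        for name in labels:
            # A KeyError here flags an attribute missing from the vocabulary.
            row[index[name]] = 1
        targets.append(row)
    return targets


# Hypothetical attribute vocabulary and per-image annotations (no boxes).
vocabulary = ["long_sleeve", "short_sleeve", "v_neck", "crew_neck"]
annotations = {
    "img_001.jpg": ["long_sleeve", "v_neck"],
    "img_002.jpg": ["short_sleeve", "crew_neck"],
    "img_003.jpg": ["short_sleeve"],
}

targets = multi_hot(annotations.values(), vocabulary)
print(targets)  # [[1, 0, 1, 0], [0, 1, 0, 1], [0, 1, 0, 0]]
```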