Many thanks for your great work! I have a question about the base model (the feature extractor backbone). As mentioned in the paper, the base model is "pre-trained from scratch" using the training set of COCO. Does that mean the weights of the model are randomly initialized (without pre-training on ImageNet)? It is a little surprising that the base model gets 75% mAP on the seen classes, and even more surprising that it gets 79% on the unseen classes (better than on the seen classes) if it is only trained on the seen classes of COCO. Looking forward to your reply.
Hi,

The backbone is given here and you can download it from the drive. The backbone is trained only on the 64 seen classes. For the unseen classes we also train a separate model, called "paperDiscriminator" in the drive files, which learns on the 16 unseen classes using a classification loss. We use this classification head to test the classification accuracy of the LaSO vectors on unseen classes.

The 79% on the unseen classes is the classification accuracy of the paperDiscriminator on the original feature vectors from unseen classes, which isn't a hard task since it was trained on them in a fully supervised way. Also note that our batch creation is different from normal classifiers: we create batches of random pairs.

I believe the state of the art in fully supervised multi-label classification on COCO is around 77%, but our batch creation is different from theirs, so the numbers can't be compared directly (btw, the SOTA results exceed our backbone's results for fully supervised classification ;) I checked haha).
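To make this concrete, here is a minimal sketch of how such a head could be trained and evaluated. The dimensions, names, and hyperparameters (`FEATURE_DIM`, `make_pair_batch`, the learning rate, etc.) are illustrative assumptions, not the exact ones from our drive files:

```python
import random

import torch
import torch.nn as nn

FEATURE_DIM = 2048   # backbone feature dimensionality (illustrative, assumed)
NUM_UNSEEN = 16      # the 16 unseen COCO classes

# A small multi-label classification head trained on frozen backbone features
# of unseen-class images, in the spirit of the paperDiscriminator above.
head = nn.Linear(FEATURE_DIM, NUM_UNSEEN)
criterion = nn.BCEWithLogitsLoss()  # standard multi-label classification loss
optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)

def make_pair_batch(samples, batch_size):
    """Sketch of the 'random pairs' batching: each batch element is a random
    pair of (feature, label-set) samples rather than a single example."""
    return [(random.choice(samples), random.choice(samples))
            for _ in range(batch_size)]

def train_step(features, labels):
    """One fully supervised step on ORIGINAL feature vectors (not LaSO outputs).

    features: (batch, FEATURE_DIM) frozen backbone outputs
    labels:   (batch, NUM_UNSEEN) multi-hot ground-truth label sets
    """
    optimizer.zero_grad()
    loss = criterion(head(features), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def evaluate_laso(laso_features, target_labels, threshold=0.5):
    """Score LaSO-synthesized vectors with the trained head. Since the head
    only ever saw original features during training, its accuracy here
    reflects how label-faithful the synthesized vectors are."""
    preds = torch.sigmoid(head(laso_features)) > threshold
    return (preds == target_labels.bool()).float().mean().item()
```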
I hope my answer clarifies this for you.

Amit
Your answer helps a lot. Thanks very much!