greatwallet opened this issue 2 years ago
Hey,
I just retrained the model on the CUB dataset with a batch size of 3 instead of 6, and got the following results, FYI:
| | FG-NMI | FG-ARI | Full-NMI | Full-ARI |
|---|---|---|---|---|
| paper | 46.0 | 21.0 | 43.5 | 19.6 |
| loading provided weights | 46.03 | 21.03 | 43.52 | 19.58 |
| own training run (batch size 3 instead of 6) | 44.10 | 19.90 | 41.28 | 18.70 |
| this github issue | 39.62 | 17.72 | 39.12 | 18.11 |
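For reference, the FG- variants restrict the clustering metrics to foreground pixels, while the Full variants score all pixels. A minimal sketch of how such numbers can be computed with scikit-learn, assuming `pred` and `gt` are flattened per-pixel part-label arrays and `fg_mask` is a boolean foreground mask (these names are my own, not the repo's):

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

def clustering_scores(pred, gt, fg_mask=None):
    """NMI/ARI between predicted and ground-truth per-pixel part labels.

    pred, gt: 1-D integer label arrays of equal length (flattened pixels).
    fg_mask:  optional boolean array; when given, scores are restricted
              to foreground pixels (the FG- variants of the metrics).
    """
    if fg_mask is not None:
        pred, gt = pred[fg_mask], gt[fg_mask]
    nmi = normalized_mutual_info_score(gt, pred)
    ari = adjusted_rand_score(gt, pred)
    return nmi, ari
```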
Hello, thank you for your inspiring work! I tried to re-run the code on all of the datasets, but the results were not as promising as those reported in the paper.

For `CUB` and `DeepFashion`, I did not modify any of the code, but the metrics on `CUB` were poor. For `Pascal-Part`, I modified the training hyper-parameters according to your supplementary file. Also, for fairness of evaluation, I trained a foreground segmenter (a `DeepLabV2-ResNet50-2branch`) and used its predicted masks at evaluation. I conducted training on `Car`, `Cat`, and `Horse`. The results on `Horse` were OK, but those on `Cat` and `Car` were really not good.

Could you please provide some analysis of why the re-run results on `CUB` were not good? Also, for `Pascal-Part`, are there any training details left out of the paper, such that I could not reproduce the results? (BTW, could you provide the supervised masks for `PascalPart`?)
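In case it helps pin down where my run diverges, this is roughly what I mean by "used the predicted mask at evaluation", as a sketch; `part_model` and `fg_model` are hypothetical stand-ins, not the repo's actual modules:

```python
import torch

@torch.no_grad()
def predict_parts_with_fg(part_model, fg_model, image):
    """Evaluate part predictions under a *predicted* foreground mask.

    part_model: image -> per-pixel part logits, shape (B, K, H, W)
    fg_model:   image -> fg/bg logits,          shape (B, 2, H, W)
    Both are hypothetical stand-ins for the repo's actual models.
    """
    part_labels = part_model(image).argmax(dim=1)   # (B, H, W)
    fg = fg_model(image).argmax(dim=1)              # (B, H, W), 1 = foreground
    part_labels[fg == 0] = 0                        # force background label outside the mask
    return part_labels
```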