iseong83 opened this issue 2 years ago
For the record, I'm seeing exactly the same problem: I can replicate the STEGO results with the pretrained model, but when I train it myself I get lower accuracy for the cluster probe than the paper reports.
I attempted to train cocostuff to get a successful training run to see what the graphs looked like (https://github.com/mhamilton723/STEGO/issues/23#issuecomment-1186319622). Even with this, though, I could not successfully tune the Potsdam hyperparameters.
I decided to turn to a Bayesian hyperparameter optimizer, SigOpt. I ran it for about 100 trials, tuning the various positive and negative loss hyperparameters, optimizing only cluster mIoU. Ideally I should have had it optimize linear accuracy/mIoU and cluster accuracy/mIoU together, but for simplicity I chose just cluster mIoU. It came up with these hyperparameter values for the Potsdam dataset:
Parameters:
- neg_inter_shift: 0.9981259810906995
- neg_inter_weight: 0.19914806514497108
- pos_inter_shift: 0.17863135533504992
- pos_inter_weight: 0.6098772723430869
- pos_intra_shift: 0.003232418118101617
- pos_intra_weight: 1
Unfortunately, even with this, I still could not replicate the Potsdam results listed in the paper:
At this point, I think something more fundamental is broken in STEGO with respect to Potsdam, perhaps a bug in the dataset or elsewhere.
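For those curious how a search loop like the one above can be set up: SigOpt itself requires an account and API token, so the sketch below substitutes plain random search over the same loss hyperparameters. The parameter bounds and the `objective` callback are my own placeholders, not STEGO's actual training call; with SigOpt you would instead request a suggestion from the service each iteration and report the observed cluster mIoU back.

```python
import random

# Box bounds for the STEGO contrastive-loss hyperparameters.
# These ranges are assumptions for illustration, not official values.
SEARCH_SPACE = {
    "neg_inter_shift":  (0.0, 1.0),
    "neg_inter_weight": (0.0, 1.0),
    "pos_inter_shift":  (0.0, 1.0),
    "pos_inter_weight": (0.0, 1.0),
    "pos_intra_shift":  (0.0, 1.0),
    "pos_intra_weight": (0.0, 1.0),
}

def sample(space, rng):
    """Draw one candidate configuration uniformly from the box bounds."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}

def search(objective, budget=100, seed=0):
    """Return the best (miou, params) pair found over `budget` trials.

    `objective` stands in for a full STEGO training run that returns
    cluster mIoU on the validation set; a Bayesian optimizer would pick
    candidates adaptively instead of sampling them uniformly.
    """
    rng = random.Random(seed)
    best_miou, best_params = -1.0, None
    for _ in range(budget):
        params = sample(SEARCH_SPACE, rng)
        miou = objective(params)
        if miou > best_miou:
            best_miou, best_params = miou, params
    return best_miou, best_params

if __name__ == "__main__":
    # Toy objective so the loop runs end to end; replace with real training.
    toy = lambda p: 1.0 - abs(p["pos_inter_shift"] - 0.18)
    best_miou, best_params = search(toy, budget=50)
    print(round(best_miou, 3))
```

Swapping the toy objective for a function that writes the sampled values into the training config, runs training, and returns the evaluated cluster mIoU reproduces the workflow described above.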
Thanks for replicating this @BradNeuberg, this might be related to the specifics of your distributed training setup. How many workers do you use, and are you using the same batch size? These models were trained on a single GPU, so that might have affected training.
I am using Google Cloud, with an n1-standard-8 machine type (8 CPU cores) and a V100 GPU. Since I have 8 CPU cores, I could potentially set num_workers to 8; however, doing so consistently produces out-of-memory errors around epoch 22, so I've set num_workers to 1, which gets rid of them. My batch size is 32. I'm training on a single machine with a single GPU.
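For reference, these settings correspond to entries like the following in STEGO's Hydra training config (the key names are assumed from the repo's `train_config.yml`; treat this as a sketch, not the exact file):

```yaml
# Single machine, single V100 GPU, 8 CPU cores.
# num_workers > 1 hit out-of-memory errors around epoch 22.
num_workers: 1
batch_size: 32
```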
Hi @BradNeuberg ,
Could you show an example of how you used the SigOpt Bayesian hyperparameter optimizer to tune the hyperparameters of the STEGO model?
How can the Potsdam replication problem be dealt with?
@mhamilton723, could you share the hyperparameters for Potsdam?
Hi folks, congrats on the great paper! To add to the discussion, I'd like to share that we are publishing a follow-up study on STEGO in CVPR 23 Workshops, which also looks into the issues you describe. Figure 4 might be interesting to you! :) Cheers, Alex
@mhamilton723, could you share the hyperparameters for Potsdam?
@Cemm23333 you can find them here: https://arxiv.org/abs/2304.07314
Could you help me reproduce the results on the Potsdam dataset? I trained STEGO with the same configuration used for `potsdam_test.ckpt` and then evaluated the model using `eval_segmentation.py`, but the clustering accuracy and IoU are low. Using `potsdam_test.ckpt` I got the expected numbers, but using my own checkpoint I did not (result tables omitted here). The results with the linear probe look good, but not the ones with the cluster probe. Could you help figure out what makes the difference?
Here is the configuration I used to train STEGO: