mhamilton723 / STEGO

Unsupervised Semantic Segmentation by Distilling Feature Correspondences
MIT License

Reproduce the Potsdam results #39

Open iseong83 opened 2 years ago

iseong83 commented 2 years ago

Could you help me reproduce the results on the Potsdam dataset? I trained STEGO with the same configuration used for potsdam_test.ckpt and then evaluated the model using eval_segmentation.py, but the clustering Accuracy and mIoU are low. Using potsdam_test.ckpt, I got:

'final/linear/mIoU': 74.83345866203308, 
'final/linear/Accuracy': 85.84609031677246,
'final/cluster/mIoU': 62.565261125564575, 
'final/cluster/Accuracy': 77.03110575675964

but using my checkpoint, I got:

'final/linear/mIoU': 74.89467859268188, 
'final/linear/Accuracy': 85.89659333229065, 
'final/cluster/mIoU': 47.732433676719666, 
'final/cluster/Accuracy': 64.23421502113342

The results with the linear probe look good, but not the ones with the cluster probe. Could you help me figure out what could cause the difference?
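For context on why the two probes can diverge: the linear probe is trained with supervision, while the cluster probe's numbers depend on an optimal cluster-to-class assignment computed at evaluation time, so small changes in the learned feature space can reshuffle clusters and move cluster mIoU a lot while leaving the linear probe nearly unchanged. A minimal stdlib sketch of that matching step (function and variable names here are illustrative, not STEGO's actual API; STEGO computes the same thing via Hungarian matching over a confusion matrix):

```python
from itertools import permutations

def cluster_accuracy(y_true, y_pred, n_classes):
    """Best accuracy over all cluster-id -> class-id assignments."""
    # Confusion counts: conf[p][t] = number of pixels predicted as
    # cluster p whose ground-truth class is t.
    conf = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        conf[p][t] += 1
    # Brute-forcing all assignments is fine for Potsdam's handful of
    # classes; Hungarian matching does this efficiently in general.
    best = max(
        sum(conf[p][perm[p]] for p in range(n_classes))
        for perm in permutations(range(n_classes))
    )
    return best / len(y_true)
```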

Here is my configuration used to train STEGO:

output_root: ../
pytorch_data_dir: /home/bv/datasets/external_datasets
experiment_name: exp1
log_dir: potsdam
azureml_logging: true
submitting_to_aml: false
num_workers: 24
max_steps: 5000
batch_size: 16
num_neighbors: 7
dataset_name: potsdam
dir_dataset_name: null
dir_dataset_n_classes: 5
has_labels: false
crop_type: null
crop_ratio: 0.5
res: 224
loader_crop_type: center
extra_clusters: 0
use_true_labels: false
use_recalibrator: false
model_type: vit_small
arch: dino
use_fit_model: false
dino_feat_type: feat
projection_type: nonlinear
dino_patch_size: 8
granularity: 1
continuous: true
dim: 70
dropout: true
zero_clamp: true
lr: 0.0005
pretrained_weights: null
use_salience: false
stabalize: false
stop_at_zero: true
pointwise: true
feature_samples: 11
neg_samples: 5
aug_alignment_weight: 0.0
correspondence_weight: 1.0
neg_inter_weight: 0.63
pos_inter_weight: 0.25
pos_intra_weight: 0.67
neg_inter_shift: 0.76
pos_inter_shift: 0.02
pos_intra_shift: 0.08
rec_weight: 0.0
repulsion_weight: 0.0
crf_weight: 0.0
alpha: 0.5
beta: 0.15
gamma: 0.05
w1: 10.0
w2: 3.0
shift: 0.0
crf_samples: 1000
color_space: rgb
reset_probe_steps: null
n_images: 5
scalar_log_freq: 10
checkpoint_freq: 50
val_freq: 100
hist_freq: 100
full_name: potsdam/potsdam_exp1
BradNeuberg commented 2 years ago

For the record, I'm seeing exactly the same problem -- I can replicate the STEGO results with the pretrained model, but when I train myself I get lower accuracy for the cluster probe than the paper reports.

BradNeuberg commented 2 years ago

I attempted to train cocostuff to get a successful training run to see what the graphs looked like (https://github.com/mhamilton723/STEGO/issues/23#issuecomment-1186319622). Even with this, though, I could not successfully tune the Potsdam hyperparameters.

I decided to turn to a Bayesian hyperparameter optimizer, SigOpt, and ran it for about 100 trials, tuning the various positive and negative hyperparameters while optimizing only cluster mIoU. Strictly speaking I should have optimized linear accuracy/mIoU and cluster accuracy/mIoU jointly, but for simplicity I chose cluster mIoU alone. It came up with these hyperparameter values for the Potsdam dataset:

Parameters:

neg_inter_shift: 0.9981259810906995
neg_inter_weight: 0.19914806514497108
pos_inter_shift: 0.17863135533504992
pos_inter_weight: 0.6098772723430869
pos_intra_shift: 0.003232418118101617
pos_intra_weight: 1
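For anyone asking how the search was structured: roughly like the sketch below, with random sampling shown as a stand-in for SigOpt's Bayesian suggestions (names and bounds here are illustrative, not the exact SigOpt setup):

```python
import random

# The six STEGO loss hyperparameters that were tuned, each searched
# over an assumed [0, 1] range.
SEARCH_SPACE = {
    "neg_inter_shift": (0.0, 1.0),
    "neg_inter_weight": (0.0, 1.0),
    "pos_inter_shift": (0.0, 1.0),
    "pos_inter_weight": (0.0, 1.0),
    "pos_intra_shift": (0.0, 1.0),
    "pos_intra_weight": (0.0, 1.0),
}

def sample_config(rng):
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}

def search(objective, n_trials=100, seed=0):
    # objective(config) -> cluster mIoU; in the real run it wraps a full
    # training run followed by eval_segmentation.py.
    rng = random.Random(seed)
    best_cfg, best_miou = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        miou = objective(cfg)
        if miou > best_miou:
            best_cfg, best_miou = cfg, miou
    return best_cfg, best_miou
```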

Unfortunately, even with this, I still could not replicate the Potsdam results listed in the paper.

At this point, I think something more fundamental is broken in STEGO's Potsdam pipeline, perhaps a bug in the dataset code or elsewhere.

mhamilton723 commented 2 years ago

Thanks for replicating this @BradNeuberg. This might be related to the specifics of your distributed training setup. How many workers do you use, and are you using the same batch size? These models were trained on a single GPU, so this may have affected training.

BradNeuberg commented 2 years ago

I am using Google Cloud, with an n1-standard-8 machine type (8 CPU cores) and a V100 GPU. Since I have 8 CPU cores, I could in principle set num_workers to 8; however, with that setting I consistently get out-of-memory errors around epoch 22, so I've set num_workers to 1, which gets rid of them. My batch size is 32, and I'm training on a single machine with a single GPU.

tanveer6715 commented 1 year ago

Hi @BradNeuberg ,

Could you show an example of how you used the SigOpt Bayesian hyperparameter optimizer to tune the hyperparameters for the STEGO model?

Cemm23333 commented 1 year ago

Is there any solution to the problem of reproducing the Potsdam results?

Cemm23333 commented 1 year ago

@mhamilton723, could you share the hyperparameters for Potsdam?

axkoenig commented 1 year ago

Hi folks, congrats on the great paper! To add to the discussion, I'd like to share that we are publishing a follow-up study on STEGO in CVPR 23 Workshops, which also looks into the issues you describe. Figure 4 might be interesting to you! :) Cheers, Alex

22by7-raikar commented 1 year ago

> @mhamilton723, could you share the hyperparameters for Potsdam?

@Cemm23333 you can find them here: https://arxiv.org/abs/2304.07314