mhamilton723 / STEGO

Unsupervised Semantic Segmentation by Distilling Feature Correspondences
MIT License

ViT-Base performance reproduction problem on cocostuff27 #3

Closed MY-LIU100101 closed 2 years ago

MY-LIU100101 commented 2 years ago

Thank you so much for sharing this wonderful work. However, I am having difficulty reproducing the performance of the ViT-Base model.

I got the following results after retraining ViT-Base with the 5-crop setting: {'final/linear/mIoU': 34.91545915603638, 'final/linear/Accuracy': 73.63219261169434, 'final/cluster/mIoU': 19.5334792137146, 'final/cluster/Accuracy': 50.60913562774658}

However, I expected them to match the performance of your released ViT-Base model: {'final/linear/mIoU': 41.074731945991516, 'final/linear/Accuracy': 76.12890005111694, 'final/cluster/mIoU': 28.187400102615356, 'final/cluster/Accuracy': 56.92926645278931}.
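To make the gap concrete, the two result dicts above can be compared metric by metric. This is only an illustrative sketch: `metric_gaps` is a hypothetical helper written for this comparison and is not part of the STEGO codebase.

```python
# Metrics copied verbatim from the two result dicts quoted above.
retrained = {
    "final/linear/mIoU": 34.91545915603638,
    "final/linear/Accuracy": 73.63219261169434,
    "final/cluster/mIoU": 19.5334792137146,
    "final/cluster/Accuracy": 50.60913562774658,
}
released = {
    "final/linear/mIoU": 41.074731945991516,
    "final/linear/Accuracy": 76.12890005111694,
    "final/cluster/mIoU": 28.187400102615356,
    "final/cluster/Accuracy": 56.92926645278931,
}


def metric_gaps(ours, reference):
    """Return reference minus ours for every metric present in both dicts.

    Hypothetical helper for illustration only; not a STEGO function.
    """
    return {name: reference[name] - ours[name] for name in ours if name in reference}


gaps = metric_gaps(retrained, released)
for name, gap in gaps.items():
    print(f"{name}: retrained {retrained[name]:.2f}, "
          f"released {released[name]:.2f}, gap {gap:.2f}")
```

The largest shortfall is in cluster mIoU (about 8.65 points), while linear accuracy differs by only about 2.50 points.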

I also retrained your ViT-Small model with the 5-crop data setting and got about 23.67 mIoU, slightly lower than your reported 24.5, but I think that is fine.

However, when I tried to reproduce your ViT-Base performance, I got a cluster mIoU of just 19.53. I changed the model type and used the "Cocostuff27 10/3 vit_base" weights. Did I miss something?

The training config is attached below:

#################### training configs ######################
num_workers: 24
max_steps: 5000
num_neighbors: 7

batch_size: 16
dataset_name: "cocostuff27"
crop_type: "five" #~
crop_ratio: .5
res: 224
loader_crop_type: "center"

extra_clusters: 0
use_true_labels: False
use_recalibrator: False
model_type: "vit_base"
arch: "dino"
use_fit_model: False
dino_feat_type: "feat"
projection_type: "nonlinear"
# projection_type: linear
dino_patch_size: 8
granularity: 1
continuous: True
dim: 70
dropout: True
zero_clamp: True

lr: 5e-4
pretrained_weights: ~
use_salience: False
stabalize: False
stop_at_zero: True

pointwise: True
feature_samples: 11
neg_samples: 5
aug_alignment_weight: 0.0
correspondence_weight: 1.0

###################### Cocostuff27 10/3 vit_base
neg_inter_weight: 0.1538476246415498
pos_inter_weight: 1
pos_intra_weight: 0.1

neg_inter_shift: 1
pos_inter_shift: 0.2
pos_intra_shift: 0.12

mhamilton723 commented 2 years ago

Thanks for raising this @MY-LIU100101, I'll be looking into this over the next few weeks as I make this code easier to train locally! Currently, the easiest way to reproduce our results is to use the download_model.py script and evaluate with the eval_segmentation.py script. This should yield the results in the table.

mhamilton723 commented 2 years ago

Just made sure eval_segmentation.py works as reported without memory issues. Feel free to open a new issue if needed!