LPXTT closed this issue 2 years ago
Is there anything I should pay attention to? My result is only about 70 (mIoU).
Could you please share your hardware settings here? It would also be nice if you could share your wandb page, as I did here.
Also, I'm really confused about how you got 0 for loss_unsup...
I used four V100 GPUs with 32 GB memory to train the model. I haven't used wandb before, and I get errors when I try to sync files or log in. I saw that the code created a wandb directory containing some log files. Can I send them to you if they are useful?
[PS-MT][WARNING] Training start, 80 epochs in total
[PS-MT][CRITICAL] DGX: Off with [4 x v100]
[PS-MT][CRITICAL] GPUs: 4
[PS-MT][CRITICAL] Network Architecture: deeplabv3+, with ResNet 50 backbone
[PS-MT][CRITICAL] Current Labeled Example: 1323
[PS-MT][CRITICAL] Learning rate: other 0.01, and head is the SAME [world]
[PS-MT][INFO] Image: 512x512 based on 600x600
[PS-MT][INFO] Current batch: 64 [world]
[PS-MT][INFO] Current unsupervised loss function: semi_ce, with weight 0.0 and length 12
[PS-MT][INFO] Current config+args:
{'name': 'PS-MT(VOC12)', 'experim_name': 'r50', 'n_labeled_examples': 1323, 'ramp_up': 12, 'unsupervised_w': 0.0, 'ignore_index': 255, 'lr_scheduler': 'Poly', 'use_weak_lables': False, 'weakly_loss_w': 0.4, 'model': {'supervised': False, 'semi': True, 'resnet': 50, 'sup_loss': 'CE', 'un_loss': 'semi_ce', 'warm_up_epoch': 5}, 'optimizer': {'type': 'SGD', 'args': {'lr': 0.01, 'weight_decay': 0.0001, 'momentum': 0.9}}, 'train_supervised': {'data_dir': '/mnt/efs/lpx/research/dataset/pascalVOC12/', 'batch_size': 8, 'crop_size': 512, 'shuffle': True, 'base_size': 600, 'scale': True, 'augment': True, 'flip': True, 'rotate': False, 'split': 'train_supervised', 'num_workers': 8}, 'train_unsupervised': {'data_dir': '/mnt/efs/lpx/research/dataset/pascalVOC12/', 'weak_labels_output': 'pseudo_labels/result/pseudo_labels', 'batch_size': 8, 'crop_size': 512, 'shuffle': True, 'base_size': 600, 'scale': True, 'augment': True, 'flip': True, 'rotate': False, 'split': 'train_unsupervised', 'num_workers': 8}, 'val_loader': {'data_dir': '/mnt/efs/lpx/research/dataset/pascalVOC12/', 'batch_size': 1, 'val': True, 'split': 'val', 'shuffle': False, 'num_workers': 4}, 'trainer': {'epochs': 80, 'save_dir': 'saved/', 'save_period': 1, 'log_dir': 'saved/', 'log_per_iter': 20, 'val': True, 'val_per_epochs': 1, 'gamma': 0.5, 'sharp_temp': 0.5}, 'n_gpu': 4, 'nodes': 1, 'batch_size': 8, 'epochs': 80, 'warm_up': 5, 'labeled_examples': 1323, 'learning_rate': 0.0025, 'gpus': 4, 'gcloud': 0, 'local_rank': 0, 'architecture': 'deeplabv3+', 'backbone': 50, 'ddp': True, 'dgx': False, 'semi_p_th': 0.6, 'semi_n_th': 0.0, 'unsup_weight': 0.0, 'world_size': 4}
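A silent misconfiguration like this (semi-supervised training enabled, but the unsupervised weight set to 0) is easy to catch with a small sanity check on the config before training starts. A minimal sketch; the key names follow the config dump above, and `check_semi_config` is a hypothetical helper, not part of the PS-MT codebase:

```python
def check_semi_config(cfg: dict) -> list:
    """Return warnings for settings that silently disable semi-supervised training."""
    warnings = []
    # With unsupervised_w == 0, loss_unsup is scaled by 0 and never contributes.
    if cfg.get("model", {}).get("semi") and cfg.get("unsupervised_w", 0) == 0:
        warnings.append("model.semi is True but unsupervised_w is 0: "
                        "loss_unsup will always be 0")
    if cfg.get("ramp_up", 0) <= 0:
        warnings.append("ramp_up should be a positive number of epochs")
    return warnings

# The values from the config dump above trigger the warning:
cfg = {"unsupervised_w": 0.0, "ramp_up": 12, "model": {"semi": True}}
print(check_semi_config(cfg))
```

Running the check at the top of the training script would have flagged this run before 80 epochs were spent on what is effectively supervised-only training.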
Load model, Time usage:
IO: 0.06483221054077148, initialize parameters: 1.532975673675537
Load model, Time usage:
IO: 0.0729672908782959, initialize parameters: 1.567183494567871
Load model, Time usage:
IO: 0.07271456718444824, initialize parameters: 1.5179588794708252
Load model, Time usage:
IO: 0.06479263305664062, initialize parameters: 1.520132064819336
wandb: W&B offline. Running your script from this directory will only write metadata locally.
wandb: Tracking run with wandb version 0.12.21
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
[PS-MT][CRITICAL] distributed data parallel training: on
ID 1 Warm (0) | Ls 2.01 |: 100%|███████████████████████████████████████████████████████████████████████| 41/41 [00:25<00:00, 1.61it/s]
ID 2 Warm (0) | Ls 1.93 |: 100%|███████████████████████████████████████████████████████████████████████| 41/41 [00:18<00:00, 2.24it/s]
ID 3 Warm (0) | Ls 1.36 |: 100%|███████████████████████████████████████████████████████████████████████| 41/41 [00:20<00:00, 2.00it/s]
ID 1 Warm (1) | Ls 1.53 |: 100%|███████████████████████████████████████████████████████████████████████| 41/41 [00:23<00:00, 1.74it/s]
ID 2 Warm (1) | Ls 1.66 |: 100%|███████████████████████████████████████████████████████████████████████| 41/41 [00:19<00:00, 2.13it/s]
ID 3 Warm (1) | Ls 0.87 |: 100%|███████████████████████████████████████████████████████████████████████| 41/41 [00:23<00:00, 1.74it/s]
ID 1 Warm (2) | Ls 1.56 |: 100%|███████████████████████████████████████████████████████████████████████| 41/41 [00:19<00:00, 2.10it/s]
ID 2 Warm (2) | Ls 1.62 |: 100%|███████████████████████████████████████████████████████████████████████| 41/41 [00:21<00:00, 1.90it/s]
ID 3 Warm (2) | Ls 0.74 |: 100%|███████████████████████████████████████████████████████████████████████| 41/41 [00:21<00:00, 1.91it/s]
ID 1 Warm (3) | Ls 1.56 |: 100%|███████████████████████████████████████████████████████████████████████| 41/41 [00:19<00:00, 2.05it/s]
ID 2 Warm (3) | Ls 1.49 |: 100%|███████████████████████████████████████████████████████████████████████| 41/41 [00:21<00:00, 1.90it/s]
ID 3 Warm (3) | Ls 0.56 |: 100%|███████████████████████████████████████████████████████████████████████| 41/41 [00:19<00:00, 2.08it/s]
ID 1 Warm (4) | Ls 1.46 |: 100%|███████████████████████████████████████████████████████████████████████| 41/41 [00:22<00:00, 1.86it/s]
ID 2 Warm (4) | Ls 1.49 |: 100%|███████████████████████████████████████████████████████████████████████| 41/41 [00:25<00:00, 1.62it/s]
ID 3 Warm (4) | Ls 0.58 |: 100%|███████████████████████████████████████████████████████████████████████| 41/41 [00:18<00:00, 2.24it/s]
ID 1 T (1) | Ls 0.255 Lu 0.000 Lw 0.000 m1 0.724 m2 0.031|: 100%|████████████████████████████████████| 289/289 [14:03<00:00, 2.92s/it]
[PS-MT][INFO] evaluating ...
EVAL ID (Teachers) (1) | Loss: 1.3726, PixelAcc: 0.7327, Mean IoU: 0.0350 |: 100%|████████████| 1449/1449 [03:49<00:00, 6.33it/s]
I may have found the problem after reading the log file: the weight of the unsupervised loss is 0.
No worries at all! Please set it back to the default of 1.5 and re-run the approach.
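For context on why loss_unsup stays at exactly 0: in mean-teacher-style methods the unsupervised loss is typically scaled by a ramp-up factor times a final weight, commonly the exponential sigmoid ramp-up of Laine and Aila (whether PS-MT uses this exact function is an assumption here; the log above only shows "weight 0.0 and length 12"). Since the final weight multiplies the whole ramp, setting it to 0 zeroes the term at every epoch:

```python
import math

def unsup_weight(epoch: int, final_w: float, ramp_up: int) -> float:
    """Sigmoid-shaped ramp-up from 0 to final_w over `ramp_up` epochs."""
    if epoch >= ramp_up:
        return final_w
    t = max(0.0, float(epoch)) / ramp_up
    return final_w * math.exp(-5.0 * (1.0 - t) ** 2)

# With unsupervised_w = 0.0 (as in the log above), every epoch gives weight 0:
print([round(unsup_weight(e, 0.0, 12), 4) for e in (0, 6, 12)])  # -> [0.0, 0.0, 0.0]
# With the default 1.5 the weight actually ramps up to 1.5:
print([round(unsup_weight(e, 1.5, 12), 4) for e in (0, 6, 12)])  # -> [0.0101, 0.4298, 1.5]
```

This also explains why mIoU_labeled can look very high (0.932 in the run summary) while the validation mIoU lags: the unlabeled branch never contributed any gradient.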
May I also ask whether it was my script that caused the bug?
Thank you for your help! Got it! Sorry about that; I forgot that I had changed the loss weight.
You are welcome. Please re-open the issue if your result does not reach the reported performance.
Hi, I got some models and results by running `./scripts/train_voc_aug.sh -l 1323 -g 4 -b 50`. How can I get the testing results on Pascal VOC? Is valid_Mean_IoU (0.7005) the same as the testing result?

Run summary:
global_step 23119
learning_rate_0 1e-05
learning_rate_1 1e-05
loss_sup 0.05151
loss_unsup 0.0
mIoU_labeled 0.932
mIoU_unlabeled 0.619
pixel_acc_labeled 0.98
pixel_acc_unlabeled 0.886
ramp_up 1.0
valid_Mean_IoU 0.7005
valid_Pixel_Accuracy 0.9316
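On the val-vs-test question: valid_Mean_IoU is computed on the VOC12 val split. The labels of the official VOC test split are withheld, so test numbers can only be obtained by submitting predictions to the PASCAL VOC evaluation server; most semi-supervised segmentation papers report val-split mIoU. For reference, mean IoU as it is typically computed from a confusion matrix (a generic sketch, not PS-MT's exact evaluation code):

```python
import numpy as np

def mean_iou(conf: np.ndarray) -> float:
    """Mean IoU over classes from a (num_classes x num_classes) confusion matrix.

    conf[i, j] = number of pixels with ground-truth class i predicted as class j.
    """
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp  # predicted as class c, ground truth differs
    fn = conf.sum(axis=1) - tp  # ground truth class c, predicted otherwise
    denom = tp + fp + fn
    # Classes absent from both prediction and ground truth are excluded.
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return float(np.nanmean(iou))

# Toy 2-class example: perfect predictions give mIoU = 1.0
print(mean_iou(np.array([[5, 0], [0, 3]])))  # -> 1.0
```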