yyliu01 / PS-MT

[CVPR'22] Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation
https://arxiv.org/pdf/2111.12903.pdf
MIT License
186 stars, 17 forks

About training #10

Closed Yan1026 closed 2 years ago

Yan1026 commented 2 years ago

I ran ./scripts/train_voc_aug.sh -l 1323 -g 4 -b 101 but got this error:

ID 3 Warm (4) | Ls 0.51 |: 98%|█████████████████████████████████████████████████████████████████████▎ | 40/41 [01:04<00:00, 1.18it/s]
ID 3 Warm (4) | Ls 0.51 |: 98%|█████████████████████████████████████████████████████████████████████▎ | 40/41 [01:14<00:00, 1.18it/s]
ID 3 Warm (4) | Ls 0.51 |: 100%|███████████████████████████████████████████████████████████████████████| 41/41 [01:14<00:00, 3.65s/it]
ID 3 Warm (4) | Ls 0.51 |: 100%|███████████████████████████████████████████████████████████████████████| 41/41 [01:14<00:00, 1.81s/it]

0%| | 0/289 [00:00<?, ?it/s]wandb: Network error (ConnectionError), entering retry loop.

How can I solve this problem? Can I run without wandb?

yyliu01 commented 2 years ago

It doesn't need to be fixed. Wandb sometimes runs into network issues, but it reconnects automatically and this does not affect the training.

You can run without wandb: simply set it to "offline".
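One way to apply the offline setting from inside the training script, rather than through the command line, is the WANDB_MODE environment variable. The sketch below is only an illustration: the helper name force_wandb_offline is hypothetical and not part of PS-MT, and it assumes the variable is set before wandb.init() is called.

```python
import os


def force_wandb_offline(env=os.environ):
    """Mark wandb as offline in the given environment mapping.

    Hypothetical helper, not part of PS-MT. Assumption: it runs before
    wandb.init() is called, e.g. near the top of the entry script.
    """
    # "offline" asks wandb to write run data locally instead of syncing.
    env["WANDB_MODE"] = "offline"
    return env


if __name__ == "__main__":
    force_wandb_offline()
    print(os.environ["WANDB_MODE"])
```

Because the variable lives in the process environment, it should not depend on which directory the script is launched from.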

Yan1026 commented 2 years ago

Sorry to bother you, but I ran ‘wandb offline’ and got ‘W&B offline, running your script from this directory will only write metadata locally.’

I set it to ‘offline’, but the output file still shows: 0%| | 0/289 [00:00<?, ?it/s]wandb: Network error (ConnectionError), entering retry loop.

yyliu01 commented 2 years ago

That is so weird; I haven't run into this issue before.

Normally, wandb reconnects to the network automatically, and the training is not affected. May I ask whether your training is interrupted by this issue?

Also, please add

os.environ['WANDB_START_METHOD'] = 'fork'

above this line.

If it still doesn't work, please comment out all the wandb-related functions in both "main.py" and "train.py".
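As an alternative to commenting out each call by hand, the logging object can be swapped for a do-nothing stand-in. This is a rough sketch under assumptions: NullRun and make_run are hypothetical names, and the code does not mirror how "main.py" or "train.py" actually structure their wandb calls. It relies only on wandb.init() returning a run object with log() and finish() methods.

```python
class NullRun:
    """Hypothetical stand-in for a wandb run: accepts the calls, does nothing."""

    def log(self, *args, **kwargs):
        # Silently drop the metrics instead of sending them anywhere.
        return None

    def finish(self, *args, **kwargs):
        return None


def make_run(use_wandb, **init_kwargs):
    """Return a real wandb run if enabled, otherwise a NullRun."""
    if not use_wandb:
        return NullRun()
    # Imported lazily so the package is only touched when logging is enabled.
    import wandb
    return wandb.init(**init_kwargs)


if __name__ == "__main__":
    run = make_run(use_wandb=False)
    run.log({"loss": 0.51})  # no network access, no output
    run.finish()
```

With this pattern the training loop keeps calling run.log(...) unchanged, and a single flag decides whether anything is sent.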

Cheers, Yuyuan

Yan1026 commented 2 years ago

Thank you for the reply. I can't use wandb because the server can't connect to the Internet while it's training.

Strangely, the output file stops updating but the process is still running (I checked with 'nvidia-smi').

Thank you for your advice!

yyliu01 commented 2 years ago

Happy to help!