edwardcho closed this issue 2 years ago
I will double check tonight or tomorrow and get back to you.
Yes, thanks. If you have any suggestions for me, please tell me.
Thanks.
Could you provide some more context about your setup? I downloaded the code and data from scratch, then followed the instructions in the README. I had to install yacs, a dependency I forgot to mention (though I believe it is listed in the submodule requirements), and then started the training.
python3 train.py --gpus 0-1
[2022-01-06 23:14:56,706 INFO train.py line 240 1846380] Loaded configuration file config/ade20k-resnet50dilated-ppm_deepsup.yaml
[2022-01-06 23:14:56,706 INFO train.py line 241 1846380] Running with config:
DATASET:
  imgMaxSize: 1000
  imgSizes: (300, 375, 450, 525, 600)
  list_train: ./data/training.odgt
  list_val: ./data/validation.odgt
  num_class: 150
  padding_constant: 8
  random_flip: True
  root_dataset: ./data/
  segm_downsampling_rate: 8
DIR: ckpt/ade20k-resnet50dilated-ppm_deepsup
MODEL:
  arch_decoder: ppm_deepsup
  arch_encoder: resnet50dilated
  fc_dim: 2048
  weights_decoder:
  weights_encoder:
OOD:
  exclude_back: False
  ood: msp
  out_labels: (13,)
TEST:
  batch_size: 1
  checkpoint: epoch_20.pth
  result: ./
TRAIN:
  batch_size_per_gpu: 2
  beta1: 0.9
  deep_sup_scale: 0.4
  disp_iter: 20
  epoch_iters: 5000
  fix_bn: False
  lr_decoder: 0.02
  lr_encoder: 0.02
  lr_pow: 0.9
  num_epoch: 20
  optim: SGD
  seed: 304
  start_epoch: 0
  weight_decay: 0.0001
  workers: 16
VAL:
  batch_size: 1
  checkpoint: epoch_20.pth
  visualize: False
[2022-01-06 23:14:56,706 INFO train.py line 246 1846380] Outputing checkpoints to: ckpt/ade20k-resnet50dilated-ppm_deepsup
# samples: 5125
1 Epoch = 5000 iters
Epoch: [1][0/5000], Time: 9.62, Data: 2.50, lr_encoder: 0.020000, lr_decoder: 0.020000, Accuracy: 0.66, Loss: 7.690948
Epoch: [1][20/5000], Time: 1.21, Data: 0.16, lr_encoder: 0.019996, lr_decoder: 0.019996, Accuracy: 70.52, Loss: 2.431588
Epoch: [1][40/5000], Time: 0.96, Data: 0.10, lr_encoder: 0.019993, lr_decoder: 0.019993, Accuracy: 76.97, Loss: 1.634942
It seems to be training successfully.
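As a sanity check when comparing your own log against the one above: the logged learning rates are consistent with a standard polynomial decay driven by `lr_pow`, `epoch_iters` and `num_epoch` from the config. The sketch below is only an illustration under that assumption. The function name `poly_lr` is hypothetical and is not taken from train.py.

```python
def poly_lr(base_lr, cur_iter, max_iters, power):
    """Polynomial decay: base_lr * (1 - cur_iter / max_iters) ** power.

    Assumption: this is the usual "poly" schedule; it is not copied from train.py.
    """
    return base_lr * (1.0 - cur_iter / max_iters) ** power


# Values from the config dump above.
base_lr = 0.02            # TRAIN.lr_encoder / TRAIN.lr_decoder
power = 0.9               # TRAIN.lr_pow
max_iters = 5000 * 20     # TRAIN.epoch_iters * TRAIN.num_epoch

for it in (0, 20, 40):
    print(f"iter {it}: lr = {poly_lr(base_lr, it, max_iters, power):.6f}")
```

If your run prints clearly different learning rates at the same iterations, the config being loaded probably differs from the one shown here.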
I will close this soon unless I get more information; as it stands, I cannot reproduce your issue.
Facing the same issue. Have you debugged it yet?
@LT1st can you describe your steps or what you did?
It seems the issue occurs when training with one GPU. I'll update the README and the previous issue. The solution is something along these lines: https://github.com/CSAILVision/semantic-segmentation-pytorch/issues/58. However, I haven't tested single-GPU support and am not sure when I'll be able to. Maybe sometime next week.
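To make the single-GPU case concrete, here is a minimal sketch of the kind of guard such a fix might involve: expand the `--gpus` argument and take the multi-GPU code path only when more than one device is requested. Both helper names are hypothetical, the accepted formats ("0-1", "0,2", "0") are an assumption, and this is untested against the repository and does not reproduce the linked issue's actual change.

```python
from typing import List


def parse_gpu_ids(spec: str) -> List[int]:
    """Expand a spec such as "0-1", "0,2" or "0" into a list of GPU ids.

    Hypothetical helper; the accepted formats are an assumption.
    """
    ids: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # Inclusive range, e.g. "0-1" -> [0, 1]
            start, end = part.split("-")
            ids.extend(range(int(start), int(end) + 1))
        else:
            ids.append(int(part))
    return ids


def use_data_parallel(gpu_ids: List[int]) -> bool:
    """Only take the multi-GPU (data-parallel) path with more than one device."""
    return len(gpu_ids) > 1


print(parse_gpu_ids("0-1"), use_data_parallel(parse_gpu_ids("0-1")))
print(parse_gpu_ids("0"), use_data_parallel(parse_gpu_ids("0")))
```

With `--gpus 0`, the training script would then skip the data-parallel wrapper and feed batches to the model directly, which is the path that has not been tested yet.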
Hello Sir,
I still couldn't solve my error. I am using your code with config/ade20k-resnet50dilated-ppm_deepsup.yaml and streethazards_train.tar.
When training was started,
I wonder what I am doing wrong. Thanks, Edward Cho.