I built the "DDRNet23 Slim" model that you provide. My images have a shape of 1024 H x 1024 W x 3 C, and there are 8 different classes in my dataset. When I check the model summary with
summary(net.cuda(),(3,1024,1024))
I get the following model summary:

----------------------------------------------------------------
Layer (type)                Output Shape          Param #
================================================================
Conv2d-1                    [-1, 32, 512, 512]    896
BatchNorm2d-2               [-1, 32, 512, 512]    64
ReLU-3                      [-1, 32, 512, 512]    0
Conv2d-4                    [-1, 32, 256, 256]    9,248
BatchNorm2d-5               [-1, 32, 256, 256]    64
ReLU-6                      [-1, 32, 256, 256]    0
Conv2d-7                    [-1, 32, 256, 256]    9,216
BatchNorm2d-8               [-1, 32, 256, 256]    64
ReLU-9                      [-1, 32, 256, 256]    0
Conv2d-10                   [-1, 32, 256, 256]    9,216
BatchNorm2d-11              [-1, 32, 256, 256]    64
ReLU-12                     [-1, 32, 256, 256]    0
BasicBlock-13               [-1, 32, 256, 256]    0
Conv2d-14                   [-1, 32, 256, 256]    9,216
BatchNorm2d-15              [-1, 32, 256, 256]    64
ReLU-16                     [-1, 32, 256, 256]    0
Conv2d-17                   [-1, 32, 256, 256]    9,216
BatchNorm2d-18              [-1, 32, 256, 256]    64
BasicBlock-19               [-1, 32, 256, 256]    0
ReLU-20                     [-1, 32, 256, 256]    0
Conv2d-21                   [-1, 64, 128, 128]    18,432
BatchNorm2d-22              [-1, 64, 128, 128]    128
ReLU-23                     [-1, 64, 128, 128]    0
Conv2d-24                   [-1, 64, 128, 128]    36,864
BatchNorm2d-25              [-1, 64, 128, 128]    128
Conv2d-26                   [-1, 64, 128, 128]    2,048
BatchNorm2d-27              [-1, 64, 128, 128]    128
ReLU-28                     [-1, 64, 128, 128]    0
BasicBlock-29               [-1, 64, 128, 128]    0
Conv2d-30                   [-1, 64, 128, 128]    36,864
BatchNorm2d-31              [-1, 64, 128, 128]    128
ReLU-32                     [-1, 64, 128, 128]    0
Conv2d-33                   [-1, 64, 128, 128]    36,864
BatchNorm2d-34              [-1, 64, 128, 128]    128
BasicBlock-35               [-1, 64, 128, 128]    0
ReLU-36                     [-1, 64, 128, 128]    0
Conv2d-37                   [-1, 128, 64, 64]     73,728
BatchNorm2d-38              [-1, 128, 64, 64]     256
ReLU-39                     [-1, 128, 64, 64]     0
Conv2d-40                   [-1, 128, 64, 64]     147,456
BatchNorm2d-41              [-1, 128, 64, 64]     256
Conv2d-42                   [-1, 128, 64, 64]     8,192
BatchNorm2d-43              [-1, 128, 64, 64]     256
ReLU-44                     [-1, 128, 64, 64]     0
BasicBlock-45               [-1, 128, 64, 64]     0
Conv2d-46                   [-1, 128, 64, 64]     147,456
BatchNorm2d-47              [-1, 128, 64, 64]     256
ReLU-48                     [-1, 128, 64, 64]     0
Conv2d-49                   [-1, 128, 64, 64]     147,456
BatchNorm2d-50              [-1, 128, 64, 64]     256
BasicBlock-51               [-1, 128, 64, 64]     0
ReLU-52                     [-1, 64, 128, 128]    0
Conv2d-53                   [-1, 64, 128, 128]    36,864
BatchNorm2d-54              [-1, 64, 128, 128]    128
ReLU-55                     [-1, 64, 128, 128]    0
Conv2d-56                   [-1, 64, 128, 128]    36,864
BatchNorm2d-57              [-1, 64, 128, 128]    128
ReLU-58                     [-1, 64, 128, 128]    0
BasicBlock-59               [-1, 64, 128, 128]    0
Conv2d-60                   [-1, 64, 128, 128]    36,864
BatchNorm2d-61              [-1, 64, 128, 128]    128
ReLU-62                     [-1, 64, 128, 128]    0
Conv2d-63                   [-1, 64, 128, 128]    36,864
BatchNorm2d-64              [-1, 64, 128, 128]    128
BasicBlock-65               [-1, 64, 128, 128]    0
ReLU-66                     [-1, 64, 128, 128]    0
Conv2d-67                   [-1, 128, 64, 64]     73,728
BatchNorm2d-68              [-1, 128, 64, 64]     256
ReLU-69                     [-1, 128, 64, 64]     0
Conv2d-70                   [-1, 64, 64, 64]      8,192
BatchNorm2d-71              [-1, 64, 64, 64]      128
ReLU-72                     [-1, 128, 64, 64]     0
Conv2d-73                   [-1, 256, 32, 32]     294,912
BatchNorm2d-74              [-1, 256, 32, 32]     512
ReLU-75                     [-1, 256, 32, 32]     0
Conv2d-76                   [-1, 256, 32, 32]     589,824
BatchNorm2d-77              [-1, 256, 32, 32]     512
Conv2d-78                   [-1, 256, 32, 32]     32,768
BatchNorm2d-79              [-1, 256, 32, 32]     512
ReLU-80                     [-1, 256, 32, 32]     0
BasicBlock-81               [-1, 256, 32, 32]     0
Conv2d-82                   [-1, 256, 32, 32]     589,824
BatchNorm2d-83              [-1, 256, 32, 32]     512
ReLU-84                     [-1, 256, 32, 32]     0
Conv2d-85                   [-1, 256, 32, 32]     589,824
BatchNorm2d-86              [-1, 256, 32, 32]     512
BasicBlock-87               [-1, 256, 32, 32]     0
ReLU-88                     [-1, 64, 128, 128]    0
Conv2d-89                   [-1, 64, 128, 128]    36,864
BatchNorm2d-90              [-1, 64, 128, 128]    128
ReLU-91                     [-1, 64, 128, 128]    0
Conv2d-92                   [-1, 64, 128, 128]    36,864
BatchNorm2d-93              [-1, 64, 128, 128]    128
ReLU-94                     [-1, 64, 128, 128]    0
BasicBlock-95               [-1, 64, 128, 128]    0
Conv2d-96                   [-1, 64, 128, 128]    36,864
BatchNorm2d-97              [-1, 64, 128, 128]    128
ReLU-98                     [-1, 64, 128, 128]    0
Conv2d-99                   [-1, 64, 128, 128]    36,864
BatchNorm2d-100             [-1, 64, 128, 128]    128
BasicBlock-101              [-1, 64, 128, 128]    0
ReLU-102                    [-1, 64, 128, 128]    0
Conv2d-103                  [-1, 128, 64, 64]     73,728
BatchNorm2d-104             [-1, 128, 64, 64]     256
ReLU-105                    [-1, 128, 64, 64]     0
Conv2d-106                  [-1, 256, 32, 32]     294,912
BatchNorm2d-107             [-1, 256, 32, 32]     512
ReLU-108                    [-1, 256, 32, 32]     0
Conv2d-109                  [-1, 64, 32, 32]      16,384
BatchNorm2d-110             [-1, 64, 32, 32]      128
ReLU-111                    [-1, 64, 128, 128]    0
Conv2d-112                  [-1, 64, 128, 128]    4,096
BatchNorm2d-113             [-1, 64, 128, 128]    128
ReLU-114                    [-1, 64, 128, 128]    0
Conv2d-115                  [-1, 64, 128, 128]    36,864
BatchNorm2d-116             [-1, 64, 128, 128]    128
ReLU-117                    [-1, 64, 128, 128]    0
Conv2d-118                  [-1, 128, 128, 128]   8,192
BatchNorm2d-119             [-1, 128, 128, 128]   256
Conv2d-120                  [-1, 128, 128, 128]   8,192
BatchNorm2d-121             [-1, 128, 128, 128]   256
Bottleneck-122              [-1, 128, 128, 128]   0
ReLU-123                    [-1, 256, 32, 32]     0
Conv2d-124                  [-1, 256, 32, 32]     65,536
BatchNorm2d-125             [-1, 256, 32, 32]     512
ReLU-126                    [-1, 256, 32, 32]     0
Conv2d-127                  [-1, 256, 16, 16]     589,824
BatchNorm2d-128             [-1, 256, 16, 16]     512
ReLU-129                    [-1, 256, 16, 16]     0
Conv2d-130                  [-1, 512, 16, 16]     131,072
BatchNorm2d-131             [-1, 512, 16, 16]     1,024
Conv2d-132                  [-1, 512, 16, 16]     131,072
BatchNorm2d-133             [-1, 512, 16, 16]     1,024
Bottleneck-134              [-1, 512, 16, 16]     0
BatchNorm2d-135             [-1, 512, 16, 16]     1,024
ReLU-136                    [-1, 512, 16, 16]     0
Conv2d-137                  [-1, 128, 16, 16]     65,536
AvgPool2d-138               [-1, 512, 8, 8]       0
BatchNorm2d-139             [-1, 512, 8, 8]       1,024
ReLU-140                    [-1, 512, 8, 8]       0
Conv2d-141                  [-1, 128, 8, 8]       65,536
BatchNorm2d-142             [-1, 128, 16, 16]     256
ReLU-143                    [-1, 128, 16, 16]     0
Conv2d-144                  [-1, 128, 16, 16]     147,456
AvgPool2d-145               [-1, 512, 4, 4]       0
BatchNorm2d-146             [-1, 512, 4, 4]       1,024
ReLU-147                    [-1, 512, 4, 4]       0
Conv2d-148                  [-1, 128, 4, 4]       65,536
BatchNorm2d-149             [-1, 128, 16, 16]     256
ReLU-150                    [-1, 128, 16, 16]     0
Conv2d-151                  [-1, 128, 16, 16]     147,456
AvgPool2d-152               [-1, 512, 2, 2]       0
BatchNorm2d-153             [-1, 512, 2, 2]       1,024
ReLU-154                    [-1, 512, 2, 2]       0
Conv2d-155                  [-1, 128, 2, 2]       65,536
BatchNorm2d-156             [-1, 128, 16, 16]     256
ReLU-157                    [-1, 128, 16, 16]     0
Conv2d-158                  [-1, 128, 16, 16]     147,456
AdaptiveAvgPool2d-159       [-1, 512, 1, 1]       0
BatchNorm2d-160             [-1, 512, 1, 1]       1,024
ReLU-161                    [-1, 512, 1, 1]       0
Conv2d-162                  [-1, 128, 1, 1]       65,536
BatchNorm2d-163             [-1, 128, 16, 16]     256
ReLU-164                    [-1, 128, 16, 16]     0
Conv2d-165                  [-1, 128, 16, 16]     147,456
BatchNorm2d-166             [-1, 640, 16, 16]     1,280
ReLU-167                    [-1, 640, 16, 16]     0
Conv2d-168                  [-1, 128, 16, 16]     81,920
BatchNorm2d-169             [-1, 512, 16, 16]     1,024
ReLU-170                    [-1, 512, 16, 16]     0
Conv2d-171                  [-1, 128, 16, 16]     65,536
DAPPM-172                   [-1, 128, 16, 16]     0
BatchNorm2d-173             [-1, 128, 128, 128]   256
ReLU-174                    [-1, 128, 128, 128]   0
Conv2d-175                  [-1, 64, 128, 128]    73,728
BatchNorm2d-176             [-1, 64, 128, 128]    128
ReLU-177                    [-1, 64, 128, 128]    0
Conv2d-178                  [-1, 8, 128, 128]     520
segmenthead-179             [-1, 8, 128, 128]     0
================================================================
Total params: 5,695,272
Trainable params: 5,695,272
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 12.00
Forward/backward pass size (MB): 1181.08
Params size (MB): 21.73
Estimated Total Size (MB): 1214.80
----------------------------------------------------------------
As you can see, the output layer has a resolution of 128 H x 128 W, but my labels have a shape of 1024 H x 1024 W. I could resize my labels from 1024 to 128 pixels, but this would cause a significant loss of pixel information. Is this configuration correct for a 1024-pixel input? Does the output have to be 128 pixels, i.e. a 1/8-scaled form of the input? @ydhongHIT
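For reference, a minimal sketch of how the summary above can be reproduced. The summary call implies the torchsummary package; build_ddrnet23_slim is a hypothetical stand-in for however the repo's code actually constructs the model:

import torch
from torchsummary import summary

# Hypothetical constructor name; build DDRNet-23-slim with 8 output
# classes using whatever function the repo exposes.
net = build_ddrnet23_slim(num_classes=8)

# torchsummary runs a forward pass on a dummy (3, 1024, 1024) input
# and prints the per-layer table shown above.
summary(net.cuda(), (3, 1024, 1024))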
It is necessary to compute at 1/8 resolution for real-time speed. Most state-of-the-art models also output a 1/8-resolution feature map, which is then directly upsampled to the original resolution. So you don't need to downsample the labels; on the contrary, you should upsample the output.
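A minimal sketch of that upsampling in PyTorch, assuming net is the 8-class DDRNet-23-slim from above and that it returns a single logits tensor (in training mode the model may also return an auxiliary output):

import torch
import torch.nn.functional as F

images = torch.randn(2, 3, 1024, 1024).cuda()         # N x 3 x 1024 x 1024
labels = torch.randint(0, 8, (2, 1024, 1024)).cuda()  # per-pixel class indices

logits = net(images)  # N x 8 x 128 x 128, the 1/8-resolution output
# Upsample the logits to the label resolution instead of downsampling the labels.
logits = F.interpolate(logits, size=labels.shape[-2:], mode='bilinear', align_corners=False)
loss = F.cross_entropy(logits, labels)  # cross-entropy at full 1024 x 1024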
@ydhongHIT Thanks for the answer. I will probably use DDRNet in my research and will cite you.
Thanks for your interest in my work.
@ydhongHIT Which loss function do you use? I checked the paper multiple times but could not find it. I know you used "main loss + aux loss", but I wonder which type of loss that is. Jaccard, maybe?
I use the cross-entropy loss, as mentioned in the paper: "The final loss which is sum of cross-entropy can be expressed as:".
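A hedged sketch of that "main loss + aux loss" combination: both heads are upsampled to the label resolution and their cross-entropy terms are summed. The auxiliary weight of 0.4 is an assumption here (a common choice in segmentation training), not a value confirmed by the paper:

import torch.nn.functional as F

def seg_loss(main_out, aux_out, labels, aux_weight=0.4):
    # L = CE(main) + aux_weight * CE(aux), both computed at label resolution.
    size = labels.shape[-2:]
    main_out = F.interpolate(main_out, size=size, mode='bilinear', align_corners=False)
    aux_out = F.interpolate(aux_out, size=size, mode='bilinear', align_corners=False)
    return F.cross_entropy(main_out, labels) + aux_weight * F.cross_entropy(aux_out, labels)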