orsic / swiftnet

GNU General Public License v3.0

About training on half resolution cityscapes dataset #9

Closed SJTUsuperxu closed 5 years ago

SJTUsuperxu commented 5 years ago

Hi @orsic, nice paper on semantic segmentation. I'm currently trying to reproduce the results in your paper. I notice that you report the model's performance at (512, 1024) resolution: 70.2 mIoU and 134.9 FPS. Two things are not clear to me from the paper:

  1. When training, do you first resize the original image from (1024, 2048) to (512, 1024) and then crop square (448, 448) patches?
  2. When testing, do you also resize the image to (512, 1024) and simply forward it through the network, with no multi-scale testing?

I'm looking forward to your reply. Thanks!

lxtGH commented 5 years ago

I have the same question.

orsic commented 5 years ago

@SJTUsuperxu @lxtGH

  1. Yes. The image is subsampled first, then a random 448x448 patch is cropped.
  2. When testing, it is crucial to upsample the output back to 1024x2048 and evaluate on original labels from the dataset.
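
For anyone following along, the two steps above could be sketched roughly as below. This is only a minimal NumPy illustration, not the code from this repository: the helper names (`subsample`, `random_crop`, `upsample_nearest`) are hypothetical, block averaging is an assumed choice for the image subsampling, subsampling the labels by strided indexing is an assumption, and nearest-neighbour upsampling stands in for whatever interpolation the actual evaluation uses.

```python
import numpy as np


def subsample(image, labels, factor=2):
    """Halve the resolution: (1024, 2048) -> (512, 1024) for factor=2.

    Assumption: the image is block-averaged and the labels are picked by
    strided indexing, so that class ids are never blended together.
    """
    h, w, c = image.shape
    small = image.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))
    return small, labels[::factor, ::factor]


def random_crop(image, labels, size=448, rng=None):
    """Cut the same random size x size patch from the image and its labels."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = labels.shape
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    rows, cols = slice(top, top + size), slice(left, left + size)
    return image[rows, cols], labels[rows, cols]


def upsample_nearest(pred, factor=2):
    """Bring a (512, 1024) prediction map back to (1024, 2048).

    Assumption: nearest-neighbour repetition of predicted class ids; the
    real pipeline may instead interpolate the logits before the argmax.
    """
    return pred.repeat(factor, axis=0).repeat(factor, axis=1)
```

At training time one would call `subsample` and then `random_crop`; at test time the half-resolution prediction goes through `upsample_nearest` (or a bilinear equivalent) before mIoU is computed against the full-resolution labels.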