hustvl / TopFormer

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation, CVPR2022

Can you explain your test Aug for ade20K? #9

Closed GuoQuanhao closed 2 years ago

GuoQuanhao commented 2 years ago

The mmseg code is very tightly integrated; I would like to know your test-time augmentation.

mulinmeng commented 2 years ago

> The mmseg code is very tightly integrated; I would like to know your test-time augmentation.


import cv2
import numpy as np

def resize(img, img_h, img_w):
    # Rescale so the image fits inside (img_h, img_w) while keeping the
    # aspect ratio, then zero-pad the bottom/right up to the target size.
    h, w = img.shape[:2]
    scale = min(img_h / h, img_w / w)
    new_h, new_w = int(h * scale), int(w * scale)
    input_img = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    return cv2.copyMakeBorder(input_img, 0, img_h - new_h, 0, img_w - new_w,
                              cv2.BORDER_CONSTANT, value=0)

def prepare_input(image):
    # Normalize with the ImageNet mean/std used by mmseg (RGB order).
    mean = np.array([123.675, 116.28, 103.53], dtype=np.float32)
    stdinv = 1 / np.array([58.395, 57.12, 57.375], dtype=np.float32)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = (image - mean) * stdinv
    image = image.transpose(2, 0, 1)   # HWC -> CHW
    return image[np.newaxis, :, :, :]  # add batch dimension


We use the default data augmentation in mmseg. If you just want to run a simple demo on mobile, you can run:

img = cv2.imread(<img_path>)
new_img = resize(img, 512, 512) # or directly resize to (512, 512) w/o padding
inp_data = prepare_input(new_img)
res = Network(inp_data)

When you want to reproduce our results, rescale the short side of the image to 512 while keeping the aspect ratio (and keep both sides divisible by 32), then run `prepare_input`.
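The "short side to 512, keep aspect ratio, sides divisible by 32" step can be sketched like this (a minimal sketch: the helper name `rescale_short_side` and the round-up-to-multiple choice are mine, not from the repo):

```python
import math

def rescale_short_side(h, w, short=512, divisor=32):
    # Scale so the short side becomes `short`, keeping the aspect ratio,
    # then round each side up to the nearest multiple of `divisor`.
    scale = short / min(h, w)
    new_h = divisor * math.ceil(h * scale / divisor)
    new_w = divisor * math.ceil(w * scale / divisor)
    return new_h, new_w
```

For example, a 420x710 image would map to a 512x896 input under this rounding.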

If you have any other questions, feel free to ask under this issue.
GuoQuanhao commented 2 years ago

I see. For example, my raw image is 420x710, so the input image becomes 448x768. Through the model I get a 448x448 prediction. Do I need to resize the prediction to 448x768 and then calculate the mIoU? Did you use slide inference?

mulinmeng commented 2 years ago

> I see. For example, my raw image is 420x710, so the input image becomes 448x768. Through the model I get a 448x448 prediction. Do I need to resize the prediction to 448x768 and then calculate the mIoU? Did you use slide inference?

Q: "The input image becomes 448x768; through the model I get a 448x448 prediction."
A: When the input is 448x768, the output size is also 448x768. You can then crop the prediction according to the padding size and resize the cropped prediction to 420x710 (the raw size).
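The crop-then-resize step above can be sketched as follows (the helper `unpad_and_resize` is my naming; it uses plain NumPy index mapping for nearest-neighbour resizing so integer class ids are not blended — `cv2.resize(..., interpolation=cv2.INTER_NEAREST)` on the cropped map would do the same):

```python
import numpy as np

def unpad_and_resize(pred, valid_h, valid_w, raw_h, raw_w):
    # Drop the bottom/right zero padding added before inference,
    # then resize the label map back to the raw image size with a
    # nearest-neighbour lookup so class ids stay integers.
    cropped = pred[:valid_h, :valid_w]
    rows = (np.arange(raw_h) * valid_h // raw_h).astype(np.intp)
    cols = (np.arange(raw_w) * valid_w // raw_w).astype(np.intp)
    return cropped[rows][:, cols]
```

In the example above you would crop the 448x768 prediction to the unpadded region, then resize it to 420x710 before computing mIoU.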

Q: Did you use slide inference? A: No, whole-image inference.

GuoQuanhao commented 2 years ago

OK, I see, thanks