seoungwugoh / STM

Video Object Segmentation using Space-Time Memory Networks

Questions about YouTubeVOS training #30

Open xmlyqing00 opened 4 years ago

xmlyqing00 commented 4 years ago

Hi! Thanks for your great work. I have three questions about training on the YouTube-VOS dataset.

  1. Do you use training videos from YouTubeVOS-2018 or YouTubeVOS-2019?
  2. Do you train the model with full frames or sampled frames?
  3. In issue 6 you mentioned that you use random resizing and cropping for data augmentation. My understanding is as follows: for a given input frame (most are 720x1280), resize the short side to a random length between 384 and the original length (720), scaling the long side to keep the aspect ratio, then randomly crop a 384x384 area. You also apply zoom ratios from 0.9 to 1.1 to the height and width independently. Please correct me if I am wrong. Is this procedure equivalent to the RandomResizedCrop function in torchvision?
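
To make question 3 concrete, here is a minimal sketch of the procedure as I understand it, not the authors' confirmed implementation. The function name `sample_crop_params` is hypothetical. It assumes the independent zoom is applied together with the resize, and that the resized frame is clamped to at least 384 pixels per side so the crop always fits; neither detail is stated in issue 6. It returns the crop as a box in original-image coordinates, which is the form RandomResizedCrop works with (crop a region, then resize it to the output size).

```python
import random

CROP = 384  # output crop size from the question


def sample_crop_params(height, width, rng=random):
    """Sample one augmentation and return it as a source-image crop box.

    Returns (top, left, box_h, box_w) in original-image pixels. Resizing
    that box to CROP x CROP matches resize-then-crop, up to rounding.
    """
    short = min(height, width)
    # Random short-side length between 384 and the original short side.
    target_short = rng.uniform(CROP, max(CROP, short))
    base = target_short / short
    # Independent zoom per axis (ASSUMPTION: applied with the resize).
    zoom_h = rng.uniform(0.9, 1.1)
    zoom_w = rng.uniform(0.9, 1.1)
    # Resized frame size (ASSUMPTION: clamped so the crop window fits).
    new_h = max(CROP, round(height * base * zoom_h))
    new_w = max(CROP, round(width * base * zoom_w))
    # Random crop position inside the resized frame.
    top = rng.randint(0, new_h - CROP)
    left = rng.randint(0, new_w - CROP)
    # Map the crop window back to original-image coordinates.
    sy, sx = height / new_h, width / new_w
    return top * sy, left * sx, CROP * sy, CROP * sx


if __name__ == "__main__":
    random.seed(0)
    print(sample_crop_params(720, 1280))
```

Under these assumptions the source box has a width-to-height ratio of roughly 0.9/1.1 to 1.1/0.9, whereas torchvision's RandomResizedCrop samples the box area as a fraction of the image (default `scale=(0.08, 1.0)`) and the aspect ratio log-uniformly (default `ratio=(3/4, 4/3)`). So the two look like the same kind of operation with different sampling distributions, but I would appreciate confirmation.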