Hi! Thanks for your great work. I have three questions about the YouTube-VOS dataset.
Do you use training videos from YouTubeVOS-2018 or YouTubeVOS-2019?
Do you train the model with full frames or sampled frames?
You mentioned in issue 6 that you use random resize and crop for data augmentation.
As I understand it: for a given input frame (most are 720x1280), the short side is resized to a random length between 384 and the original length (720), and the long side is scaled to preserve the aspect ratio. A 384 x 384 region is then randomly cropped. You also apply zoom ratios between 0.9 and 1.1 to the height and width independently.
Please correct me if I am wrong. I also wonder whether this procedure is equivalent to the RandomResizedCrop transform in torchvision.
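To make the question concrete, here is a minimal sketch of how I read the procedure. It only samples the resize and crop parameters in plain Python and does not touch image data. The function name `sample_resize_crop_params`, the order in which the zoom is applied, and the clamp that keeps each side at least 384 are my own assumptions, not something taken from your code.

```python
import random


def sample_resize_crop_params(height, width, crop=384, zoom=(0.9, 1.1), rng=random):
    """Sample a resized frame size and a crop box (hypothetical helper)."""
    short = min(height, width)
    # Random target length for the short side, between the crop size and the original.
    target_short = rng.randint(min(crop, short), short)
    scale = target_short / short
    # Independent zoom per axis, which slightly perturbs the aspect ratio.
    new_h = round(height * scale * rng.uniform(*zoom))
    new_w = round(width * scale * rng.uniform(*zoom))
    # Assumption: clamp so a zoom below 1.0 never makes a side smaller than the crop.
    new_h, new_w = max(new_h, crop), max(new_w, crop)
    # Random top-left corner of the crop inside the resized frame.
    top = rng.randint(0, new_h - crop)
    left = rng.randint(0, new_w - crop)
    return (new_h, new_w), (top, left, crop, crop)


if __name__ == "__main__":
    size, box = sample_resize_crop_params(720, 1280)
    print("resized to", size, "crop box (top, left, h, w):", box)
```

If this reading is right, it does not look identical to RandomResizedCrop, which first crops a region with a random area fraction and aspect ratio and then resizes that region to the output size, so the order of operations and the sampling distributions differ.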