bhack opened this issue 1 year ago
Hi, thank you for pointing out the issue. Could you please provide an example image? Where does the 0.5px/1px misalignment bias happen?
I will try to find an example on DAVIS to share. In the meantime, have you already experienced something like this?
@yoxu515 @z-x-yang Just to give more evidence of this effect, I've replicated the same frame (frame 0) of the DAVIS speed-skating sequence 100 times.
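For reference, this is roughly how I set it up (paths assume the standard DAVIS 480p layout; the `-static` sequence name is just a placeholder of mine):

```python
import pathlib
import shutil

# Duplicate frame 0 of the DAVIS speed-skating sequence 100 times so the
# target is perfectly static; any drift in the propagated masks is then
# pure accumulated error rather than motion.
src = pathlib.Path("DAVIS/JPEGImages/480p/speed-skating/00000.jpg")
dst = pathlib.Path("DAVIS/JPEGImages/480p/speed-skating-static")
dst.mkdir(parents=True, exist_ok=True)
for i in range(100):
    shutil.copy(src, dst / f"{i:05d}.jpg")
```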
Here is the original annotation (frame 0 / GT):
Frame 0 after 50 propagations:
Frame 0 after 99 propagations:
Down/right accumulated drift 0-10:
Down/right accumulated drift 0-50:
Down/right accumulated drift 0-99:
Any feedback on this?
@yoxu515 There is a rounding error in the eval engine and the related interpolations; it is reproducible whenever the input W or H is not divisible by 16.
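A minimal sketch of the effect with toy numbers of mine (assuming the eval path rounds W to a stride-aligned size and then resizes the prediction back, which is my reading of the pipeline):

```python
import torch
import torch.nn.functional as F

# Toy round trip: W = 910 is not divisible by 16, so it is rounded up
# to 912 before the network, and the prediction is resized back to 910.
# The two scale factors are not exact inverses, so the mask shifts.
H, W = 480, 910
aligned_W = int(round(W / 16)) * 16  # 912

mask = torch.zeros(1, 1, H, W)
mask[:, :, 200:220, 450:470] = 1.0   # a small square object

down = F.interpolate(mask, size=(H, aligned_W), mode="nearest")
back = F.interpolate(down, size=(H, W), mode="nearest")

print((mask != back).sum())  # non-zero -> the square has drifted
```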
@z-x-yang Other than some edge-case precision issues around the max_stride alignment, you are also affected by https://github.com/pytorch/pytorch/issues/34808.
> @yoxu515 There is a rounding error in the eval engine and the related interpolations; it is reproducible whenever the input W or H is not divisible by 16.
Thank you for your ongoing attention to this issue. To be honest, at this point, I am also still trying to understand why this misalignment is occurring. Perhaps, as @s-deeper commented, the nearest interpolation in PyTorch could introduce misalignment, leading to this situation. However, AOT should be able to learn how to eliminate this misalignment during training, unless there is a lack of strict alignment between the training and testing settings.
As far as I remember, the handling of mask interpolation in both the training and testing processes of AOT should be consistent. In any case, I will pay closer attention to this issue. Thank you!
> @yoxu515 There is a rounding error in the eval engine and the related interpolations; it is reproducible whenever the input W or H is not divisible by 16.
Furthermore, has this kind of misalignment caused any difficulties for you in the actual use of DeAOT? If not, I don't believe it's a critical issue.
Indeed, I have also noticed in some early versions of DeAOT experiments that when the video frame rate is very high, and the target remains stationary, there is some weird drift in the segmentation mask. However, in the released versions of DeAOT on YouTube-VOS and DAVIS, this issue does not arise (though I am not sure if it persists in videos with even higher frame rates or smaller object movements).
> @z-x-yang Other than some edge-case precision issues around the max_stride alignment, you are also affected by pytorch/pytorch#34808.
Thank you for pointing out the bug in PyTorch that I had not previously noticed! I will review the relevant code and strive to prevent all unexpected misalignments.
You can reproduce this exactly with the current eval code.
It is partially solved for sure by https://github.com/pytorch/pytorch/issues/34808#issuecomment-1007806783.
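To make the sampling asymmetry concrete, here is a minimal sketch; my understanding is that the nearest-exact mode added in PyTorch 1.11 is the fix discussed in that comment:

```python
import torch
import torch.nn.functional as F

# Legacy "nearest" picks source pixels via floor(dst * in/out), which
# biases sampling toward the top-left when downsampling; "nearest-exact"
# (PyTorch >= 1.11) samples from pixel centers instead.
x = torch.arange(8, dtype=torch.float32).reshape(1, 1, 1, 8)

legacy = F.interpolate(x, size=(1, 4), mode="nearest")
exact = F.interpolate(x, size=(1, 4), mode="nearest-exact")

print(legacy.flatten())  # tensor([0., 2., 4., 6.]) -> shifted left
print(exact.flatten())   # tensor([1., 3., 5., 7.]) -> centered
```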
But I think you still have a residual edge case/side effect from the use of np.around here:
https://github.com/yoxu515/aot-benchmark/blob/ada8a3cbf0ba6dde563a49e78e56dbbcde01d143/dataloaders/video_transforms.py#L640-L655
Can you increase the precision there?
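Just to illustrate the kind of failure I mean, with hypothetical numbers (not the exact values from those lines): rounding the scale factor before computing the output size can flip the size by one pixel:

```python
import numpy as np

# Hypothetical numbers: if the rescale factor itself is rounded to a
# few decimals before the output size is computed, the effective scale
# no longer matches the exact ratio, and the computed size can be off
# by one pixel, shifting every subsequent resize.
w = 910
exact_scale = 465 / 910                   # 0.510989...
coarse_scale = np.around(465 / 910, 2)    # 0.51

print(int(np.around(w * exact_scale)))    # 465
print(int(np.around(w * coarse_scale)))   # 464 -> one-pixel mismatch
```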
Also, there is another issue in training: https://github.com/pytorch/pytorch/issues/104157
You need to use something like: https://github.com/huggingface/transformers/pull/28504/files#r1455033425
Have you found in your experiment runs a 0.5px/1px misalignment bias in the right/bottom-right direction? I have noticed this both with the aligned-corners and non-aligned-corners models that you have used (e.g. R50/Swin DeAOT-L). As these kinds of errors are very hard to debug, I want to know if you have experienced something like this on your side.
Thanks.