Closed by kevinsbarnard 11 months ago.
This also applies to different-resolution framegrabs. We also cannot assume that the difference in dimensions is purely one of scale; the box transform can include a constant offset. For example, with letterboxed framegrabs, the bounding box will need to be offset before it is scaled.
In other words, we're currently doing:
\begin{align}
x_{\text{target box}} &= s_x x_{\text{source box}} \\
y_{\text{target box}} &= s_y y_{\text{source box}} \\
w_{\text{target box}} &= s_x w_{\text{source box}} \\
h_{\text{target box}} &= s_y h_{\text{source box}}
\end{align}
where
\begin{align}
s_x &= \frac{w_{\text{target frame}}}{w_{\text{source frame}}} \\
s_y &= \frac{h_{\text{target frame}}}{h_{\text{source frame}}}
\end{align}
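The pure-scaling transform above can be sketched in Python as follows. This is a minimal illustration, not the Sharktopoda 2 code: the `Box` type and `scale_box` function are hypothetical names, and it assumes `x`, `y` are the box's top-left corner in pixels and frame sizes are given as `(width, height)`.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Bounding box in pixels; x, y assumed to be the top-left corner."""
    x: float
    y: float
    w: float
    h: float


def scale_box(box: Box, source_size: Tuple[float, float], target_size: Tuple[float, float]) -> Box:
    """Rescale a box from the source frame to the target frame (scale only, no offset)."""
    src_w, src_h = source_size
    tgt_w, tgt_h = target_size
    s_x = tgt_w / src_w  # horizontal scale factor s_x
    s_y = tgt_h / src_h  # vertical scale factor s_y
    # Position and size are both scaled by the same per-axis factor.
    return Box(s_x * box.x, s_y * box.y, s_x * box.w, s_y * box.h)
```

For example, moving a box from a 1920x1080 source to a 3840x2160 target doubles every component: `scale_box(Box(100, 50, 200, 100), (1920, 1080), (3840, 2160))` gives `Box(200.0, 100.0, 400.0, 200.0)`.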
But with a letterbox on the source frame, we would need:
\begin{align}
x_{\text{target box}} &= s_x (x_{\text{source box}} - w_{\text{source letterbox}}) \\
y_{\text{target box}} &= s_y (y_{\text{source box}} - h_{\text{source letterbox}}) \\
w_{\text{target box}} &= s_x w_{\text{source box}} \\
h_{\text{target box}} &= s_y h_{\text{source box}}
\end{align}
where
\begin{align}
s_x &= \frac{w_{\text{target frame}}}{w_{\text{source frame}} - 2 w_{\text{source letterbox}}} \\
s_y &= \frac{h_{\text{target frame}}}{h_{\text{source frame}} - 2 h_{\text{source letterbox}}}
\end{align}
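A sketch of the letterboxed transform, under the same assumptions as before (hypothetical names, top-left box origin, sizes as `(width, height)`). It also assumes the letterbox is symmetric, with bars of thickness `w_letterbox` on the left and right and `h_letterbox` on the top and bottom, which is what the `- 2 w_letterbox` and `- 2 h_letterbox` terms imply.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Bounding box in pixels; x, y assumed to be the top-left corner."""
    x: float
    y: float
    w: float
    h: float


def scale_box_letterboxed(
    box: Box,
    source_size: Tuple[float, float],
    target_size: Tuple[float, float],
    letterbox: Tuple[float, float],
) -> Box:
    """Remove a symmetric source letterbox, then rescale into the target frame."""
    src_w, src_h = source_size
    tgt_w, tgt_h = target_size
    lb_w, lb_h = letterbox  # bar thickness per side (horizontal, vertical)
    # Scale relative to the active image area, not the full padded frame.
    s_x = tgt_w / (src_w - 2 * lb_w)
    s_y = tgt_h / (src_h - 2 * lb_h)
    # Shift the origin past the bars first, then scale; sizes need no offset.
    return Box(s_x * (box.x - lb_w), s_y * (box.y - lb_h), s_x * box.w, s_y * box.h)
```

For example, a 1440x1080 image padded to a 1920x1080 framegrab has 240 px bars on each side. Mapping a box into a 720x540 target gives `s_x = s_y = 0.5`, so `Box(340, 100, 200, 100)` becomes `Box(50.0, 50.0, 100.0, 50.0)`. With a zero letterbox, this reduces to the pure-scaling case.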
As discussed in the VARS meeting on 11/15/2023, we are ignoring the letterbox case.
Fixed in v0.5.0: localizations are now rescaled to the source video dimensions in Sharktopoda 2. I've also added a warning that appears when the source and target aspect ratios differ, to help address the case above.
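One way an aspect-ratio check like this could work is sketched below. This is not the v0.5.0 implementation. `check_aspect_ratio` and its `rel_tol` default are hypothetical, chosen so that rounding in frame sizes (for example, odd pixel counts after resizing) does not trigger false warnings.

```python
import math
import warnings
from typing import Tuple


def check_aspect_ratio(
    source_size: Tuple[float, float],
    target_size: Tuple[float, float],
    rel_tol: float = 1e-3,  # hypothetical tolerance for near-equal ratios
) -> bool:
    """Return True if the aspect ratios match; otherwise warn and return False."""
    src_w, src_h = source_size
    tgt_w, tgt_h = target_size
    src_ratio = src_w / src_h
    tgt_ratio = tgt_w / tgt_h
    if math.isclose(src_ratio, tgt_ratio, rel_tol=rel_tol):
        return True
    # A mismatch often means letterboxing/padding, which pure scaling cannot fix.
    warnings.warn(
        f"Source aspect ratio {src_ratio:.4f} differs from target aspect ratio "
        f"{tgt_ratio:.4f}; rescaled boxes may be misplaced.",
        stacklevel=2,
    )
    return False
```

For example, `check_aspect_ratio((1920, 1080), (3840, 2160))` passes silently, but `check_aspect_ratio((1920, 1080), (1440, 1080))` (16:9 vs. 4:3) emits a `UserWarning`.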
Currently, bounding box dimensions are passed to Sharktopoda 2 as-is. However, the target video may have a different resolution than the source video or image, so the bounding boxes need to be rescaled accordingly.