Open eduardathome opened 3 years ago
Hi @Edward334, thanks for this detailed report! Yes, the image inpainting network requires that the height and width be divisible by 8.
One trivial solution is to resize the image first, inpaint the missing region, and resize the image back. This is not ideal but can bypass the error. I'll try to find a better inpainting method.
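A minimal sketch of the size computation behind this workaround, assuming the only constraint is that both dimensions be divisible by 8. The helper names `round_up_to_multiple` and `safe_size` are hypothetical and not part of FGVC; the resizing itself would be done with whatever image library is already in use.

```python
def round_up_to_multiple(value: int, base: int = 8) -> int:
    """Return the smallest multiple of `base` that is >= `value`."""
    if value <= 0 or base <= 0:
        raise ValueError("value and base must be positive")
    return ((value + base - 1) // base) * base


def safe_size(height: int, width: int, base: int = 8) -> tuple:
    """Size to resize (or pad) to before inpainting.

    After inpainting, the result is resized back to (height, width).
    """
    return round_up_to_multiple(height, base), round_up_to_multiple(width, base)


print(safe_size(300, 600))  # (304, 600)
```

For the 300 x 600 frames in the report, only the height would change (300 -> 304); the width is already a multiple of 8.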
Is this fixed?
Hi, I ran into this error. I tried to work it out myself and found the cause, but no real solution yet. I'm willing to find a reliable one and share it, if there is interest. Below is a report detailing what I found:
I. Error and reproducibility:
Error text:
The error occurs when running code cloned from https://github.com/vt-vl-lab/FGVC (version 1.0) with the following command
On a set of 10 image/mask pairs, each with shape [3, 300, 600]
Error origin:
The function grid_sampler(...) is called by bilinear_sampler(...) with the arguments corr and coords_lvl, which have the mismatched shapes [2850, 1, 38, 75] and [2775, 9, 9, 2].
This breaks grid_sampler(...) because the leading dimensions differ (2850 vs. 2775).
In the next three chapters I trace both objects to find out why their shapes differ.
I. coords_lvl : CorrBlock.__call__(self, coords) -> centroid_lvl
In raft.py, the initialize_flow() function computes the grid size as (1, 37, 75) from an image of shape (1, 300, 600), because H/8 = 37.5 and H//8 = 37.
This propagates to corr_fn() -> CorrBlock.__call__(), which receives the coordinates with shape torch.Size([1, 2, 37, 75]).
In __call__(...), these coordinates are reshaped into centroid_lvl with shape torch.Size([2775, 1, 1, 2]), where 2775 = 37*75. centroid_lvl in turn gives its leading dimension to coords_lvl, which is what bilinear_sampler() receives.
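The shape arithmetic in this chapter can be reproduced without running the network. The sketch below assumes that coords_lvl is formed by adding centroid_lvl to a window of offsets of shape [1, 2*radius+1, 2*radius+1, 2] with radius=4; that assumption is consistent with the [2775, 9, 9, 2] shape reported above, but the offset tensor itself is not shown in this report. `broadcast_shape` is a hypothetical helper that mimics the usual broadcasting rule for equal-rank shapes.

```python
def broadcast_shape(a, b):
    """Broadcast two equal-rank shapes (size-1 axes stretch to match)."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles shapes of equal rank")
    out = []
    for x, y in zip(a, b):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"cannot broadcast {a} with {b}")
        out.append(y if x == 1 else x)
    return tuple(out)


batch, height, width = 1, 300, 600
radius = 4  # assumed from CorrBlock's radius=4 default

# Floor division, as in initialize_flow(): 300 // 8 = 37, 600 // 8 = 75
h8, w8 = height // 8, width // 8
coords_shape = (batch, 2, h8, w8)

# One centroid per grid cell
centroid_shape = (batch * h8 * w8, 1, 1, 2)

# Assumed offset window of (2 * radius + 1) points per axis
delta_shape = (1, 2 * radius + 1, 2 * radius + 1, 2)

coords_lvl_shape = broadcast_shape(centroid_shape, delta_shape)
print(coords_shape)      # (1, 2, 37, 75)
print(centroid_shape)    # (2775, 1, 1, 2)
print(coords_lvl_shape)  # (2775, 9, 9, 2)
```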
II. corr <- fmaps:
corr eventually takes its shape from the fmaps, as detailed in chapter III.
The fmaps are generated (in this case) by a BasicEncoder(nn.Module), through its forward(self, x) method.
The method returns a feature map from an image by passing it through several nn layers; the resulting shape is exactly torch.Size([256, 38, 75]). This shape propagates as described in chapter III.
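To see why the encoder yields 38 rather than 37, the standard convolution output-size formula can be applied to each downsampling stage. The layer parameters below (a 7x7 convolution with stride 2 and padding 3, followed by two 3x3 convolutions with stride 2 and padding 1) are an assumption about BasicEncoder, not something shown in this report; they do reproduce the observed 38 x 75 output.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed (kernel, stride, padding) of the three stride-2 stages
ASSUMED_STRIDED_LAYERS = [(7, 2, 3), (3, 2, 1), (3, 2, 1)]


def encoder_out(size: int) -> int:
    """Apply the assumed downsampling stages to one spatial dimension."""
    for kernel, stride, padding in ASSUMED_STRIDED_LAYERS:
        size = conv_out(size, kernel, stride, padding)
    return size


print(encoder_out(300), encoder_out(600))  # 38 75
print(300 // 8, 600 // 8)                  # 37 75
```

Under these assumptions each stage rounds up (300 -> 150 -> 75 -> 38), whereas H // 8 rounds down, so the two only agree when the dimension is divisible by 8.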
III. fmap1, fmap2 -> CorrBlock.__init__(self, fmap1, fmap2, num_levels=4, radius=4) -> corr.shape
In raft.py, self.fnet([image1, image2]) returns fmap1 and fmap2, each with shape torch.Size([1, 256, 38, 75]).
This propagates through CorrBlock.__init__() to the corr object, which has shape torch.Size([2850, 1, 38, 75]), where 2850 = 38*75.
It is then appended to self.corr_pyramid and finally used in __call__(), where it is passed to bilinear_sampler() with shape torch.Size([2850, 1, 38, 75]).
IV. Possible solutions:
To make the shapes match, either the small CNN must be modified, or the way the grid shape is defined in initialize_flow() must change, from:
to:
However, I suspect this change should be made at other points in the implementation as well.
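One way the grid size could be made to agree with the encoder output is to round up instead of down. This is only a sketch, assuming the encoder's downsampling rounds up at every stage, as the observed 38 suggests; `ceil_div` and `flow_grid_size` are hypothetical helpers, not functions from raft.py.

```python
def ceil_div(value: int, divisor: int) -> int:
    """Integer division that rounds up instead of down."""
    return -(-value // divisor)


def flow_grid_size(height: int, width: int, stride: int = 8) -> tuple:
    """Grid size that rounds up, to agree with an encoder that rounds up."""
    return ceil_div(height, stride), ceil_div(width, stride)


print(flow_grid_size(300, 600))  # (38, 75)
```

Any code that later upsamples the flow by a factor of 8 would then produce 304 rows instead of 300, which is one of the other places where a matching change (for example, cropping) would likely be needed.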
V. Observation
Dividing by 8 to approximate the output shape of the convolutions can raise multiple errors; the grid size should instead match the output shape exactly. If the architecture of the CNN is modified, the hard-coded division will also throw shape mismatch errors.
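A sketch of that idea: instead of recomputing the size from the image, the grid size could be read from the feature map that the encoder actually produced. `grid_size_from_fmap` is a hypothetical helper that works on a plain shape tuple (N, C, H, W); how it would be wired into initialize_flow() is left open here.

```python
def grid_size_from_fmap(fmap_shape: tuple) -> tuple:
    """Derive the (N, H, W) grid size from a feature map shape (N, C, H, W)."""
    if len(fmap_shape) != 4:
        raise ValueError("expected a 4-D shape (N, C, H, W)")
    n, _, h, w = fmap_shape
    return n, h, w


# Shape reported for fmap1/fmap2 in chapter III
print(grid_size_from_fmap((1, 256, 38, 75)))  # (1, 38, 75)
```

Because no downsampling factor is hard-coded, the grid would follow the encoder output even if the CNN architecture changes.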