Overfitting due to camera rotation variation

resclify commented 3 years ago

Hello,

I have tried out the code with the provided training and test data "modelnet40_images_new_12x". I have trained my model with the default parameters in the README file:

python train_mvcnn.py -name mvcnn -num_models 1000 -weight_decay 0.001 -num_views 12 -cnn_name vgg11

Like many others I got a 94.2% overall accuracy with the second stage. After that I tried rendering the dataset myself with the provided blender script. As stated in another issues thread I didn't get exactly the same render results as in modelnet40_images_new_12x, because it's not the original blender file. With the rerendered dataset I only archived 91.5% overall accuracy. So I compared the images and noticed that not all classes have the same amount of differences. It seems that the classes are rendered with different camera angles. More precisely it is the rotation around the Y-axis that varies.

From my comparison it seems that e.g. the desks are rendered with ~180.8° and tables are rendered with 181.308°. In fact all objects in alphabetical order from plant to xbox are rendered with 181.308°. (The angles are >180° because the object and the camera are both upside down in blender coordinate system, in case anyone is wondering.) I have made several tests and my conclusion is that the network actually considers the camera rotation in some way. Here are two demonstrations for that behavior:

1.) When varying the rotation in the rerendered test data, the orignal model (trained with original modelnet40_images_new_12x data) reacts highly sensitive:

180°    71.2%
180.8°  90.1%
185°    80.1%

When the model is trained with the rerendered dataset (with equal rotation) it's not sensitive at all. The result is always 91.5%.

2.) Desks and tables are very similar objects in the dataset from the start. The model trained with original (modelnet40_images_new_12x) data still gets desks with 96% and tables with 95% right. When using again the original model and changing the angle for the whole test dataset we get:

        desk    table
180.8°      95.3%   82%
181.308°    40.6%   93%

This further indicates that desk is rendered with 180.8° and table with 181.308° in the original training data and the network expects that in the test data as well. As expected desk is false classified as table + tv_stand and table is false classified as desk. Similar confusion happens with bookshelf and tv_stand or cup and vase because of the same angles.

Do you have any idea why the camera rotation change happened in the first place? It's weird that the changes happen to be on a per-class basis instead of a per-object basis.

jongchyisu commented 3 years ago

Thanks for your detailed analysis! I am still not sure why the slight rendering difference can cause such big differences. I'm wondering how you get the camera angle from Blender. Could you share how you got it? Maybe I can also give it a try. Thanks!

resclify commented 3 years ago

Actually I have not found a good way to get the angles. I just rendered the images with different angles and looked at diff images. Of course this is more like a rough estimation, but it still shows that it's not the same angle. (It is also possible to see the angle difference by counting the pixels/subpixels on objects with long straight edges. But of course it's hard to determine an exact number.)

Here is an example of diff images for desk and table at 180.8° and 181.308° to give you an idea.diff_renders.zip

Here is the simple script I used for generating the diff images and calculating the absolute sum for a numerical value. It's noting more than a simple (absoulte) diff for a whole directory. image_compare.zip

The renders were generated with the blender file from the repository and the camera angle was adjusted manually. I have just adjusted the file names and model rotation to match the "modelnet40_images_new_12x" renders. render_shaded_black_bg.zip

Thanks for your help.

jongchyisu / mvcnn_pytorch

Overfitting due to camera rotation variation #23