Thanks for sharing! why only use 1 batch during training?

walsvid / Pixel2MeshPlusPlus

Pixel2Mesh++: Multi-View 3D Mesh Generation via Deformation. In ICCV2019.

https://arxiv.org/abs/1908.01491

BSD 3-Clause "New" or "Revised" License

Thanks for sharing! why only use 1 batch during training? #2

Closed zhiwenfan closed 4 years ago

zhiwenfan commented 4 years ago

Thanks for sharing this great work! I am new to this field and have one question: why do you use a batch size of only 1 during training?

walsvid commented 4 years ago

Hello, we use batch_size=1 because we still follow Pixel2Mesh's data processing method, in which the number of ground-truth points obtained by sampling differs from sample to sample. For simplicity, we use batch_size=1. We have found a new ground-truth data processing method and will update the code later. Once every sample is guaranteed to have the same number of ground-truth points, the batch size can be increased.
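To illustrate the point about inconsistent ground-truth lengths, here is a minimal sketch of one common workaround: resampling every ground-truth point set to a fixed size so that several samples can be stacked into one batch. This is not necessarily the new processing method the authors mention. The function name `resample_points`, the target size of 1024, and the 6-column layout (position plus normal) are assumptions made for this example only.

```python
import numpy as np


def resample_points(points, n_points, rng):
    """Return exactly n_points rows drawn from an (N, D) point array.

    Hypothetical helper for illustration, not part of the repository.
    """
    n = points.shape[0]
    # Draw without replacement when the cloud is large enough,
    # otherwise allow repeats so the output still has n_points rows.
    idx = rng.choice(n, size=n_points, replace=n < n_points)
    return points[idx]


rng = np.random.default_rng(0)

# Three fake ground-truth clouds with different point counts
# (assumed layout: x, y, z, nx, ny, nz).
clouds = [rng.normal(size=(n, 6)) for n in (900, 1500, 2300)]

# np.stack(clouds) would fail here because the first dimensions differ.
# After resampling, every cloud has the same shape and can be batched.
batch = np.stack([resample_points(c, 1024, rng) for c in clouds])
print(batch.shape)
```

With equal-length ground truth, a point-set loss such as Chamfer distance can be evaluated over the whole batch at once instead of one sample at a time.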

zhiwenfan commented 4 years ago

Thank you for your quick reply! Another question: where can I find the camera intrinsics and extrinsics? The txt file at ShapeNetImages/ShapeNetRendering/02691156/5b985bc192961c595de04aad18bd94c3/rendering/rendering_metadata.txt contains 5 elements in each row, but I do not understand what each element means.

walsvid commented 4 years ago

Hi, as for the intrinsics, the focal length we used can be found in our cross-view perceptual pooling layer, and the principal point offset is 0. As for the extrinsics, those 5 numbers are the parameters of the extrinsic matrix. We use the data provided by 3D-R2N2, which uses the az-el (azimuth-elevation) coordinate system. From left to right, the 5 numbers are the azimuth, the altitude (elevation), the camera inclination (all zero), the distance from the camera to the object, and the field of view (FOV, all 25).
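As a rough sketch of how one row of rendering_metadata.txt could be turned into a camera pose, the code below converts azimuth, elevation, and distance into a camera position and a look-at rotation. The axis convention (Y up, azimuth measured in the XZ plane, camera looking at the origin along its negative z axis), the use of the distance value as-is, the example row, and the function names are all assumptions for illustration. Check them against the repository's own camera code before relying on them.

```python
import numpy as np


def parse_metadata_line(line):
    """Split one row into azimuth, elevation, inclination, distance, FOV."""
    az, el, incl, dist, fov = (float(v) for v in line.split())
    return az, el, incl, dist, fov


def camera_extrinsics(az_deg, el_deg, dist):
    """Return (R, cam_pos) for a camera looking at the origin.

    Assumed convention: Y is up and azimuth is measured in the XZ plane.
    Degenerate when the elevation is exactly +/-90 degrees.
    """
    az, el = np.deg2rad(az_deg), np.deg2rad(el_deg)
    # Spherical-to-Cartesian conversion for the camera centre.
    cam_pos = dist * np.array([
        np.cos(el) * np.cos(az),
        np.sin(el),
        np.cos(el) * np.sin(az),
    ])
    # Build an orthonormal camera basis; rows of R are the camera axes.
    z_axis = cam_pos / np.linalg.norm(cam_pos)
    x_axis = np.cross(np.array([0.0, 1.0, 0.0]), z_axis)
    x_axis /= np.linalg.norm(x_axis)
    y_axis = np.cross(z_axis, x_axis)
    R = np.stack([x_axis, y_axis, z_axis])
    return R, cam_pos


# Made-up example row, not taken from the dataset.
az, el, incl, dist, fov = parse_metadata_line("30.0 25.0 0 0.8 25")
R, cam_pos = camera_extrinsics(az, el, dist)

# A world point X maps to camera coordinates as R @ (X - cam_pos).
origin_in_cam = R @ (np.zeros(3) - cam_pos)
print(origin_in_cam)
```

Under these assumptions the object centre lands on the camera's negative z axis at the given distance, which is a quick sanity check for the chosen convention.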