taconite / arah-release

[ECCV 2022] ARAH: Animatable Volume Rendering of Articulated Human SDFs
https://neuralbodies.github.io/arah/
MIT License
182 stars 15 forks

ZJU-mocap data Image Size #5

Closed sunwonlikeyou closed 1 year ago

sunwonlikeyou commented 1 year ago

As far as I know, the original resolution of the ZJU-MoCap data is (1024, 1024), but your code assumes the resolution is (512, 512). To reproduce with the original dataset, should I change the resolution of the original data and the camera intrinsics?

taconite commented 1 year ago

The baselines (Neural Body, Animatable NeRF) use resized 512x512 images as inputs; see here for example: https://github.com/zju3dv/neuralbody/blob/333026fc12f33d5e732008e7094b442a0095a1e2/configs/zju_mocap_exp/latent_xyzc_313.yaml#L76

This is also evidenced by the fact that the rendering results they provide are all 512x512.

For the 4-view setup, we simply followed the same practice. If you want to use the original image size (1024x1024), you can set `data.high_res` to `true` in the config file. For reference, check our monocular model config files (e.g. ZJUMOCAP-377-mono_4gpus.yaml), which use 1024x1024 images as inputs.
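For reference, the flag in the YAML config would look something like the fragment below. The exact nesting under a `data:` section is an assumption based on the dotted `data.high_res` path; check the monocular config files for the actual layout.

```yaml
data:
  high_res: true  # use original 1024x1024 ZJU-MoCap images instead of resized 512x512
```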

Note: there are currently some bugs in zju_mocap.py when resizing images and recomputing intrinsics for non-square images. I will fix this soon, but until then please be aware of this issue if you want to manipulate the image resolution and intrinsics on your own data.
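As a sketch of what recomputing intrinsics on resize involves (a generic pinhole-camera helper, not the repo's actual code): the focal lengths and principal point scale with the per-axis resize factors, and the two factors differ for non-square resizes, which is exactly the case the bug above concerns.

```python
import numpy as np

def resize_intrinsics(K, src_hw, dst_hw):
    """Rescale a 3x3 pinhole intrinsic matrix for a resized image.

    Hypothetical helper, not part of arah-release: fx/skew/cx scale
    with the width factor, fy/cy with the height factor.
    """
    sy = dst_hw[0] / src_hw[0]  # height scale factor
    sx = dst_hw[1] / src_hw[1]  # width scale factor
    K = np.asarray(K, dtype=np.float64).copy()
    K[0, :] *= sx  # scales fx, skew, cx
    K[1, :] *= sy  # scales fy, cy
    return K
```

For a square 1024x1024 to 512x512 resize both factors are 0.5, so focal lengths and the principal point are simply halved.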

sunwonlikeyou commented 1 year ago
```
  File "/codes/arah-release/im2mesh/data/zju_mocap.py", line 414, in __getitem__
    fg_inds = np.random.choice(valid_inds.shape[0], size=self.num_fg_samples, replace=False)
  File "mtrand.pyx", line 965, in numpy.random.mtrand.RandomState.choice
ValueError: Cannot take a larger sample than population when 'replace=False'
```

Thank you for the reply. The reason I asked is the error above; I thought it was related to the image resolution. I changed the image resolution and intrinsics, but the error persists on this line:

```python
fg_inds = np.random.choice(valid_inds.shape[0], size=self.num_fg_samples, replace=False)
```

I changed zju_mocap.py line 394 as follows; is there any problem?

```python
if valid_inds.shape[0] < self.num_fg_samples:
    fg_inds = np.random.choice(valid_inds.shape[0], size=self.num_fg_samples, replace=True)
else:
    fg_inds = np.random.choice(valid_inds.shape[0], size=self.num_fg_samples, replace=False)
```
taconite commented 1 year ago
> ```
>   File "/codes/arah-release/im2mesh/data/zju_mocap.py", line 414, in __getitem__
>     fg_inds = np.random.choice(valid_inds.shape[0], size=self.num_fg_samples, replace=False)
>   File "mtrand.pyx", line 965, in numpy.random.mtrand.RandomState.choice
> ValueError: Cannot take a larger sample than population when 'replace=False'
> ```
>
> Thank you for the reply. The reason I asked is the error above; I thought it was related to the image resolution. I changed the image resolution and intrinsics, but the error persists on this line:
>
> ```python
> fg_inds = np.random.choice(valid_inds.shape[0], size=self.num_fg_samples, replace=False)
> ```

It seems there are not enough foreground pixels (fewer than self.num_fg_samples) in the image. This usually happens when the camera parameters are wrong, such that the 3D bounding box of the SMPL mesh projects out of frame.

> I changed zju_mocap.py line 394 as follows; is there any problem?
>
> ```python
> if valid_inds.shape[0] < self.num_fg_samples:
>     fg_inds = np.random.choice(valid_inds.shape[0], size=self.num_fg_samples, replace=True)
> else:
>     fg_inds = np.random.choice(valid_inds.shape[0], size=self.num_fg_samples, replace=False)
> ```

The logic of the code is as follows. First, valid_inds (N x 2) holds the 2D indices of foreground pixels on the image plane; foreground pixels are obtained by projecting the 3D bounding box of the SMPL mesh onto the image plane. This part follows the Neural Body/Animatable NeRF implementations.

self.num_fg_samples specifies the number of foreground rays/pixels to sample for each training batch; by default it is 1024. If the number of foreground pixels in the image is smaller than this, we have to sample with replacement, otherwise np.random.choice raises the error you saw. However, if this really happens, it usually means the image contains a very small portion of the person, or no person at all, so the person's bounding box projected onto the image plane covers very few or zero pixels.
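That sampling logic, including the with-replacement fallback from the patch above, can be sketched as follows (a minimal standalone version, not the repo's exact code, assuming valid_inds is an (N, 2) array of foreground pixel coordinates):

```python
import numpy as np

def sample_fg_pixels(valid_inds, num_fg_samples=1024):
    """Sample foreground pixel coordinates for a training batch.

    Sketch of the logic in zju_mocap.py's __getitem__: fall back to
    sampling with replacement when there are fewer foreground pixels
    than requested, which avoids the ValueError from np.random.choice.
    """
    n = valid_inds.shape[0]
    replace = n < num_fg_samples  # too few foreground pixels -> sample with replacement
    fg_inds = np.random.choice(n, size=num_fg_samples, replace=replace)
    return valid_inds[fg_inds]
```

Note that when the fallback triggers for the reason described above (a wrong camera pose leaving almost no foreground pixels), sampling with replacement only hides the symptom; the underlying camera parameters still need fixing.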

sunwonlikeyou commented 1 year ago

Does that mean the SMPL result was wrong? But the SMPL result isn't wrong, and below is that SMPL result. 000083

Is the segmentation result related? Below are the mask and mask_erode outputs processed by self.get_mask().

mask

mask_erode_

taconite commented 1 year ago

> Does that mean the SMPL result was wrong? But the SMPL result isn't wrong, and below is that SMPL result. 000083

No, this means the camera parameters are wrong (intrinsic, extrinsic, or both), such that the camera's field of view does not contain the SMPL mesh.

> Is the segmentation result related? Below are the mask and mask_erode outputs processed by self.get_mask().
>
> mask
>
> mask_erode_

We first determine the 2D region from which we sample pixels/rays: it is the projection of the SMPL mesh's 3D bounding box onto the image plane. Up to this point, nothing depends on the foreground human segmentation masks.
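As a quick sanity check for the out-of-frame hypothesis, one can project the bounding-box corners with the camera parameters and measure what fraction lands inside the image; a value near zero reproduces the "no foreground pixels" symptom. This is a generic pinhole-camera sketch, not the repo's code, and the world-to-camera convention (x_cam = R x + t) may differ from the dataset's.

```python
import numpy as np

def fraction_in_frame(corners, K, R, t, h, w):
    """Project 3D bounding-box corners (8x3, world frame) into an
    h x w image and return the fraction that falls inside it."""
    cam = corners @ R.T + t          # world -> camera coordinates
    cam = cam[cam[:, 2] > 0]         # keep points in front of the camera
    if cam.shape[0] == 0:
        return 0.0
    uv = cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]      # perspective divide -> pixel coordinates
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return float(inside.mean())
```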

Within this 2D region, we use the human segmentation masks (provided with the dataset) to sample foreground/background pixels/rays. Specifically for the images you posted, mask is in the range [0, 255], whereas mask_erode is in the range [0, 1] (reference here), which is why it looks all black.
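To illustrate the range mismatch (generic NumPy, assuming 8-bit images): a binary mask stored with values in {0, 1} renders as essentially black when viewed as an 8-bit image, even though it carries the same information as a {0, 255} mask. Normalizing both to {0, 1} makes them directly comparable:

```python
import numpy as np

def binarize_mask(mask):
    """Normalize a segmentation mask to {0, 1}, regardless of whether
    it is stored in [0, 255] or already in [0, 1] (illustrative helper)."""
    return (np.asarray(mask) > 0).astype(np.uint8)
```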

Can you share your modified files with me so that I can have a look?

On the other hand, if you just want to use the original resolution of the ZJU-MoCap dataset, simply undo all your modifications and set the flag data.high_res to true in your config file.

sunwonlikeyou commented 1 year ago

> Can you share your modified files with me so that I can have a look?

I have reverted my changes back to your original code.

But line 268 of data/zju_mocap.py contains:

```python
center_img = np.array([512.0, 512.0], dtype=np.float32)
```

Anyway, I set the flag to true, but I get the same error as with the original code.

taconite commented 1 year ago

The center_img at line 268 specifies the center of data augmentation, which for now is just resizing. The crop_new function handles data augmentation and ideally should support flipping/rotation/scaling; it is borrowed and modified from SPIN and was used for some of my previous projects. However, we don't really need flipping/rotation/scaling in the volume rendering setup, and crop_new doesn't work if the image is non-square. So I plan to simplify the pipeline and remove crop_new from the code base, but I need some time to test and make sure that removing it doesn't break anything.

I tried on my end, setting the flag to true for configs/arah-zju/ZJUMOCAP-377_4gpus.yaml, and it runs with no problem. This is a bit confusing now: do you mean a clean run (without modifying the code) works, but an error occurs once you set the data.high_res flag in the configuration YAML to true?

sunwonlikeyou commented 1 year ago

There was a mistake in my preprocessing. I built annots.npy using only 4 cameras, but preprocess_ZJU-MoCap.py assumes all cameras are present. I changed that part and it now runs.

Thank you very much!!