ken2576 / nelf

Official PyTorch Implementation of paper "NeLF: Neural Light-transport Field for Single Portrait View Synthesis and Relighting", EGSR 2021.

Camera coordinates in provided data #3

Closed CorneliusHsiao closed 2 years ago

CorneliusHsiao commented 2 years ago

Hi,

Great work! I am running some tests on my own dataset with your pre-trained model, but the output is completely blank. I found that your camera coordinate system differs from mine. Could you clarify the points below? Many thanks.

I have visualized your cameras.npz from the validation_2 folder here: [figure attached]

I would appreciate your explanation.

CorneliusHsiao commented 2 years ago

I figured these out. The only remaining issue is a hole at the center of the face. Is there any configuration related to this? [screenshot attached]

ken2576 commented 2 years ago

Hi, sorry for the late reply. IIRC, we are using the OpenCV convention, which is right-handed: the look-at vector [0, 0, 1] points out the front of the camera.

For the case you showed, it seems like the near plane is not set correctly. We use the average baseline between human eyes to estimate the depth and thus set the near/far planes.
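A minimal sketch of how such a depth estimate could work, assuming a pinhole camera model. The ~0.063 m mean inter-eye distance and the 0.8/1.2 near/far margins are illustrative assumptions, not values taken from the NeLF code:

```python
import numpy as np

# Assumed average human inter-pupillary distance in meters (not from NeLF).
MEAN_EYE_BASELINE_M = 0.063

def estimate_near_far(focal_px, left_eye_px, right_eye_px,
                      near_scale=0.8, far_scale=1.2):
    """Pinhole model: depth = focal * world_baseline / pixel_baseline.

    The near/far planes are then picked as margins around that depth.
    """
    eye_dist_px = np.linalg.norm(np.asarray(left_eye_px, float) -
                                 np.asarray(right_eye_px, float))
    depth = focal_px * MEAN_EYE_BASELINE_M / eye_dist_px
    return near_scale * depth, far_scale * depth
```

For example, with a 1000-pixel focal length and eyes detected 63 pixels apart, the face depth comes out to 1.0 m.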

If you are using COLMAP, then you can use the following code to convert it into a canonical coordinate system.

convert_canonical_realdata.py https://gist.github.com/ken2576/3d2e5983660915d807276e818a529793

read_write_depth.py https://gist.github.com/ken2576/29bc6aad4f0222c939f25c217830a3cc

colmap_read_model.py https://gist.github.com/ken2576/2b70dc12275fbfd650054cb7bb5e0779

Please let me know if you have other questions. You might need to modify part of the code if your format is different.

CorneliusHsiao commented 2 years ago

Thanks. I have managed to change the depth of the ray sampling here to enlarge the range, but there is no difference from the last test. Can I assume the volume did not cover such a large range when you trained your model? If so, is retraining with my new dataset the only way to solve this?

ken2576 commented 2 years ago

I am not sure what the baseline of your input is. It is possible that self-occlusion is too severe, making it difficult for the network to reconstruct the face. The input should also contain one frontal face image, with 4 other images having sufficient overlap.

prraoo commented 2 years ago

> I figured these out. The only issue remaining is: there is a hole left at the same center of the face. Is there any configuration regarding this?

@CorneliusHsiao could you let me know what changes you made to get this working? I am in a similar situation, i.e., getting a blank image during inference with the pre-trained model on my own dataset.

CorneliusHsiao commented 2 years ago

It seems I still have some issue with coordinate alignment, which leads to the hole at the center. The issue I had at the very beginning was the input camera coordinate system. After converting from c2w to w2c, I found their cameras seem to look at +Y, so I rotated all cameras. Their COLMAP conversion script got me a result, but it was looking bottom-up.
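The two fixes described above can be sketched as follows. This is an illustrative NumPy sketch, not the actual conversion code; the particular world-frame rotation you need depends on your data:

```python
import numpy as np

def c2w_to_w2c(c2w):
    """Invert a 3x4 camera-to-world pose [R | t] into world-to-camera.

    For x_world = R x_cam + t, the inverse is x_cam = R^T x_world - R^T t.
    """
    R, t = c2w[:, :3], c2w[:, 3]
    return np.hstack([R.T, (-R.T @ t)[:, None]])

def rotate_world(w2c, R_fix):
    """Apply a global rotation X' = R_fix @ X to the world frame.

    If x_cam = R X + t, then under the rotated world x_cam = (R R_fix^T) X' + t.
    """
    R, t = w2c[:, :3], w2c[:, 3]
    return np.hstack([R @ R_fix.T, t[:, None]])
```

Rotating the world frame this way leaves each camera's image unchanged; it only re-expresses the poses so the cameras look down the intended world axis.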


prraoo commented 2 years ago

> Their COLMAP conversion script worked to get me a result looking bottom-up.

Thanks for the heads up. I had used the COLMAP script as well, but somehow it does not seem to be doing it right. This is the result I get: [screenshot attached]

Just curious, what does your projection plot look like when you run COLMAP? I am starting to suspect it is just a scale issue on my side. This is how it looks with scale = 100:

[screenshot attached]

CorneliusHsiao commented 2 years ago

Mine looks like this:

Make sure frontal_id in their COLMAP conversion script is set correctly.

[screenshot attached]

kevinkingo commented 2 years ago

Hi, this is Tiancheng Sun, another co-first author of the paper. Here is the coordinate system and camera format we are using:

World coordinate: the human face is at the origin, facing -y. Its left-hand side is +x, and its upward direction is +z. We usually put the camera near the -y axis (see test/new_camera.py), so that the camera is looking at the front of the face (i.e., the camera position should be near (0, -Y, 0)). When testing, make sure your cameras are also placed near the -y axis; I believe this is the place where you messed up.

Camera coordinate: the camera's right-hand side is +x, downward is +y, and the camera is looking at +z.
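A minimal look-at sketch consistent with this convention (camera +x right, +y down, +z forward; world up is +z), placing the camera on the -y axis and aiming it at the origin. This is an illustration, not code from the repo:

```python
import numpy as np

def look_at_w2c(cam_pos, target=np.zeros(3), world_up=np.array([0.0, 0.0, 1.0])):
    """Build a 3x4 world-to-camera matrix in OpenCV convention."""
    forward = target - cam_pos
    forward = forward / np.linalg.norm(forward)   # camera +z (look direction)
    right = np.cross(forward, world_up)           # camera +x
    right = right / np.linalg.norm(right)
    down = np.cross(forward, right)               # camera +y (points downward)
    R = np.stack([right, down, forward])          # rows are the camera axes
    t = -R @ cam_pos
    return np.hstack([R, t[:, None]])
```

With the camera at (0, -1, 0), the world origin (the face) lands at (0, 0, 1) in camera coordinates, i.e., directly in front of the camera at unit depth.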

Hope this can help.

prraoo commented 2 years ago

Hello @kevinkingo @ken2576

Thank you for your help. I got it working now. I believe my issues were from the line below:

depth = np.sqrt(np.sum(extrinsics[0, :, :3].T.dot(extrinsics[0, :, 3]) ** 2))

My reference image (frontal_id) used in COLMAP was not the first image, while the depth calculation was based on the first image. I think this is the point @CorneliusHsiao warned me about.
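For clarity, the quoted line computes ||R^T t||, which equals the distance from the camera center C = -R^T t to the world origin, taken for the first (index 0) camera as the depth estimate. A toy check with made-up values:

```python
import numpy as np

# Illustrative pose, not real data: identity rotation, camera center at (0, -2, 0),
# so the world-to-camera translation is t = -R @ C = (0, 2, 0).
R = np.eye(3)
t = np.array([0.0, 2.0, 0.0])
extrinsics = np.hstack([R, t[:, None]])[None]    # shape (1, 3, 4), like the npz

# The line from the comment above:
depth = np.sqrt(np.sum(extrinsics[0, :, :3].T.dot(extrinsics[0, :, 3]) ** 2))

# The same quantity written plainly: distance of the camera center from origin.
center_dist = np.linalg.norm(-R.T @ t)
```

Since the formula always reads index 0, the frontal reference image has to be the first camera for this depth to be meaningful.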

ken2576 commented 2 years ago

Closing the issue.

zshyang commented 2 years ago

Hi @ken2576 @kevinkingo @CorneliusHsiao @prraoo, I got some undesired results, which look like the following: [screenshot attached]

I have written up all the steps, with the images used, in the following documentation. Could you look into it? Because of privacy concerns, I put a restriction on the file, so you will need to request access to view it. I hope you understand. https://docs.google.com/document/d/1eN3cILAmPLMEsOEYEeba5rvL62_e862rLAl_cbPZSDY/edit?usp=sharing

The overall steps I used are:

  1. Take photos
  2. Crop the photos and generate the masks. Also, remove the background.
  3. Use COLMAP to reconstruct the scene.
  4. Use the script you’ve provided to generate the ‘camera.npz’ file.
  5. Run the test code.

Could you point me to where the mistake could have happened? Thanks a lot!

prraoo commented 2 years ago

@zshyang looks like the face alignment is not correct.

Have you verified this:

> Hi, this is Tiancheng Sun, another co-first author of the paper. Here is the coordinate system and camera format we are using: World coordinate: The human face is at the origin, facing -y. Its left-hand side is +x, and its upward direction is +z. We usually put the camera near the -y axis (see test/new_camera.py), so that the camera is looking at the front face (i.e., the camera position should be near (0, -Y, 0). When doing testing, make sure your cameras are also placed near -y axis. I believe this is the place where you messed up. Camera coordinate: camera right-hand side is +x, downward is +y, and the camera is looking at +z.

Hope this can help.

ken2576 commented 2 years ago

Haven't looked at the data yet, but you should run COLMAP before step 2, because after you remove the background it becomes harder to get correct camera poses.

zshyang commented 2 years ago

@prraoo @ken2576 thanks a lot for your replies. Yes, I think face alignment was the biggest problem! I have the algorithm working now. If you are interested, I have updated the documentation I shared with you to show the results. The results are not as sharp as the ones shown in your paper; more specifically, the generated image has some blurred black regions. I think the problem comes from the lighting conditions and the size of my training data, but I am still very glad that your method works. It should be a very good starting point for my research. Your judgment on why the black regions appear would also be very valuable to me.

Regarding the second suggestion,

> Haven't looked at the data yet but you should do COLMAP first before step 2. Because after you remove the background, it will be a bit harder to get the correct camera pose.

my opinion is that this might be overkill. After cropping and resizing the image, the camera intrinsics computed from the full image have to be adjusted accordingly. This is possible, but the computation has to be done carefully.
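The adjustment described above can be sketched as follows; the extrinsics are unaffected by cropping and resizing, only the intrinsics change. A minimal sketch, assuming a standard 3x3 intrinsic matrix K with focal lengths on the diagonal and the principal point in the last column:

```python
import numpy as np

def adjust_intrinsics(K, crop_x0, crop_y0, scale_x, scale_y):
    """Update a 3x3 intrinsic matrix after cropping, then resizing.

    Cropping at (crop_x0, crop_y0) shifts the principal point;
    resizing by (scale_x, scale_y) scales the focal lengths and
    the (already shifted) principal point.
    """
    K = K.astype(float).copy()
    K[0, 2] -= crop_x0      # principal point shifts by the crop offset
    K[1, 2] -= crop_y0
    K[0, :] *= scale_x      # row 0 carries fx and cx
    K[1, :] *= scale_y      # row 1 carries fy and cy
    return K
```

For example, cropping a full-resolution image at offset (100, 50) and downscaling by 0.5 maps fx = 1000, cx = 500 to fx = 500, cx = 200.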

CorneliusHsiao commented 2 years ago

@zshyang I encountered black holes on faces as well. Another issue in my experiments was the accuracy of COLMAP's camera estimation. If you have ground-truth camera matrices, I'd suggest converting and using those instead.