pixelite1201 / BEDLAM


camera and body in world coordinate system #44

Closed davidpagnon closed 2 months ago

davidpagnon commented 2 months ago

I can't reopen an issue and I'm not sure how the notification system works on closed ones, so I'm posting this as a new one. Please pardon me if this has already been seen. Here was my question:

I'm still kind of fuzzy about why and how the camera extrinsic parameters change from person to person on the same image. Why do we need to change both the position of the camera (npz_params['cam_ext']) and of the body (npz_params['trans_world'])?

In any case, since this is what I ultimately need, is there a way to get the camera calibration in the world coordinate system, and the same for each SMPL-X mesh?

Originally posted by @davidpagnon in https://github.com/pixelite1201/BEDLAM/issues/43#issuecomment-2035584229

davidpagnon commented 2 months ago

Hi @pixelite1201 and co,

Now I'm thinking of giving up on using the npz data for the camera and body positions and orientations, and using the be_seq.csv file instead. Once I'm sure the positions and orientations are good, I'll tackle pose, shape, and scale, for which I suppose I will just have to retrieve the information from the npz file. Is there a readme file or something that explains the conventions be_seq.csv uses?

If I understand correctly, Group corresponds to the camera extrinsic parameters, and Body corresponds to the mesh ones. For both Group and Body, X, Y, Z are in centimeters, and Yaw, Pitch, Roll are in degrees. Bodies only have X, Y, and Yaw parameters.

Does that sound accurate? If so, I still have a problem because the camera is not looking at the person at all. Do the location axes need to be rotated? In which order do the angles need to be applied? Are the rotations centered around the world origin or around the camera/body?

davidpagnon commented 2 months ago

Hi again,

I made some progress but I am still stuck with sub-optimal results: the body meshes do not perfectly overlay the image. Do you see anything wrong with my method? I need it to be exact to the pixel.

Could the issue be due to:

Thank you in advance!

Method used

Body mesh import

  1. Using the body meshes specified in be_seq.csv
  2. Importing them using the Blender SMPL-X add-on (format SMPL-X, rest position SMPL-X).
  3. Placing them in the scene by retrieving from be_seq.csv: location X, location Y, and Yaw.
    • X and Y in cm, Yaw in degrees.
    • Start frame shifted according to Comment (same csv file).
    • Location Y = - Location Y, Rotation Yaw = - Rotation Yaw.
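The placement steps above can be sketched in plain Python, without Blender. This is only a sketch of the convention described in this comment (Y and Yaw negated), with two assumptions that are not stated in the thread: centimeters are converted to meters to match Blender's default unit, and Z is left at 0. The function name `body_row_to_blender` is hypothetical, and applying the result to a Blender object is left out.

```python
import math


def body_row_to_blender(x_cm, y_cm, yaw_deg):
    """Map a be_seq.csv Body row (cm, degrees) to a right-handed, Z-up placement.

    Assumption: the handedness flip is done by negating Y, which also
    negates the yaw angle. Returns ((x, y, z) in meters, yaw in radians).
    """
    location_m = (x_cm / 100.0, -y_cm / 100.0, 0.0)
    yaw_rad = math.radians(-yaw_deg)
    return location_m, yaw_rad


# First Body row of seq_000000 (rp_claudia_posed_005_1001)
loc, yaw = body_row_to_blender(618.9016947055262, 102.08905542916352, 161.95890288786055)
print(loc, math.degrees(yaw))
```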

Camera import

  1. Creating a Blender camera.
  2. Placing it in the scene by retrieving from seq_000000_camera.csv: X, Y, Z, Roll, Pitch, Yaw (in that order), hfov.
    • X, Y, and Z in cm; Roll, Pitch, and Yaw in degrees.
    • Lens set from hfov (horizontal field of view), which is equivalent to setting focal_length (mm).
    • Location Y = - Location Y, Rotation Roll = 90°, Rotation Pitch = - Rotation Pitch, and Rotation Yaw = -90 - Rotation Yaw.
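The equivalence between hfov and focal_length can be checked with the usual pinhole relation, focal_length = sensor_width / (2 · tan(hfov / 2)). The sketch below assumes an ideal pinhole camera with no lens distortion; the function name `focal_length_mm` is hypothetical. With the values listed in seq_000000_camera.csv for this sequence (hfov 52, sensor_width 36), the result should agree with the focal_length column (36.905) up to rounding.

```python
import math


def focal_length_mm(hfov_deg, sensor_width_mm):
    """Focal length of an ideal pinhole camera from its horizontal field of view."""
    half_fov = math.radians(hfov_deg) / 2.0
    return sensor_width_mm / (2.0 * math.tan(half_fov))


f = focal_length_mm(52.0, 36.0)
print(round(f, 3))
```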


The test case I am working on is 20221010_3_1000_batch01hand, seq_000000. be_seq.csv gives:

Index,Type,Body,X,Y,Z,Yaw,Pitch,Roll,Comment
0,Comment,None,0,0,0,0,0,0,bodies_min=3;bodies_max=3;x_offset=650;y_offset=0.0;z_offset=0.0;x_min=-50;x_max=50;y_min=-250;y_max=250;yaw_min=0;yaw_max=360;cam_x_offset=10.0;cam_y_offset=10.0;cam_z_offset=5.0;cam_yaw_min=-3;cam_yaw_max=3;cam_pitch_min=-10;cam_pitch_max=3;cam_roll_min=-3;cam_roll_max=3;cam_config=cam_random_e
1,Group,None,5.058579444123886,-9.743741006226163,168.9532350186906,1.468126863951408,-2.9050680463167105,2.8139950143546857,sequence_name=seq_000000;frames=128;hdri=abandoned_church;camera_hfov=52.0
2,Body,rp_claudia_posed_005_1001,618.9016947055262,102.08905542916352,0.0,161.95890288786055,0.0,0.0,start_frame=1;texture_body=skin_f_white_04_ALB;texture_clothing=texture_09
3,Body,rp_beatrice_posed_025_1076,696.7017401790371,-229.90220980456667,0.0,314.3671548169336,0.0,0.0,start_frame=69;texture_body=skin_f_asian_09_ALB;texture_clothing=texture_16
4,Body,rp_cindy_posed_005_1097,684.9980306666062,-54.09495558334399,0.0,103.42616917612702,0.0,0.0,start_frame=65;texture_body=skin_f_indian_10_ALB;texture_clothing=texture_06
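For reference, the rows above can be read with Python's standard `csv` module, and the last column split into key/value pairs. This is a minimal sketch that assumes the Comment column always has the `key=value;key=value` form seen in these rows; `parse_comment` is a hypothetical helper, not part of any BEDLAM tooling.

```python
import csv


def parse_comment(comment):
    """Split a 'key=value;key=value' Comment string into a dict of strings."""
    fields = {}
    for item in comment.split(";"):
        if not item:
            continue  # tolerate a trailing or doubled ';'
        key, _, value = item.partition("=")
        fields[key.strip()] = value.strip()
    return fields


row = (
    "2,Body,rp_claudia_posed_005_1001,618.9016947055262,102.08905542916352,"
    "0.0,161.95890288786055,0.0,0.0,"
    "start_frame=1;texture_body=skin_f_white_04_ALB;texture_clothing=texture_09"
)
columns = next(csv.reader([row]))
comment = parse_comment(columns[9])
start_frame = int(comment["start_frame"])
print(columns[2], start_frame, comment["texture_clothing"])
```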

seq_000000_camera.csv gives:

name,x,y,z,yaw,pitch,roll,focal_length,sensor_width,sensor_height,hfov
seq_000000_0000.png,5.058579,-9.743741,168.953232,1.468127,-2.905068,2.813995,36.905,36,20.25,52

tpsmpi commented 2 months ago

Always use the dedicated camera ground truth files (seq_XXXXXX_camera.csv) which we provide for retrieving camera extrinsics/intrinsics. There are many sequences in the dataset where the camera is moving. The be_seq.csv file does not provide camera ground truth for all types of shots in BEDLAM.

Camera ground truth .csv files use Unreal coordinate system to describe camera world space location and rotation.
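As a quick sanity check of that convention, the camera's viewing direction can be computed directly in Unreal coordinates and compared with the direction from the camera to a body. The sketch below rests on assumptions that are not confirmed in this thread: Unreal's usual convention of +X forward and +Z up, with the forward vector given by (cos(pitch)·cos(yaw), cos(pitch)·sin(yaw), sin(pitch)). Roll is ignored because it does not change the viewing direction. The function names are hypothetical; the numbers are the seq_000000 camera row and the first Body row quoted earlier.

```python
import math


def unreal_forward(yaw_deg, pitch_deg):
    """Unit viewing direction in Unreal coordinates (assumed: +X forward, +Z up)."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (
        math.cos(pitch) * math.cos(yaw),
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
    )


def cosine_to_target(cam_pos, forward, target):
    """Cosine of the angle between the viewing direction and the camera-to-target ray."""
    ray = [t - c for t, c in zip(target, cam_pos)]
    norm = math.sqrt(sum(r * r for r in ray))
    return sum(f * r for f, r in zip(forward, ray)) / norm


cam_pos = (5.058579, -9.743741, 168.953232)            # cm, from seq_000000_camera.csv
forward = unreal_forward(1.468127, -2.905068)          # yaw, pitch in degrees
body_pos = (618.9016947055262, 102.08905542916352, 0)  # cm, first Body row of be_seq.csv
angle_deg = math.degrees(math.acos(cosine_to_target(cam_pos, forward, body_pos)))
print(forward, angle_deg)
```

Under these assumptions the camera looks almost straight along +X, which is where the bodies are placed (X around 620 to 700 cm), so a camera that does not face the bodies after import points to an axis or angle-sign mismatch rather than to the data.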

tpsmpi commented 2 months ago

See also: https://github.com/PerceivingSystems/bedlam_render/blob/main/unreal/render/unreal_coordinate_system.md

davidpagnon commented 2 months ago

Wow, the Unreal coordinate system is quite unusual! The weird transformations I was doing make a bit more sense now, thanks. I'll experiment with it next week and see if it fixes it all! Best regards,

tpsmpi commented 2 months ago

Make sure that you also enable SMPL-X pose correctives when using the Blender add-on to import animations. See Notes section at https://github.com/PerceivingSystems/bedlam_render/tree/main/blender/smplx_anim_to_alembic

davidpagnon commented 2 months ago

Thank you @tpsmpi !

So I implemented the transformations you specified, and it seems like it is exactly equivalent to what I did above -- but at least, now it makes sense. However, there is still this little offset.

I'm wondering if this might be an issue with the positioning of the body mesh. I enabled pose correctives, but that doesn't change location or rotation. There is this line I don't understand in the be_seq.csv file:

0,Comment,None,0,0,0,0,0,0,bodies_min=3;bodies_max=3;x_offset=650;y_offset=0.0;z_offset=0.0;x_min=-50;x_max=50;y_min=-250;y_max=250;yaw_min=0;yaw_max=360;cam_x_offset=10.0;cam_y_offset=10.0;cam_z_offset=5.0;cam_yaw_min=-3;cam_yaw_max=3;cam_pitch_min=-10;cam_pitch_max=3;cam_roll_min=-3;cam_roll_max=3;cam_config=cam_random_e

What does x_offset=650;y_offset=0.0;z_offset=0.0 mean? And cam_x_offset=10.0;cam_y_offset=10.0;cam_z_offset=5.0? cam_config=cam_random_e?

On other sequences, the camera lines of be_seq.csv contain parameters that I also don't understand: cameraroot_x=3350.0;cameraroot_y=1050.0;cameraroot_z=70.0? cameraroot_yaw=131.48002763593706?

tpsmpi commented 2 months ago

Line index 0 is just an FYI comment about the randomization parameters. The cameraroot parameters are used to set up the camera rig in Unreal for rendering. See also: https://github.com/PerceivingSystems/bedlam_render/issues/10 But always use the dedicated camera ground truth files for anything camera related. They contain all you need.

davidpagnon commented 2 months ago

Never mind, I figured it out! Since I had to rotate my Blender camera so that it would face the +X direction, I ran into an annoying brain teaser when trying to rotate around the new local axes, but it's all good now!

Thank you for taking the time to write this very clear file and for your help on this thread. I had been stuck on this for quite a while.