How do you obtain your View matrix in your world 3D co-ordinate to 2D screen projection?

oscarmcnulty / gta-3d-dataset

A dataset of 2D imagery, 3D point cloud data, and 3D vehicle bounding box labels all generated using the Grand Theft Auto 5 game engine.


How do you obtain your View matrix in your world 3D co-ordinate to 2D screen projection? #3

Closed by vsaben 4 years ago

vsaben commented 4 years ago

Your JSON annotation files show that you extract View (V) and Projection (P) matrices. I tried to generate these same matrices using the GTAVisionExport mod, but the results do not appear to be accurate. I was able to recreate your P, but how did you formulate V? Thank you in advance. Any help would be much appreciated.

oscarmcnulty commented 4 years ago

The view matrix V defines how world space is related to camera space (basically where the camera is positioned and oriented relative to the world). It will be different for every image, since the camera is positioned differently in each image. It makes sense that you are able to recreate the P matrix, since the camera field of view etc. is the same across images.

If you look at https://github.com/oscarmcnulty/gta-3d-dataset/blob/master/gta.py#L202 you can see how to:

1. Move coordinates from model space to world space using the model position and rotation (Line 208)
2. Move coordinates from world space to camera space using the V matrix (Line 211)
3. Move coordinates from camera space to clip space using the P matrix (Line 211)

Basically np.dot(self.V, corner_world_) will apply a translation and rotation so that (0,0,0) is where the camera is located.
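The three steps above can be sketched in NumPy roughly as follows. This is an illustration only, not the code in gta.py: the function name `project_points`, the `model_to_world` argument, and the `width`/`height` parameters are hypothetical. It assumes a column-vector convention (matrices multiply from the left) and normalized device coordinates in [-1, 1] with y pointing up; matrices dumped from the game may need transposing or a different viewport mapping.

```python
import numpy as np

def project_points(points_model, model_to_world, V, P, width, height):
    """Project N x 3 model-space points to N x 2 pixel coordinates.

    Assumptions (not confirmed by the dataset): column-vector convention,
    NDC x and y in [-1, 1], pixel origin at the top-left of the image.
    """
    n = points_model.shape[0]
    # Homogeneous coordinates, stored as a 4 x N array of column vectors.
    homo = np.hstack([points_model, np.ones((n, 1))]).T

    world = model_to_world @ homo   # model space  -> world space
    camera = V @ world              # world space  -> camera space
    clip = P @ camera               # camera space -> clip space

    # Perspective divide: clip space -> normalized device coordinates.
    ndc = clip[:3] / clip[3]

    # Viewport mapping: NDC -> pixels (y is flipped so it grows downwards).
    x_px = (ndc[0] + 1.0) * 0.5 * width
    y_px = (1.0 - ndc[1]) * 0.5 * height
    return np.stack([x_px, y_px], axis=1)
```

Points with a clip-space w that is zero or negative lie on or behind the camera plane and would need to be filtered out before the divide; the sketch leaves that out for brevity.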

vsaben commented 4 years ago

Thank you for taking the time to respond to my query. I understand how you are moving between co-ordinate systems. I am unsure, though, whether I am calculating V correctly. When you generated your dataset, in what order did you apply the (4 x 4) rotation and translation matrices to create V? I attempted the procedure below based on the camera's rotation ($\alpha$ = pitch, $\beta$ = roll, $\gamma$ = yaw) and position (x, y, z), but to no avail.

[Image: View matrix]

oscarmcnulty commented 4 years ago

The matrices in this dataset are obtained by dumping the DirectX constant buffers rather than by constructing them from Euler angles. There is an example of how this can be done specifically for GTA5 here.

The camera rotation and position in the JSON are from scripthook rather than from decomposing the V matrix.

I had a look at the code I used and there is a bug where the camera position from scripthook is written to cameraRot instead of the rotation (e.g. cameraRot = cameraPos, so both hold the camera position). I only used the V matrix in my experiments so I didn't catch this, but you should be able to work out the correct Euler angles by decomposing the V matrix if you really need them.
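As a rough sketch of what decomposing V could look like, the snippet below recovers the camera's world position and camera-to-world rotation matrix. The function name `decompose_view` is hypothetical, and it assumes V is a rigid transform of the form [[R, t], [0, 1]] in column-vector convention; if the translation of a dumped matrix sits in the bottom row instead, transpose it first. Turning the rotation matrix into Euler angles additionally depends on the axis order the game uses, which is not covered here.

```python
import numpy as np

def decompose_view(V):
    """Split a rigid 4 x 4 view matrix into camera rotation and position.

    Assumes V = [[R, t], [0, 1]] maps world coordinates to camera
    coordinates (column-vector convention, no scale or shear).
    """
    R = V[:3, :3]   # world -> camera rotation
    t = V[:3, 3]    # world -> camera translation

    # The camera sits where V maps to the origin: R @ c + t = 0.
    cam_pos = -R.T @ t
    # For a rotation matrix the inverse is the transpose.
    cam_rot = R.T   # camera -> world rotation
    return cam_rot, cam_pos
```

Comparing `cam_pos` against the position reported in the JSON is one way to check whether a given V matrix follows the assumed convention.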

vsaben commented 4 years ago

I initially tried extracting V, P using GTAVisionExport's GetConstantBuffer() functionality (i.e. the link you provided). Given large inconsistencies in 3D to 2D co-ordinate projection, I attempted to recreate these matrices manually. If you were able to get meaningful output from this constant buffer, this may point to another error in my data extraction pipeline or inconsistencies in GTA Native functionality.

Thank you for your help. Well done on your thesis by the way.