swook / EVE

Towards End-to-end Video-based Eye-tracking. ECCV 2020. https://ait.ethz.ch/eve
MIT License

question about camera_transformation #2

Open RichardoMrMu opened 3 years ago

RichardoMrMu commented 3 years ago

Hi, EVE is excellent work, and I have benefited a lot from it. I have a question about the camera_transformation data field in the HDF files. I would appreciate it very much if you could show me how to obtain this field.

swook commented 3 years ago

Hi @RichardoMrMu, I'm sorry that I completely missed this issue.

I am not entirely sure which part you need help with, so for now I'd like to point you to: https://github.com/swook/EVE/blob/master/DATASET.md#hdf-file-format

I hope this helps.

RichardoMrMu commented 3 years ago

Thanks. What I mean is: how can I calculate camera_transformation myself? I am having trouble with it. Is camera_transformation obtained from a rotation and a perspective transform matrix? I would appreciate it if you could tell me.

xucong-zhang commented 3 years ago

Hi @RichardoMrMu, you need to perform camera-screen calibration to get the values in camera_transformation. We describe this in "A.3 Dataset Pre-processing" in the "Appendix" section of the paper. The calibration can be done with "mirror-based camera pose estimation". You can find the code with the link.

RichardoMrMu commented 3 years ago

Thanks for the help.

RichardoMrMu commented 3 years ago

I have another question. What is the difference between the camera_transformation obtained with "mirror-based camera pose estimation" and the transformation obtained in ETH-XGaze? In ETH-XGaze, it seems that every image gets its own matrix, so in a video every frame has a different matrix. But in EVE, a whole video clip shares the same matrix.

swook commented 3 years ago

Hi, the transformation that you link to in ETH-XGaze is a transformation for warping the input image space to the final face patch space. Hence, this is different for every image sample.

In EVE, the camera_transformation is composed of the rotation matrix and translation matrix that brings points from the screen coordinate system to the given camera's (one of basler, webcam_l, webcam_c, and webcam_r) coordinate system.
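To make the screen-to-camera mapping concrete, here is a minimal sketch of how a rotation and translation can be combined and applied to 3D points. It assumes the transformation is stored as a 4x4 homogeneous matrix and that coordinates are in millimetres; the helper names `make_rigid_transform` and `transform_points` and all numeric values are hypothetical and not taken from the EVE code base.

```python
import numpy as np


def make_rigid_transform(R, t):
    """Pack a 3x3 rotation R and a 3-vector translation t into a 4x4 matrix."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(R, dtype=float)
    T[:3, 3] = np.asarray(t, dtype=float).reshape(3)
    return T


def transform_points(T, points):
    """Apply a 4x4 rigid transform T to an (N, 3) array of 3D points."""
    points = np.asarray(points, dtype=float).reshape(-1, 3)
    # Append a 1 to each point so the translation is applied by the matrix product.
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (T @ homogeneous.T).T[:, :3]


# Hypothetical calibration result (assumption, not real EVE data):
# a 180 degree rotation about the x-axis plus a 10 mm shift along y.
R = np.diag([1.0, -1.0, -1.0])
t = [0.0, 10.0, 0.0]
T = make_rigid_transform(R, t)

# A point on the screen plane (z = 0), expressed in screen coordinates.
screen_point = np.array([[100.0, 50.0, 0.0]])
camera_point = transform_points(T, screen_point)

# Inverting the matrix maps camera coordinates back to screen coordinates.
recovered = transform_points(np.linalg.inv(T), camera_point)
```

Because the matrix depends only on where the camera sits relative to the screen, a single matrix can be reused for every frame recorded with that camera, as long as the setup does not move.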

We also provide an equivalent matrix to the one you're referring to (from ETH-XGaze) as described in https://github.com/swook/EVE/blob/master/DATASET.md#hdf-file-format as:

{face,left,right}_W - (N, 3, 3) - the perspective transform matrix (line 74 of example code)
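For contrast with the fixed camera-to-screen extrinsics, the following is a rough sketch of how a per-frame 3x3 perspective transform maps 2D pixel coordinates. It assumes the matrix maps pixels in the original frame to pixels in the warped patch; the function name `warp_points` and the example matrix are hypothetical and not taken from the EVE or ETH-XGaze code.

```python
import numpy as np


def warp_points(W, pixels):
    """Apply a 3x3 perspective transform W to an (N, 2) array of pixel coordinates."""
    pixels = np.asarray(pixels, dtype=float).reshape(-1, 2)
    # Lift to homogeneous coordinates (x, y, 1).
    homogeneous = np.hstack([pixels, np.ones((len(pixels), 1))])
    warped = (W @ homogeneous.T).T
    # Divide by the third coordinate to return to 2D pixel coordinates.
    return warped[:, :2] / warped[:, 2:3]


# Hypothetical matrix for one frame (assumption): scale by 0.5, then shift by (10, 20).
W = np.array([
    [0.5, 0.0, 10.0],
    [0.0, 0.5, 20.0],
    [0.0, 0.0, 1.0],
])

patch_pixel = warp_points(W, [[100.0, 200.0]])
```

With an (N, 3, 3) array of such matrices, one would index the matrix for frame `i` before calling the function, since the head pose and therefore the warp changes from frame to frame.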