Hi,
1.) The detailed procedure is as follows
Before experiment:
i) Choose one AR marker and use it to define the world coordinate system.
ii) Scan all AR markers in the room, starting with the reference AR marker. Repeat the process of obtaining the camera pose in a frame from an AR marker whose 3D position is already known, then computing the 3D position of a new AR marker from that camera pose, until the 3D positions of all the AR markers are computed.
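For example, one chaining step can look roughly like this (an untested sketch using the legacy cv2.aruco API from opencv-contrib-python 4.6 or earlier; camera_matrix, dist_coeffs, marker_len and the other names are placeholders for your own calibration and setup):

```python
import cv2
import numpy as np

def to_homogeneous(rvec, tvec):
    """Build a 4x4 rigid transform from an OpenCV rvec/tvec pair."""
    T = np.eye(4)
    T[:3, :3] = cv2.Rodrigues(rvec)[0]
    T[:3, 3] = tvec.reshape(3)
    return T

def register_new_marker(frame, known_id, T_world_known, new_id,
                        aruco_dict, marker_len, camera_matrix, dist_coeffs):
    """World pose of a new marker, from a frame that shows both a marker
    with a known world pose and the new marker."""
    corners, ids, _ = cv2.aruco.detectMarkers(frame, aruco_dict)
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, marker_len, camera_matrix, dist_coeffs)
    ids = ids.flatten()  # assumes both markers were actually detected in this frame

    # Pose of each marker in the camera frame (marker -> camera).
    T_cam_known = to_homogeneous(rvecs[ids == known_id][0], tvecs[ids == known_id][0])
    T_cam_new = to_homogeneous(rvecs[ids == new_id][0], tvecs[ids == new_id][0])

    # Camera pose in the world frame via the known marker, then the
    # new marker's pose in the world frame.
    T_world_cam = T_world_known @ np.linalg.inv(T_cam_known)
    return T_world_cam @ T_cam_new
```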
During experiment:
i) Compute the camera pose of the head- and body-mounted cameras based on the 3D positions of the AR markers captured by the cameras. You can do this by detecting AR markers with cv2.aruco.detectMarkers and computing the camera pose with cv2.aruco.estimatePoseSingleMarkers.
ii) Pupil Labs' software outputs gaze directions relative to the head pose. Convert the gaze direction to the world coordinate system by multiplying it by the head rotation matrix.
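A rough, untested sketch of these two steps (again with the legacy cv2.aruco API; marker_world_poses, a dict mapping marker IDs to the 4x4 marker-to-world transforms built before the experiment, and gaze_dir_head, the unit gaze vector in the scene-camera frame, are placeholder names):

```python
import cv2
import numpy as np

def camera_pose_in_world(frame, marker_world_poses, aruco_dict,
                         marker_len, camera_matrix, dist_coeffs):
    """4x4 camera-to-world transform from one detected marker
    whose world pose is already known."""
    corners, ids, _ = cv2.aruco.detectMarkers(frame, aruco_dict)
    if ids is None:
        return None
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, marker_len, camera_matrix, dist_coeffs)
    for rvec, tvec, marker_id in zip(rvecs, tvecs, ids.flatten()):
        if int(marker_id) not in marker_world_poses:
            continue
        # Marker pose in the camera frame (marker -> camera).
        T_cam_marker = np.eye(4)
        T_cam_marker[:3, :3] = cv2.Rodrigues(rvec)[0]
        T_cam_marker[:3, 3] = tvec.reshape(3)
        # camera -> world = (marker -> world) @ (camera -> marker)
        return marker_world_poses[int(marker_id)] @ np.linalg.inv(T_cam_marker)
    return None

def gaze_to_world(gaze_dir_head, T_world_headcam):
    """Rotate the gaze direction from the head (scene camera) frame to the world frame."""
    return T_world_headcam[:3, :3] @ np.asarray(gaze_dir_head)
```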
2.) The ground truth of the head, body, and gaze directions is provided in the camera coordinate system of each surveillance camera.
Thank you very much for your quick response and clear explanation! I will follow your guidance and contact you again if I encounter any issues.
Thank you very much!!
Hello,
I have two more questions regarding your guidance. The first question is about the 3D positions of the new AR markers. After I translated the 3D positions of the new AR markers to the world coordinate system in each step, the errors in the markers' 3D positions accumulated and grew larger and larger. Since you advised, "Note that you need to translate the 3D position of new AR markers to the world coordinate in each step", do I only need to apply the translation, with no rotation? Is that correct?
The second question is about the usage of the Pupil Core eye trackers. Did you use the Pupil Mobile app to collect the videos from the eye tracker remotely?
Thank you very much in advance for your consideration
Hi,
After I translated the 3D positions of the new AR markers to the world coordinate system in each step, the errors in the markers' 3D positions accumulated and grew larger and larger.
Yes, you are correct. We placed a big ChArUco board in the center of the room and used it as the reference marker. We computed the 3D position of each AR marker from camera poses obtained directly with the ChArUco board. However, this method can only be used if there are no obstructions in the room. If your environment has obstructions, you need to compute the 3D positions of the AR markers in the way I explained earlier. To reduce the estimation errors, it is recommended to use a wide-angle camera so that several markers are captured at once. You should also apply bundle adjustment.
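For illustration only (the board parameters below are placeholders, not our actual board, and the snippet again assumes the legacy cv2.aruco API), obtaining the camera pose directly from a ChArUco board can look roughly like this:

```python
import cv2
import numpy as np

# Placeholder board definition: 5x7 squares, 0.20 m squares, 0.15 m markers.
aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
board = cv2.aruco.CharucoBoard_create(5, 7, 0.20, 0.15, aruco_dict)

def camera_pose_from_charuco(frame, camera_matrix, dist_coeffs):
    """Camera pose in the board frame, estimated directly from the ChArUco board."""
    corners, ids, _ = cv2.aruco.detectMarkers(frame, aruco_dict)
    if ids is None:
        return None
    n, ch_corners, ch_ids = cv2.aruco.interpolateCornersCharuco(corners, ids, frame, board)
    if n < 4:
        return None
    ok, rvec, tvec = cv2.aruco.estimatePoseCharucoBoard(
        ch_corners, ch_ids, board, camera_matrix, dist_coeffs,
        np.zeros((3, 1)), np.zeros((3, 1)))
    if not ok:
        return None
    # Board pose in the camera frame; invert to get the camera pose in the
    # board (reference / world) frame.
    T_cam_board = np.eye(4)
    T_cam_board[:3, :3] = cv2.Rodrigues(rvec)[0]
    T_cam_board[:3, 3] = tvec.reshape(3)
    return np.linalg.inv(T_cam_board)
```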
Since you advised, "Note that you need to translate the 3D position of new AR markers to the world coordinate in each step", do I only need to apply the translation, with no rotation? Is that correct?
No, you also need to apply the rotation.
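Concretely, when you map a new marker's position from the camera frame into the world frame, apply the full camera-to-world transform, not only its translation part (illustrative names, not our code):

```python
import numpy as np

def marker_to_world(p_cam, R_world_cam, t_world_cam):
    # Full rigid transform: rotation AND translation, not the translation alone.
    return R_world_cam @ np.asarray(p_cam) + t_world_cam
```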
Did you use the Pupil Mobile app to collect the videos from the eye tracker remotely?
No, we didn't. We used a GPD Pocket, which is a handheld Windows PC.
Thank you very much for your kind response and detailed explanation! I have now managed to locate all the AR markers in the world coordinate system with acceptable errors. However, I am facing another problem. When I compute the camera poses based on the ArUco markers captured by the cameras, some ArUco markers do not yield accurate camera positions and orientations in the world coordinate system.
Thank you very much in advance for your consideration.
Hi,
We calculated the re-projection error for each marker and removed markers with large errors. We also used RANSAC to remove outliers. We interpolated the camera position when it did not change much between the previous and next frames.
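A rough, untested illustration of this kind of filtering (the threshold and names are placeholders, not our actual values):

```python
import cv2
import numpy as np

def robust_camera_pose(obj_points, img_points, camera_matrix, dist_coeffs,
                       max_reproj_error=3.0):
    """Camera pose from 3D-2D marker-corner correspondences, using RANSAC
    plus a per-point re-projection-error check.

    obj_points: (N, 3) marker corners in world coordinates.
    img_points: (N, 2) corresponding detections in the image (pixels).
    """
    obj = obj_points.astype(np.float32)
    img = img_points.astype(np.float32)

    # RANSAC removes gross outliers (e.g. mis-detected markers).
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj, img, camera_matrix, dist_coeffs)
    if not ok or inliers is None or len(inliers) < 4:
        return None

    # Re-projection error per correspondence; refit on the small-error points only.
    projected, _ = cv2.projectPoints(obj, rvec, tvec, camera_matrix, dist_coeffs)
    errors = np.linalg.norm(projected.reshape(-1, 2) - img, axis=1)
    keep = errors < max_reproj_error
    if keep.sum() >= 4:
        ok, rvec, tvec = cv2.solvePnP(obj[keep], img[keep], camera_matrix, dist_coeffs)
    return rvec, tvec
```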
Please note that this repository is intended to explain our methods in more detail, not to deal with individual cases. Please don't open issues like "Are there any methods?" or "Is it wise to...?"; the answer depends on your experimental settings.
If you have any questions about our experimental methods, I will answer them, so feel free to re-open this issue.
Thank you very much for your response.
I apologize for my questions in the previous post. If I have any further questions about your experimental methods, I will re-open this issue.
Again, thank you very much for your detailed explanation.
Hello there! Thank you very much for your great work. I am now collecting my own custom dataset using Pupil Core eye trackers and GoPro cameras, as mentioned in the paper. I have two questions about the GAFA dataset. 1.) Could you please give me more details about how you obtained the ground-truth head and body orientations using the AR marker-based positioning system from the videos of the eye-tracking glasses' world camera and the GoPro camera, and the gaze direction relative to the head pose obtained with Pupil Core?
The paper mentions solving PnP problems. Could you please give me some guidance on this?
2.) Is the ground truth of the head, body, and gaze directions provided in the camera coordinate system of the surveillance camera, or in the camera coordinate systems of the Pupil Core's world camera and the GoPro camera?
Thank you very much in advance for your consideration