Hi, thank you for your excellent work on gaze estimation. I have recently been reading your paper 'Dynamic 3D Gaze from Afar' along with the code. There is one part of the GazeNet module whose purpose I could not work out:
reference_rad = head_outputs['direction'][:, self.n_frames // 2]
dst_rad = torch.zeros_like(reference_rad)
dst_rad[:, 2] = -1
R = get_rotation(reference_rad, dst_rad)
head_dir = torch.einsum('bij,bfj->bfi', R, head_outputs['direction'])
body_dir = torch.einsum('bij,bfj->bfi', R, body_outputs['direction'])
gaze_res['direction'] = torch.einsum(
'bij,bfj->bfi', R.transpose(1, 2), gaze_res['direction'])
Could you please explain it to me?
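To make the question concrete, here is my current reading as a small NumPy sketch. It is not the repository's code. I am assuming that `get_rotation(a, b)` returns a batch of rotation matrices that map each `a` onto `b`; `rotation_between` below is my own hypothetical stand-in built from Rodrigues' formula, and it does not handle the case where the two vectors point in exactly opposite directions. The head directions are toy data, and `gaze_canon` is only a placeholder for whatever the network predicts.

```python
import numpy as np

def rotation_between(src, dst, eps=1e-8):
    """Hypothetical stand-in for get_rotation: (B, 3), (B, 3) -> (B, 3, 3), with R @ src = dst."""
    src = src / np.linalg.norm(src, axis=-1, keepdims=True)
    dst = dst / np.linalg.norm(dst, axis=-1, keepdims=True)
    v = np.cross(src, dst)              # rotation axis scaled by sin(angle), shape (B, 3)
    c = np.sum(src * dst, axis=-1)      # cos(angle), shape (B,)
    K = np.zeros((src.shape[0], 3, 3))  # skew-symmetric cross-product matrices of v
    K[:, 0, 1], K[:, 0, 2] = -v[:, 2], v[:, 1]
    K[:, 1, 0], K[:, 1, 2] = v[:, 2], -v[:, 0]
    K[:, 2, 0], K[:, 2, 1] = -v[:, 1], v[:, 0]
    # Rodrigues' formula; breaks down when src and dst are antiparallel (c close to -1).
    return np.eye(3)[None] + K + (K @ K) / (1.0 + c + eps)[:, None, None]

B, F = 2, 7  # toy batch size and number of frames
theta = 0.1 * np.arange(F)
head = np.stack([
    np.stack([np.sin(theta + 0.2 * b), np.full(F, 0.1 * b), -np.cos(theta + 0.2 * b)], axis=-1)
    for b in range(B)
])                                                   # (B, F, 3) toy head directions
head /= np.linalg.norm(head, axis=-1, keepdims=True)

reference = head[:, F // 2]                          # middle-frame head direction
dst = np.zeros_like(reference)
dst[:, 2] = -1                                       # canonical direction (0, 0, -1)
R = rotation_between(reference, dst)

# Rotate every frame so that the middle-frame head direction becomes (0, 0, -1).
head_canon = np.einsum('bij,bfj->bfi', R, head)

# Placeholder for the network output, which I assume lives in the canonical frame.
gaze_canon = head_canon.copy()

# R^T is the inverse rotation, so this would map the prediction back to the original frame.
gaze_back = np.einsum('bij,bfj->bfi', R.transpose(0, 2, 1), gaze_canon)
```

If this reading is right, the head and body directions are expressed relative to the middle-frame head direction before going into the network, and the predicted gaze is rotated back with the transpose of `R` afterwards. Is that the intention?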