zhengyuf / STED-gaze

Code for paper 'Self-learning Transformations for Improving Gaze and Head Redirection'
GNU General Public License v3.0

Question about semi-supervised cross dataset evaluation #15

Closed lunaryle closed 10 months ago

lunaryle commented 10 months ago

Hi @zhengyuf, thank you for your great work on STED. I have a question about the training phase for the semi-supervised evaluation.

Section 4.5 of the paper says: "we estimate the joint probability distribution function of the gaze and head orientation values of the labeled subset and sample random target conditions from it".

  1. Could you explain in more detail the 'joint probability distribution function' you used for sampling, and why you chose to sample from it? Is this meant to prevent unrealistic cases, such as images where the head pose vector and the gaze vector point in completely opposite directions?

  2. What is the purpose of using the rest of the GazeCapture dataset without labels? Are you assuming a situation where ground truth is hard to obtain? In this case, is the loss for image generation still applied to the model, while explicit gaze and head labels are not used except for the sampled images? If so, how can you say that 'the model could augment new samples with just small amounts of training data', given that you are using all of the GazeCapture images?

I would appreciate your feedback.

zhengyuf commented 10 months ago

Hi,

  1. The joint distribution estimation and sampling are implemented here: https://github.com/zhengyuf/STED-gaze/blob/add7936af86c790821ee6fb39caa252fe26c8340/dataset.py#L85 And yes, this step is to avoid unrealistic combinations of gaze direction and head pose.

  2. First, labeled data is usually considered expensive, whereas unlabeled data is almost free. We wanted to test whether unlabeled data can boost accuracy. Ideally, we could also use the entire GazeCapture dataset plus facial image datasets without gaze labels, but the preprocessing would be cumbersome. Therefore, we used a subset of GazeCapture as the unlabeled dataset as a proof of concept.
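To illustrate the idea in point 1 only: the linked dataset.py is the authoritative implementation, and the sketch below is not taken from it. It assumes a simple Gaussian kernel density estimate with a per-dimension bandwidth (Scott's rule), and the names `fit_joint_kde` and `sample_targets`, the label layout, and the synthetic data are all hypothetical. The point is that sampling whole (gaze, head) tuples from a joint estimate keeps gaze and head pose correlated, whereas sampling each angle independently could pair a head turned left with a gaze far to the right.

```python
import random
import statistics


def fit_joint_kde(labels):
    """Fit a Gaussian KDE over joint label tuples.

    labels: list of (gaze_pitch, gaze_yaw, head_pitch, head_yaw) in radians.
    This layout is an assumption made for the example.
    """
    n = len(labels)
    dims = len(labels[0])
    # Scott's rule: bandwidth factor n^(-1/(d+4)), scaled by each dimension's std.
    factor = n ** (-1.0 / (dims + 4))
    bandwidths = [factor * statistics.stdev(col) for col in zip(*labels)]
    return labels, bandwidths


def sample_targets(kde, count, rng):
    """Draw target conditions from the KDE.

    Sampling from a Gaussian KDE means: pick one labeled tuple uniformly,
    then add Gaussian noise with the kernel bandwidth to each dimension.
    """
    points, bandwidths = kde
    samples = []
    for _ in range(count):
        center = rng.choice(points)
        samples.append(tuple(rng.gauss(c, h) for c, h in zip(center, bandwidths)))
    return samples


# Synthetic "labeled subset" (purely illustrative): gaze roughly follows head pose.
rng = random.Random(0)
labeled = []
for _ in range(500):
    head_pitch = rng.gauss(0.0, 0.2)
    head_yaw = rng.gauss(0.0, 0.3)
    gaze_pitch = head_pitch + rng.gauss(0.0, 0.1)
    gaze_yaw = head_yaw + rng.gauss(0.0, 0.1)
    labeled.append((gaze_pitch, gaze_yaw, head_pitch, head_yaw))

kde = fit_joint_kde(labeled)
targets = sample_targets(kde, 1000, rng)
```

Because each sample starts from a real labeled tuple, the sampled targets stay close to combinations that actually occur in the labeled subset.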

> In this case, is the loss for image generation still applied to the model, while explicit gaze and head labels are not used except for the sampled images?

Yes, this is correct.
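As a toy illustration of that arrangement (not the actual STED training code): every sample contributes an image generation loss, while the gaze/head label loss is averaged only over samples that carry labels. The function name `batch_loss`, the dictionary keys, and the `label_weight` parameter are hypothetical and chosen only for this sketch.

```python
def batch_loss(samples, label_weight=1.0):
    """Combine per-sample losses for a mixed labeled/unlabeled batch.

    samples: list of dicts with
      'recon_loss': float, image generation loss (present for every sample)
      'label_loss': float or None, gaze/head label loss (None when unlabeled)
    """
    # The image generation term uses all samples, labeled or not.
    recon = sum(s["recon_loss"] for s in samples) / len(samples)

    # The label term uses only the labeled samples.
    labeled = [s["label_loss"] for s in samples if s["label_loss"] is not None]
    label = sum(labeled) / len(labeled) if labeled else 0.0

    return recon + label_weight * label


batch = [
    {"recon_loss": 0.4, "label_loss": 0.2},   # labeled sample
    {"recon_loss": 0.6, "label_loss": None},  # unlabeled sample
    {"recon_loss": 0.5, "label_loss": None},  # unlabeled sample
]
total = batch_loss(batch)
```

A batch with no labeled samples simply drops the label term, so unlabeled images still provide a training signal through the generation loss.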

lunaryle commented 10 months ago

Thank you for your quick and detailed response! :)