Questions about implementation

YuXiangLin1234 commented 1 year ago

Hello, I am trying to reproduce your great work based on the GitHub repo you mentioned before, but the results are not good. I'm new to object detection, so there may be errors in my code. I want to clarify two things:

1. In SSD, we have to encode the face location and landmark targets against the prior boxes (using the variance) when computing their losses, and then decode the predictions. I'm not sure whether there is a way to encode gaze yaw and pitch (they are in neither point form nor center form), or whether gaze does not need to be encoded the way landmarks and locations are at all?

2. For the downstream 3D gaze head, can I just copy the architecture of the location head and landmark head?


import torch.nn as nn

class LandmarkHead(nn.Module):
    def __init__(self, inchannels=512, num_anchors=3):
        super(LandmarkHead, self).__init__()
        # 10 outputs per anchor: (x, y) for each of the 5 landmarks
        self.conv1x1 = nn.Conv2d(inchannels, num_anchors * 10, kernel_size=(1, 1), stride=1, padding=0)

    def forward(self, x):
        out = self.conv1x1(x)
        # (N, C, H, W) -> (N, H, W, C) so each anchor's outputs are contiguous
        out = out.permute(0, 2, 3, 1).contiguous()
        return out.view(out.shape[0], -1, 10)

class GazeHead(nn.Module):
    def __init__(self, inchannels=512, num_anchors=3):
        super(GazeHead, self).__init__()
        # 2 outputs per anchor: pitch and yaw
        self.conv1x1 = nn.Conv2d(inchannels, num_anchors * 2, kernel_size=(1, 1), stride=1, padding=0)

    def forward(self, x):
        out = self.conv1x1(x)
        # (N, C, H, W) -> (N, H, W, C) so each anchor's outputs are contiguous
        out = out.permute(0, 2, 3, 1).contiguous()
        return out.view(out.shape[0], -1, 2)



Thank you for any help you can provide.

mf-zhang commented 1 year ago

Hi, I built my code based on https://github.com/biubug6/Pytorch_Retinaface

1. I guess the box locations and landmarks need to be encoded and decoded because faces vary in size. I think gaze requires no such process since it is a normalized vector.
2. Yes, you can reuse the architecture of the location and landmark heads for the gaze head.
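
To illustrate why box targets are encoded against the priors while gaze may not need it, here is a minimal sketch in plain Python. The function names `encode_box` and `gaze_target` are hypothetical, and the variances (0.1, 0.2) are assumed from the values commonly used in SSD-style code; this is an illustration of the idea, not the GazeOnce implementation.

```python
import math

def encode_box(box, prior, variances=(0.1, 0.2)):
    """Encode a box (xmin, ymin, xmax, ymax) against a prior (cx, cy, w, h)."""
    xmin, ymin, xmax, ymax = box
    p_cx, p_cy, p_w, p_h = prior
    # Center offset, scaled by the prior size and the first variance.
    g_cx = ((xmin + xmax) / 2 - p_cx) / (variances[0] * p_w)
    g_cy = ((ymin + ymax) / 2 - p_cy) / (variances[0] * p_h)
    # Log size ratio, scaled by the second variance.
    g_w = math.log((xmax - xmin) / p_w) / variances[1]
    g_h = math.log((ymax - ymin) / p_h) / variances[1]
    return (g_cx, g_cy, g_w, g_h)

def gaze_target(yaw, pitch):
    """Assumption: gaze does not depend on the prior, so it is used as-is."""
    return (yaw, pitch)

# The same face matched to a small and a large prior gives different box targets.
face = (0.40, 0.40, 0.60, 0.60)
small_prior = (0.5, 0.5, 0.1, 0.1)
large_prior = (0.5, 0.5, 0.4, 0.4)
small_target = encode_box(face, small_prior)
large_target = encode_box(face, large_prior)

# The gaze target is identical whichever prior the face is matched to.
gaze = gaze_target(0.1, -0.2)
```

The box target changes sign in its size component depending on whether the prior is smaller or larger than the face, which is the size dependence the encoding removes. The gaze target has no such dependence in this sketch.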

Hope my answer helps. Thank you!

YuXiangLin1234 commented 12 months ago

Thanks for your response. I set the loss weights of all tasks to 1 and got better results (at first I had set the weight of the gaze loss to 10 and got bad results).
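
The weighting described above amounts to a weighted sum of the per-task losses. A minimal sketch, assuming four loss terms with hypothetical names (`loc`, `cls`, `landm`, `gaze`) and made-up loss values; the actual terms and weights in the training code may differ:

```python
def total_loss(losses, weights):
    """Weighted sum of per-task losses; both arguments are dicts keyed by task name."""
    return sum(weights[name] * value for name, value in losses.items())

# Hypothetical per-task loss values for one training step.
losses = {"loc": 1.2, "cls": 0.8, "landm": 2.0, "gaze": 0.5}

# Every task weighted equally, as in the setting that worked better.
equal_weights = {"loc": 1.0, "cls": 1.0, "landm": 1.0, "gaze": 1.0}
# Gaze weighted 10x, as in the first attempt.
gaze_heavy = {"loc": 1.0, "cls": 1.0, "landm": 1.0, "gaze": 10.0}

loss_equal = total_loss(losses, equal_weights)
loss_heavy = total_loss(losses, gaze_heavy)
```

In this toy example the gaze term makes up more than half of the total once its weight is 10. That could be one reason the other tasks suffered, although the thread does not confirm the cause.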

mf-zhang / GazeOnce

Questions about implementation #6