joonson / syncnet_trainer

Disentangled Speech Embeddings using Cross-Modal Self-Supervision
MIT License

label #3

Open jane-pyc opened 4 years ago

jane-pyc commented 4 years ago

Why do you define the label in the loss function as:

```python
def sync_loss(self, out_v, out_a, criterion):

    batch_size  = out_a.size()[0]
    time_size   = out_a.size()[2]

    label       = torch.arange(time_size).cuda()

    nloss = 0
    prec1 = 0
```

Shouldn't the label be 0 or 1, depending on whether the data is synchronized or not?

joonson commented 4 years ago

The label should be the matching frame, i.e. along the diagonal. See https://ieeexplore.ieee.org/document/9067055
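
To illustrate the answer above, here is a minimal sketch in plain Python (no PyTorch, so it runs anywhere). It is not the code from this repository. It assumes the loss is a softmax cross-entropy over a video-by-audio similarity matrix, with a dot product as the similarity measure; the helper names `cross_entropy` and `similarity_matrix` and the toy embeddings are made up for the example. Under those assumptions, row `i` of the matrix scores video frame `i` against every audio frame, so the correct class for row `i` is column `i`, which is exactly what `torch.arange(time_size)` encodes.

```python
import math


def cross_entropy(logits, target):
    """Mean softmax cross-entropy; target[i] is the correct column for row i."""
    total = 0.0
    for row, t in zip(logits, target):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[t]
    return total / len(logits)


def similarity_matrix(video, audio):
    """sim[i][j] = dot product of video frame i and audio frame j (assumed metric)."""
    return [[sum(v * a for v, a in zip(vf, af)) for af in audio] for vf in video]


# Hypothetical toy embeddings: 3 time steps, 3 dimensions each.
video = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
audio_synced = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
# Same audio, circularly shifted by one frame, so it no longer lines up.
audio_shifted = audio_synced[1:] + audio_synced[:1]

time_size = len(video)
# Plays the role of torch.arange(time_size): frame i should match frame i.
label = list(range(time_size))

loss_synced = cross_entropy(similarity_matrix(video, audio_synced), label)
loss_shifted = cross_entropy(similarity_matrix(video, audio_shifted), label)

print("synced :", loss_synced)
print("shifted:", loss_shifted)
```

With these toy values the synchronized pair gives a much lower loss than the shifted pair, because only in the synchronized case do the largest similarities sit on the diagonal. A 0/1 label would describe a binary synced/not-synced classifier, which is a different formulation from the multi-way matching sketched here.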