yeyupiaoling / VoiceprintRecognition-PaddlePaddle

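This project uses several advanced voiceprint recognition models, including EcapaTdnn, ResNetSE, ERes2Net, and CAM++, and also supports multiple data preprocessing methods such as MelSpectrogram, Spectrogram, MFCC, and Fbank.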
Apache License 2.0

Problem with AAMLoss #13

Closed gooloosk closed 6 months ago

gooloosk commented 6 months ago

Computing the AAMLoss fails with a dimension-mismatch error:

ValueError: (InvalidArgument) Input(Logits) and Input(Label) should in same shape in dimensions except axis.

I printed the tensors passed to `loss = self.criterion(output, label)` in AAMLoss, in https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle/blob/develop/ppvector/models/loss.py

output (355 is the number of classes in my dataset)

    Tensor(shape=[64, 64, 355], dtype=float32, place=Place(gpu:0), stop_gradient=False,
    [[[-0.95590430,  0.50596744,  2.29433346, ..., -2.05126929,
       -5.45670271,  1.53850305],
      [ 2.50000358, -3.78276157,  0.98925447, ...,  2.54729700,
        1.36735356,  0.96287811],
      [-0.00937408,  0.40484223, -2.18388867, ..., -0.29867306,
        1.13311613,  0.76943004],
      ...,
      [ 1.94269681,  0.17316771, -1.31787276, ...,  2.20876241,
       -1.48647094,  2.89808607],
      [ 1.91438305,  2.85331845, -2.19972754, ...,  3.49275684,
        3.03001499, -0.18890989],
      [ 2.02145815, -1.01368928,  0.66119802, ...,  4.42322826,
        0.08005953, -4.07938671]],

    [[-0.95590430,  0.50596744,  2.29433346, ..., -2.05126929,
      -5.45670271,  1.53850305],
     [ 2.50000358, -3.78276157,  0.98925447, ...,  2.54729700,
       1.36735356,  0.96287811],
     [-0.00937408,  0.40484223, -2.18388867, ..., -0.29867306,
       1.13311613,  0.76943004],
     ...,
     [ 1.94269681,  0.17316771, -1.31787276, ...,  2.20876241,
      -1.48647094,  2.89808607],
     [ 1.91438305,  2.85331845, -2.19972754, ...,  3.49275684,
       3.03001499, -0.18890989],
     [ 2.02145815, -1.01368928,  0.66119802, ...,  4.42322826,
       0.08005953, -4.07938671]],

    [[-0.95590430,  0.50596744,  2.29433346, ..., -2.05126929,
      -5.45670271,  1.53850305],
     [ 2.50000358, -3.78276157,  0.98925447, ...,  2.54729700,
       1.36735356,  0.96287811],
     [-0.00937408,  0.40484223, -2.18388867, ..., -0.29867306,
       1.13311613,  0.76943004],
     ...,
     [ 1.94269681,  0.17316771, -1.31787276, ...,  2.20876241,
      -1.48647094,  2.89808607],
     [ 1.91438305,  2.85331845, -2.19972754, ...,  3.49275684,
       3.03001499, -0.18890989],
     [ 2.02145815, -1.01368928,  0.66119802, ...,  4.42322826,
       0.08005953, -4.07938671]],

    ...,

    [[-0.95590430,  0.50596744,  2.29433346, ..., -2.05126929,
      -5.45670271,  1.53850305],
     [ 2.50000358, -3.78276157,  0.98925447, ...,  2.54729700,
       1.36735356,  0.96287811],
     [-0.00937408,  0.40484223, -2.18388867, ..., -0.29867306,
       1.13311613,  0.76943004],
     ...,
     [ 1.94269681,  0.17316771, -1.31787276, ...,  2.20876241,
      -1.48647094,  2.89808607],
     [ 1.91438305,  2.85331845, -2.19972754, ...,  3.49275684,
       3.03001499, -0.18890989],
     [ 2.02145815, -1.01368928,  0.66119802, ...,  4.42322826,
       0.08005953, -4.07938671]],

    [[-0.95590430,  0.50596744,  2.29433346, ..., -2.05126929,
      -5.45670271,  1.53850305],
     [ 2.50000358, -3.78276157,  0.98925447, ...,  2.54729700,
       1.36735356,  0.96287811],
     [-0.00937408,  0.40484223, -2.18388867, ..., -0.29867306,
       1.13311613,  0.76943004],
     ...,
     [ 1.94269681,  0.17316771, -1.31787276, ...,  2.20876241,
      -1.48647094,  2.89808607],
     [ 1.91438305,  2.85331845, -2.19972754, ...,  3.49275684,
       3.03001499, -0.18890989],
     [ 2.02145815, -1.01368928,  0.66119802, ...,  4.42322826,
       0.08005953, -4.07938671]],

    [[-0.95590430,  0.50596744,  2.29433346, ..., -2.05126929,
      -5.45670271,  1.53850305],
     [ 2.50000358, -3.78276157,  0.98925447, ...,  2.54729700,
       1.36735356,  0.96287811],
     [-0.00937408,  0.40484223, -2.18388867, ..., -0.29867306,
       1.13311613,  0.76943004],
     ...,
     [ 1.94269681,  0.17316771, -1.31787276, ...,  2.20876241,
      -1.48647094,  2.89808607],
     [ 1.91438305,  2.85331845, -2.19972754, ...,  3.49275684,
       3.03001499, -0.18890989],
     [ 2.02145815, -1.01368928,  0.66119802, ...,  4.42322826,
       0.08005953, -4.07938671]]])

label

Tensor(shape=[64, 1], dtype=int64, place=Place(gpu:0), stop_gradient=True, [[16 ], [108], [342], [5 ], [149], [6 ], [198], [246], [214], [318], [301], [349], [145], [14 ]

Looking at output, every 2D slice appears to be identical.

How should this be fixed? Is it acceptable to take just one of the 2D slices and compute the loss on that?

gooloosk commented 6 months ago

It turned out to be a problem with the shape of the one-hot tensor in https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle/blob/develop/ppvector/models/loss.py: one_hot comes out with shape [64, 1, 355].

    one_hot = F.one_hot(label, cosine.shape[1])
    print("one_hot.shape",one_hot.shape)
    one_hot = paddle.squeeze(one_hot, axis=1)  # added this line to change the shape to [64, 355]

It works now.
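
For anyone hitting the same error, here is a minimal sketch of why the extra axis produces a [64, 64, 355] output. It only does shape arithmetic in plain Python and does not call Paddle. It assumes that the loss combines one_hot elementwise with a cosine tensor of shape [64, 355] (suggested by `cosine.shape[1]` above, but not confirmed here) and that elementwise ops follow NumPy-style broadcasting. The helper names `broadcast_shape`, `one_hot_shape`, and `squeeze_shape` are hypothetical and exist only for this illustration.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Shape of an elementwise op on shapes a and b under NumPy-style
    broadcasting: align from the right, a size-1 axis stretches."""
    result = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"shapes {a} and {b} cannot be broadcast")
        result.append(max(x, y))
    return tuple(reversed(result))


def one_hot_shape(label_shape, num_classes):
    """One-hot encoding appends a class axis to the label shape."""
    return tuple(label_shape) + (num_classes,)


def squeeze_shape(shape, axis):
    """Drop one size-1 axis (simplified stand-in for squeeze on that axis)."""
    assert shape[axis] == 1, "this sketch only squeezes size-1 axes"
    return tuple(s for i, s in enumerate(shape) if i != axis)


batch, num_classes = 64, 355
cosine = (batch, num_classes)      # assumed shape of the cosine logits
label = (batch, 1)                 # label shape from the printout above

one_hot = one_hot_shape(label, num_classes)
before = broadcast_shape(one_hot, cosine)                    # without squeeze
after = broadcast_shape(squeeze_shape(one_hot, 1), cosine)   # with squeeze

print(one_hot, before, after)
```

Under these assumptions, the [64, 1, 355] one-hot tensor is broadcast against the [64, 355] cosine tensor, so each of the 64 slices is a near-copy of the full batch of logits. That would match the observation that the 2D slices look the same. Flattening label to shape [64] before the one-hot step should have the same effect as the squeeze, though that variant was not tested in this thread.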