Question about the forward-pass logic

qiuqiao / SOFA

SOFA: Singing-Oriented Forced Aligner

MIT License


Question about the forward-pass logic #36

Closed: ILG2021 closed this issue 30 minutes ago

ILG2021 commented 2 hours ago

    def forward(self, *args: Any, **kwargs: Any) -> Any:
        h = self.backbone(*args, **kwargs)
        logits = self.head(h)
        ph_frame_logits = logits[:, :, 2:]
        ph_edge_logits = logits[:, :, 0]
        ctc_logits = torch.cat([logits[:, :, [1]], logits[:, :, 3:]], dim=-1)
        return ph_frame_logits, ph_edge_logits, ctc_logits
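To make the channel bookkeeping in forward concrete, here is the same slicing replayed on a single frame in plain Python. A list stands in for the last dimension of logits, and each channel's value is simply its own index so you can see where it lands. This only mirrors the indexing in forward. It is not SOFA code, and the channel count of 7 is arbitrary.

```python
def split_frame(frame):
    """Mirror forward()'s slicing on one frame's channel vector."""
    ph_frame = frame[2:]           # logits[:, :, 2:]
    ph_edge = frame[0]             # logits[:, :, 0]
    ctc = [frame[1]] + frame[3:]   # torch.cat([logits[:, :, [1]], logits[:, :, 3:]], dim=-1)
    return ph_frame, ph_edge, ctc

# Label each channel with its own index (7 channels chosen arbitrarily).
channels = list(range(7))
ph_frame, ph_edge, ctc = split_frame(channels)
print(ph_frame)  # [2, 3, 4, 5, 6]
print(ph_edge)   # 0
print(ctc)       # [1, 3, 4, 5, 6]
```

Position 0 differs between the two outputs: it is channel 2 in ph_frame_logits and channel 1 in ctc_logits. Positions 1: are the same channels 3: in both, so an id k >= 1 refers to the same channel in either tensor.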

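A question for the author: ph_edge uses index 0 and ph_frame uses indices 2:, which makes sense to me. But what does index 1 represent? And the CTC logits take index 1 but skip index 2. Could you explain why? Also, what is CTC?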

qiuqiao commented 2 hours ago

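CTC stands for Connectionist Temporal Classification. It is an algorithm commonly used in ASR, OCR, TTS and similar tasks, and it handles alignment between an input sequence and an output sequence of different lengths without needing frame-level labels. You can find more about it online. Index 2 is the logit for the SP/AP phonemes, and indices 3: are the logits for all phonemes other than SP/AP. Index 1 is the CTC blank, which can be used to separate consecutive identical phonemes (though it doesn't seem to do much here?).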
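The blank's role shows up in the standard CTC collapse rule. The sketch below is plain Python, not SOFA's code. It takes a per-frame best path, merges consecutive repeats, and then removes blanks. Blank id 0 follows from ctc_logits placing channel 1 first. The phoneme ids 7 and 5 are made up for illustration.

```python
BLANK = 0  # position 0 of ctc_logits holds the blank (channel 1 of logits)

def greedy_path(frame_logits):
    """Best-path decoding: pick the highest-scoring class in each frame."""
    return [max(range(len(f)), key=f.__getitem__) for f in frame_logits]

def ctc_collapse(path, blank=BLANK):
    """CTC collapse rule: merge consecutive repeats, then drop blanks."""
    labels = []
    prev = None
    for k in path:
        if k != prev and k != blank:
            labels.append(k)
        prev = k
    return labels

print(greedy_path([[0.1, 2.0, 0.3], [1.5, 0.2, 0.1]]))  # [1, 0]
# Hypothetical ids: 7 and 5 stand for two arbitrary phonemes.
print(ctc_collapse([7, 7, 5, 5, 5, 5]))     # [7, 5]: the repeated 5s merge into one
print(ctc_collapse([7, 7, 5, 5, 0, 5, 5]))  # [7, 5, 5]: the blank splits two 5s
```

Without a blank frame between them, two identical neighbouring phonemes cannot be told apart from one long phoneme. That is the case the blank exists for.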

qiuqiao commented 2 hours ago

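Index 2 is skipped because, in weakly labeled data, we don't know where SP/AP are inserted in the phoneme sequence, so we can only align the other phonemes.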
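Here is a sketch of that idea in plain Python. It is not SOFA's implementation. It builds a CTC target by dropping SP/AP and then scores the target with the textbook CTC forward algorithm. The phoneme ids are hypothetical: 0 is SP/AP (position 0 of ph_frame_logits, i.e. channel 2), and ids >= 1 are the other phonemes. Under that assumption, those ids index directly into ctc_logits, whose position 0 is the blank. Reusing position 0 for two meanings is harmless here only because SP/AP never appears in a CTC target.

```python
SP_AP = 0  # hypothetical id: position 0 of ph_frame_logits (channel 2, SP/AP)
BLANK = 0  # position 0 of ctc_logits (channel 1, the CTC blank)

def ctc_target(ph_ids):
    """Drop SP/AP, whose positions in weakly labeled data are unknown."""
    return [p for p in ph_ids if p != SP_AP]

def ctc_prob(probs, target, blank=BLANK):
    """CTC forward algorithm: total probability of all paths that collapse to target.

    probs: T frames x C classes, each row already normalized (e.g. softmax of ctc_logits).
    """
    ext = [blank]
    for label in target:
        ext += [label, blank]  # interleave blanks: _ a _ b _
    S = len(ext)
    alpha = [0.0] * S
    alpha[0] = probs[0][blank]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s] + (alpha[s - 1] if s >= 1 else 0.0)
            # Skipping over a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][ext[s]]
        alpha = new
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)

print(ctc_target([0, 3, 1, 0, 2]))  # [3, 1, 2]
uniform = [[0.5, 0.5]] * 2          # 2 frames, classes {blank, phoneme 1}
print(ctc_prob(uniform, [1]))       # 0.75: paths "1 1", "_ 1", "1 _"
```

With SP/AP removed, the target only fixes the order of the remaining phonemes. The forward algorithm sums over every frame-level path consistent with that order, so no position for SP/AP has to be guessed.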

ILG2021 commented 19 minutes ago

OK, thanks.