sooftware / conformer

[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)
Apache License 2.0

Conformer Transducer inference #20

Open jungwook518 opened 3 years ago

jungwook518 commented 3 years ago

During transducer inference, is there a reason the audio encoder's output is fed in one time step at a time? It looks like inference is done that way to support real-time use, but I'm not sure I understand how inference works in the non-real-time case.

sooftware commented 3 years ago

Which part of the code are you referring to?

jungwook518 commented 3 years ago

```python
@torch.no_grad()
def decode(self, encoder_output: Tensor, max_length: int) -> Tensor:
    """
    Decode encoder_outputs.

    Args:
        encoder_output (torch.FloatTensor): A output sequence of encoder. FloatTensor of size
            (seq_length, dimension)
        max_length (int): max decoding time step

    Returns:
```
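For context on why the loop runs per time step even offline: in a transducer, each emission depends on the tokens emitted so far (via the prediction network), so greedy decoding is inherently sequential over encoder frames regardless of real-time constraints. Below is a minimal, dependency-free sketch of that per-frame loop; the `toy_joint` function, `BLANK` id, and integer "states" are illustrative stand-ins, not the repo's actual joint/decoder networks.

```python
BLANK = 0  # illustrative blank-token id

def toy_joint(enc_frame, dec_state):
    # Hypothetical joint network: combines one encoder frame with the
    # current decoder state and returns an argmax token id.
    return (enc_frame + dec_state) % 3  # deterministic toy rule

def greedy_transducer_decode(encoder_output, max_symbols_per_step=3):
    """Walk the encoder output one frame at a time, as a transducer does.

    At each frame the joint network is queried repeatedly: a blank token
    advances to the next frame, while a non-blank token is emitted and
    fed back into the (toy) decoder state. This feedback is why decoding
    cannot consume all frames in one parallel pass.
    """
    hypothesis = []
    dec_state = 0
    for enc_frame in encoder_output:           # one time step at a time
        for _ in range(max_symbols_per_step):  # cap symbols per frame
            token = toy_joint(enc_frame, dec_state)
            if token == BLANK:
                break                          # move to the next frame
            hypothesis.append(token)
            dec_state = token                  # feed prediction back in
    return hypothesis
```

The same structure holds in non-real-time inference: the whole encoder output is computed at once, but the decode loop still consumes it frame by frame because of the token feedback.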