I tried to reproduce your work, but following your script exactly did not give the results reported in your paper, and the difference is huge at low latency. One thing I found is that when lagging_segment is 1, diseg_agent does not read the whole audio. For example, on one audio clip with a total length of 14s, the agent stops reading at around 1.3s because its states.finish_hypo() becomes True.
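To make the symptom concrete, here is a toy sketch of a read loop that stops as soon as a finish flag turns True. It is not the DiSeg or SimulEval code: `ToyStates`, `feed_audio`, and the 100 ms segment size are hypothetical stand-ins, and only the 14s and 1.3s figures come from my run.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyStates:
    """Hypothetical stand-in for the agent states (not the real class)."""
    finished: bool = False
    read_ms: int = 0

    def finish_hypo(self) -> bool:
        # Mirrors the flag that ends the hypothesis in the report above.
        return self.finished


def feed_audio(total_ms: int, segment_ms: int,
               finish_after_ms: Optional[int] = None) -> int:
    """Feed audio segment by segment; return how many ms were actually read."""
    states = ToyStates()
    while states.read_ms < total_ms and not states.finish_hypo():
        states.read_ms = min(total_ms, states.read_ms + segment_ms)
        # Assumed trigger: the finish flag flips once this much audio is read.
        if finish_after_ms is not None and states.read_ms >= finish_after_ms:
            states.finished = True
    return states.read_ms


# Expected behaviour: the whole 14 s clip is consumed.
full = feed_audio(total_ms=14_000, segment_ms=100)
# Observed behaviour: reading stops once the flag flips at about 1.3 s.
truncated = feed_audio(total_ms=14_000, segment_ms=100, finish_after_ms=1_300)
print(full, truncated)
```

If the real agent behaves like the second call, everything after about 1.3s is never translated, which could explain the gap at low latency.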