Closed davehorton closed 11 months ago
There is another, more "quick and dirty" option we could consider for the near term. If we allowed the user to click and drag on the timeline to create a segment, they could click at the start of the silence period, then drag and release when the transcript is returned. On release, we could calculate the length of time represented by the rectangle they just created and display it in a popup or some sort of annotation.
This is less desirable than actually processing the audio to calculate the latencies in the recording, but it could serve as a short-term fix if the better solution cannot be completed in time.
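To make the drag idea concrete, here is a minimal sketch of the measurement step, assuming the timeline is drawn at a known pixels-per-second scale. The `DragSelection` shape, the field names, and `pixelsPerSecond` are all assumptions for illustration, not the timeline component's actual API:

```typescript
// Hypothetical sketch: convert the width of a dragged rectangle (in pixels)
// into a duration, given the timeline's rendering scale.
interface DragSelection {
  startX: number; // x coordinate where the user pressed the mouse button
  endX: number;   // x coordinate where the user released it
}

function segmentDurationSecs(sel: DragSelection, pixelsPerSecond: number): number {
  // abs() lets the user drag in either direction
  const widthPx = Math.abs(sel.endX - sel.startX);
  return widthPx / pixelsPerSecond;
}

// e.g. a 95px-wide selection on a timeline drawn at 100 px/sec
const secs = segmentDurationSecs({ startX: 120, endX: 215 }, 100);
console.log(`~${secs.toFixed(3)}s`); // annotation text for the popup
```

The accuracy here is limited by the user's mouse precision and the timeline's zoom level, which is why this is only a stopgap.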
In the recording window when we check "Overlay STT and DTMF events" we get a picture that gives us a rough sense of the latency of the speech recognizer:
In that picture, the latency is the length of time between the end of speech energy within the light red horizontal bar and the end of that bar -- the bar (span) ends when we get a transcript back from the recognizer. So we can see roughly that the latency was a bit less than a second, but it is not precisely measured. We need to figure out how to calculate the latency precisely and show it in the popup: i.e., when the popup contains a transcript, it should also include a "latency" field showing the calculated latency of the speech recognition service in seconds, with fractional precision to the millisecond.
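Once we have both time points, the calculation itself is simple. A minimal sketch, assuming both timestamps are expressed in seconds from the start of the recording (the function name and signature are hypothetical):

```typescript
// Hypothetical sketch: latency = time the transcript event arrives minus the
// time speech energy ends, formatted in seconds with millisecond precision
// for the popup's "latency" field.
function computeSttLatency(speechEndSecs: number, transcriptSecs: number): string {
  const latency = transcriptSecs - speechEndSecs;
  if (latency < 0) {
    throw new Error("transcript event precedes end of speech energy");
  }
  return `${latency.toFixed(3)}s`; // e.g. "0.937s"
}

console.log(computeSttLatency(4.2, 5.137)); // hypothetical timestamps
```

The hard part, as noted below, is obtaining `speechEndSecs` in the first place, since no trace event marks it.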
This will be challenging because we need to determine the start point to measure from -- the point where the speech energy goes to zero. No event in the trace gives us that time point, so I see two possible solutions: