jambonz / jambonz-webapp

A simple provisioning web app for jambonz
MIT License

In recording popup window calculate speech recognizer latency #327

Closed: davehorton closed this issue 11 months ago

davehorton commented 12 months ago

In the recording window, when we check "Overlay STT and DTMF events" we get a picture that gives us a rough sense of the latency of the speech recognizer: [screenshot]

In that picture, the latency is the length of time between the end of speech energy in the light red horizontal bar and the end of that bar; the bar (span) ends when we get a transcript back from the recognizer. So we can roughly see that the latency was a bit less than a second, but it is not precisely measured. We need to figure out how to calculate the latency precisely and show it in the popup: [screenshot] i.e. when there is a transcript in the popup we should include a "latency" field that shows the calculated latency of the speech recognition service in seconds (with fractional precision to the millisecond).
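For the display part, a minimal sketch of the calculation and formatting, assuming we already have both timestamps in seconds from the start of the recording; `formatSttLatency` and its parameters are hypothetical names, not existing code in the webapp:

```typescript
// Hypothetical helper: given the time at which speech energy ends and the
// time at which the transcript event arrives (both in seconds from the
// start of the recording), compute and format the recognizer latency.
export const formatSttLatency = (
  speechEndSecs: number,
  transcriptSecs: number
): string => {
  const latency = transcriptSecs - speechEndSecs;
  // guard against a negative value if the end-of-speech estimate is off
  if (latency < 0) return "n/a";
  // seconds with millisecond precision, e.g. "0.874 s"
  return `${latency.toFixed(3)} s`;
};
```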

This will be challenging because we need to determine the start point to measure from - that point where the speech energy goes to zero. There is no event in the trace that will give us that time point, so I see two possible solutions:

  1. We post-process the audio stream in the webapp to calculate speech energy and determine all of the time points where silence is detected following speech. We then find the last such time point within the light red bar/span, use it as the start time, calculate the latency, and display it in the popup (a sketch of this approach follows this list).
  2. If we can't do option 1 (which is certainly the preferred approach), we could allow the user to click on the timeline within the transcribe span and use that time point as the start of the calculation.
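Here is a rough sketch of option 1, assuming the webapp can obtain decoded mono PCM samples for the recording (e.g. from an AudioBuffer) along with the span boundaries. Frame-level RMS energy is compared against a threshold to find the last speech-to-silence transition inside the transcribe span; the function name, frame size, and threshold are illustrative, not part of the existing codebase:

```typescript
// Find the last point inside the transcribe span where speech drops to
// silence, using frame-level RMS energy. Returns the time in seconds from
// the start of the recording, or null if no transition is found.
export const findLastSpeechEnd = (
  samples: Float32Array,   // mono PCM, range [-1, 1]
  sampleRate: number,      // e.g. 8000 or 16000
  spanStartSecs: number,   // start of the transcribe span
  spanEndSecs: number,     // end of the span (transcript received)
  frameMs = 20,            // analysis frame size
  energyThreshold = 0.01   // RMS below this counts as silence
): number | null => {
  const frameLen = Math.round((sampleRate * frameMs) / 1000);
  const startFrame = Math.floor((spanStartSecs * sampleRate) / frameLen);
  const endFrame = Math.floor((spanEndSecs * sampleRate) / frameLen);
  let lastSpeechEnd: number | null = null;
  let prevIsSpeech = false;

  for (let f = startFrame; f <= endFrame; f++) {
    const begin = f * frameLen;
    const frame = samples.subarray(begin, begin + frameLen);
    if (frame.length === 0) break;
    let sum = 0;
    for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
    const rms = Math.sqrt(sum / frame.length);
    const isSpeech = rms >= energyThreshold;
    // record each speech -> silence transition; the last one found inside
    // the span is the start point for the latency measurement
    if (prevIsSpeech && !isSpeech) {
      lastSpeechEnd = (f * frameLen) / sampleRate;
    }
    prevIsSpeech = isSpeech;
  }
  return lastSpeechEnd;
};
```

The latency would then be `spanEndSecs - findLastSpeechEnd(...)`, fed into something like the formatting helper above.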
davehorton commented 12 months ago

There is another more "quick and dirty" option we could consider for the near term. If we allowed the user to click and drag on the timeline to create a segment, they could click at the start of the silence period, drag, and release at the point where the transcript is returned. On release we could calculate the length of time represented by the rectangle they just created and display that in a popup or some sort of annotation (a rough sketch follows).
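A rough sketch of that interaction, assuming the timeline is rendered in an element whose width maps linearly onto the recording duration; the element handling, function name, and callback are illustrative only:

```typescript
// Convert a click-and-drag on the timeline element into a measured number
// of seconds and hand it to a callback for display.
const attachDragMeasure = (
  timelineEl: HTMLElement,
  durationSecs: number,
  onMeasure: (seconds: number) => void
) => {
  let dragStartX: number | null = null;

  const pxToSecs = (clientX: number) => {
    const rect = timelineEl.getBoundingClientRect();
    const ratio = (clientX - rect.left) / rect.width;
    return Math.min(Math.max(ratio, 0), 1) * durationSecs;
  };

  timelineEl.addEventListener("mousedown", (e) => {
    dragStartX = e.clientX;
  });

  timelineEl.addEventListener("mouseup", (e) => {
    if (dragStartX === null) return;
    // the measured latency is the time represented by the dragged segment
    const seconds = Math.abs(pxToSecs(e.clientX) - pxToSecs(dragStartX));
    onMeasure(seconds);
    dragStartX = null;
  });
};
```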

This is less desirable than actually processing the audio to calculate the latencies in the recording, but it could serve as a short-term fix if the better solution cannot be accomplished.