Transcription doesn't take the last version

DevRGT commented 3 months ago

Hello,

Thanks a lot for the Google Meet extension :-)

During a meeting, the live transcription keeps correcting the text, but the saved caption does not seem to be the latest version of what was said.

For example, I get: "You (06/17/2024, 09:05 AM) so just to clarify at the beginning. I so w. ust to organ. ze. a ith. job NAME. and and the defense. they called. e f e Opera. . but t it was. a requ. st. from there. now. to AV. id also. you as Developers. but I think it would be a next. step. in any. case. r. will. you. will not. ot. start from the beginning. to to to develop . I'm not sure. that. it's needed. after that. joining. the e cir. nd SL. t. I don't. know. Let's see. the for sure the priority es the Sprint. that you have. on. oing. But yes. it's good that you join you. where? the stakeholder on? maybe? that? you can ask? a few. questions? walk will be collect. ng. requirements. "

This text is not understandable, but when I was watching the live captions, the text was understandable.

As mentioned in another issue, for Teams meetings I use another extension (https://github.com/Zerg00s/Live-Captions-Saver ) and it captures the last version of each caption. So maybe they implement something specific to get this last version.

vivek-nexus commented 3 months ago

Thanks @DevRGT for using the extension and raising an issue here.

I have noticed this problem: the saved transcript does not include the corrected/latest version of the text that is seen on Google Meet.

What have I done already?

I've added a 1-second delay before picking up the transcript, to give Google Meet time to make its corrections. However, mileage varies and, as you have pointed out, it is not foolproof.
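To make the trade-off of a fixed delay concrete, here is a small simulation. It is not the extension's code: it assumes caption updates can be modelled as timestamped `{ atMs, text }` snapshots, and `captureAfterDelay` and the sample timeline are made up for illustration. Whatever version is on screen when the delay expires is the one that gets saved, so a correction that arrives later is missed.

```javascript
// Simulation only: caption updates are modelled as { atMs, text } snapshots
// instead of being read from Google Meet's DOM. All names are hypothetical.

/**
 * Returns the caption text a fixed-delay capture would see: the latest
 * snapshot rendered at or before firstMutationAtMs + delayMs.
 * Assumes `snapshots` is sorted by atMs.
 */
function captureAfterDelay(snapshots, firstMutationAtMs, delayMs) {
  const readAt = firstMutationAtMs + delayMs;
  let seen = null;
  for (const snapshot of snapshots) {
    if (snapshot.atMs <= readAt) {
      seen = snapshot.text; // a later snapshot replaces an earlier one
    }
  }
  return seen;
}

// Made-up timeline: Meet renders a rough guess, then corrects it 1.5 s later.
const timeline = [
  { atMs: 0, text: "so just to organ. ze" },
  { atMs: 1500, text: "so just to organize" },
];

console.log(captureAfterDelay(timeline, 0, 1000)); // → so just to organ. ze
console.log(captureAfterDelay(timeline, 0, 2000)); // → so just to organize
```

Increasing the delay only moves the cut-off point: any correction that lands after it is still missed, and the capture lags further behind the speaker.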

Why is this not an issue with Teams?

If you watch the transcripts generated by Teams, you can see that almost no correction happens. There is only one version, and the other extension simply captures it. Teams uses a cloud speech-to-text engine (you can see a delay between your speech and the transcript appearing), so there is no correction. However, as far as I have seen, Google Meet transcribes locally on your computer (no network calls are made for it), so the first version is not very accurate, which is why Meet has that correction mechanism. So this is a hard-to-solve, Google Meet-specific problem.

What workarounds can you try?

As I mentioned above, this problem is closely tied to Google Meet's transcription strategy, and I can do very little about it. However, please try these workarounds and let me know if they help:

1) If the speech is in a non-English language, select that language for transcription. See the video on the extension's Chrome Web Store page if you need help with this.
2) If the speech is in English, try the different country-specific English options in the language dropdown. Experiment to find the one whose first version is most accurate, so that corrections are minimal.
3) Google Meet has just launched translated captions. See if they help, since they most likely use a cloud engine. TranscripTonic can capture translated captions as well.

DevRGT commented 3 months ago

Hello Vivek,

Thank you for your reply. I tried the workarounds:
1) Our meetings are in English, but even in German or French I get the same issue.
2) I tried it, but the issue is rather that the transcription is really live, so it doesn't improve.
3) I tried, for example, a meeting in English translated into French. It seems better, but still not very good.

I will try to modify the delay from 1 second to 3-5 seconds to see the impact.

Also, I gave your code to ChatGPT, explained my problem and asked for an idea to solve it. It proposed adding a mechanism that checks whether the transcription has evolved and takes the latest version. I haven't tried it, but maybe it's a lead towards a solution. What do you think?

--- PROPOSAL FROM CHATGPT ---

To improve the capture process and ensure that the most mature, latest version of each caption is captured, you can combine two techniques. One approach is to introduce a longer delay between the initial mutation detection and the final capture, to allow for Google Meet's live corrections. Additionally, you can implement a mechanism that checks whether a particular transcript entry has stabilized (i.e., has not changed for a certain period) before adding it to the transcript array.

Here's an updated version of the transcriber function that incorporates these improvements:

```javascript
// timeFormat, transcript, overWriteChromeStorage and the error-handling globals
// (isTranscriptDomErrorCaptured, hasMeetingEnded, reportErrorMessage,
// showNotification, extensionStatusJSON_bug) are assumed to be defined
// elsewhere in the extension's content.js.
let transcriptBuffer = {};
const stabilizationDelay = 2000; // Delay to wait for caption stabilization (2 seconds)
const checkInterval = 500; // Interval to check if captions have changed (0.5 seconds)

function transcriber(mutationsList, observer) {
  mutationsList.forEach(mutation => {
    try {
      // CRITICAL DOM DEPENDENCY. Get all people in the transcript
      // (optional chaining so a missing container does not throw before the check)
      const people = document.querySelector('.a4cQT')?.firstChild?.firstChild?.childNodes;
      // Begin parsing transcript only if the caption container has entries
      if (people && people.length > 0) {
        // Get the last person
        const person = people[people.length - 1];
        // CRITICAL DOM DEPENDENCY
        const currentPersonName = person.childNodes[0].textContent;
        // CRITICAL DOM DEPENDENCY
        const currentTranscriptText = person.childNodes[1].lastChild.textContent;

        const timestamp = new Date().toLocaleString("default", timeFormat).toUpperCase();
        if (!transcriptBuffer[currentPersonName]) {
          transcriptBuffer[currentPersonName] = {
            text: currentTranscriptText,
            timestamp: timestamp,
            lastModified: Date.now()
          };
        } else {
          transcriptBuffer[currentPersonName].text = currentTranscriptText;
          transcriptBuffer[currentPersonName].lastModified = Date.now();
        }
      }
    } catch (error) {
      console.error(error);
      if (isTranscriptDomErrorCaptured == false && hasMeetingEnded == false) {
        console.log(reportErrorMessage);
        showNotification(extensionStatusJSON_bug);
      }
      isTranscriptDomErrorCaptured = true;
    }
  });
}

function checkForStabilizedCaptions() {
  const now = Date.now();
  for (const person in transcriptBuffer) {
    if (now - transcriptBuffer[person].lastModified >= stabilizationDelay) {
      // Push the stabilized transcript to the final array
      transcript.push({
        "personName": person,
        "timeStamp": transcriptBuffer[person].timestamp,
        "personTranscript": transcriptBuffer[person].text
      });
      overWriteChromeStorage(["transcript"], false);

      // Remove the stabilized entry from the buffer
      delete transcriptBuffer[person];
    }
  }
}

// Call checkForStabilizedCaptions periodically to capture mature captions
setInterval(checkForStabilizedCaptions, checkInterval);

// Observer configuration and initialization code remains unchanged
```

Explanation:
Transcript Buffer: A buffer (transcriptBuffer) stores the latest caption for each person, along with its timestamp and last modification time.
Stabilization Delay: A delay (stabilizationDelay) of 2 seconds is the period a caption must go unchanged before it is considered stable.
Check Interval: An interval (checkInterval) sets how often the code checks whether any captions have stabilized and adds them to the final transcript array.
Stabilization Check: The checkForStabilizedCaptions function finds captions that have not changed for the stabilization delay period and pushes them to the final transcript array.
With this approach, the code waits long enough for Google Meet's live corrections before capturing and storing the most mature version of the captions.

vivek-nexus commented 3 months ago

The solution suggested by ChatGPT doesn't help solve the problem. When a mutation is fired, there is no way to differentiate between "transcription correction" and "new words added". All we can say is that the transcript text is updated. So the stabilisation concept is not appropriate.

Anyway, I figured out a solution. I have pushed it to this branch as extension v2.1.3. Test it out and let me know.

DevRGT commented 3 months ago

Hello, thank you for your reply. I updated, and the sentences are now more understandable, but some parts of the sentences seem to be missing: sometimes the beginning, sometimes the end.

But there are no longer lots of "." in the middle of sentences.

vivek-nexus commented 3 months ago

This happens whenever the transcript of the same person reaches 250 characters in length. See https://github.com/vivek-nexus/transcriptonic/blob/main/extension/content.js#L314.
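For readers wondering how a "latest version wins" capture with a length cutoff could be structured, here is a hypothetical sketch. It is not the code at content.js#L314: the class name, the `offset` bookkeeping, and the assumption that each update carries the full on-screen caption text for one speaker are all illustrative. The buffer overwrites its text on every update, so corrections replace earlier guesses, and it commits an entry when the speaker changes or the text reaches the limit.

```javascript
// Hypothetical sketch, NOT the extension's code. Assumes each update delivers
// the full caption text currently shown for one speaker.
const CHARACTER_LIMIT = 250; // the per-speaker limit mentioned above

class LatestCaptionBuffer {
  constructor(limit = CHARACTER_LIMIT) {
    this.limit = limit;
    this.transcript = [];   // committed { personName, personTranscript } entries
    this.personName = null; // speaker of the entry being built
    this.offset = 0;        // characters of on-screen text already committed
    this.text = "";         // latest version of the uncommitted part
  }

  // Called on every caption mutation.
  update(personName, onScreenText) {
    if (personName !== this.personName) {
      this.commit();        // speaker changed: keep the last version seen
      this.personName = personName;
      this.offset = 0;
    }
    // Overwrite instead of appending, so corrections replace earlier guesses.
    this.text = onScreenText.slice(this.offset);
    if (this.text.length >= this.limit) {
      this.commit();
      // Later updates only keep text after this point. If Meet then corrects
      // words before the offset, characters shift across the boundary and
      // can be lost in this sketch.
      this.offset = onScreenText.length;
    }
  }

  // Pushes the uncommitted text (if any) as a transcript entry.
  commit() {
    if (this.personName !== null && this.text !== "") {
      this.transcript.push({ personName: this.personName, personTranscript: this.text });
    }
    this.text = "";
  }
}
```

In this sketch, words that Meet corrects across the commit boundary can be dropped, which is one way a length cutoff can cost the beginning or end of a sentence.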

This is the best middle-ground solution I could manage with my current bandwidth and knowledge. If I figure out a better solution in the future, I will post an update in this thread.
