speechmatics / speechmatics-js-sdk

Javascript and Typescript SDK for Speechmatics

microphone_nextjs example: How to push finalized transcriptions into separate <div>s? #48

Open Box333 opened 2 weeks ago

Box333 commented 2 weeks ago

The microphone_nextjs example concatenates the transcribed parts into an endlessly growing text block without line breaks and renders it to a <p> element.

I am trying to reproduce the same functionality as presented in the radio demo on your website, where each finalized sentence is pushed into a new <div> element.

Can you share some insight on how to accomplish this? Many thanks.

nickgerig commented 2 weeks ago

@Box333 The radio translation demo actually drives the splitting off finalised translations. There is some logic around that which is also used within the Speechmatics microphone demo in the Portal. We can extract that into some utilities, but it does get a little complicated depending on whether you are using translation, diarization, disfluencies, etc.

Can I assume you aren't wanting to align with translation and just want the returned transcript split into readable segments?

Box333 commented 2 weeks ago

@nickgerig,

Thank you for getting back to me. The problem with the microphone/translation demo in the Portal is that transcription results are appended without line breaks, resulting in a huge text block. And if the translation text in the right pane is shorter (or longer) due to linguistic peculiarities, the translation starts sliding upwards after only a few minutes, until it can no longer be seen.

I'd like to reproduce the same nice effect as in the radio demo, where the returned (full) transcripts are split at full stops (., ?, !) into sentences and stored in segments (one <div> each), while the partials build up the next segment, giving you this "on the fly" effect. I've been analyzing the code in my browser's dev tools and I can see that it gets very complicated when timings, diarization, disfluencies, etc. also come into play. But for now I'd be glad to have basic functionality where transcript and translation are separated by sentence and scroll in sync in two side-by-side containers.
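For the transcript side, the segmentation I have in mind would be something like this rough sketch (onFinal, segments and currentSegment are names I've made up, and I'm assuming each final arrives as a plain string):

const segments: string[] = []; // one entry per finished sentence (one <div> each)
let currentSegment = '';       // sentence still being built up

function onFinal(text: string) {
  currentSegment += text;
  // split after ., ? or ! followed by whitespace
  const parts = currentSegment.split(/(?<=[.?!])\s+/);
  const last = parts.pop() ?? '';
  segments.push(...parts); // complete sentences become segments
  if (/[.?!]\s*$/.test(last)) {
    segments.push(last); // the final piece also ends a sentence
    currentSegment = '';
  } else {
    currentSegment = last; // keep building until the next final
  }
}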

So far, I have added the translation part to my index.tsx.

pages/index.tsx:
.... + added
import { AddTranslation, TranslatedSentence } from 'speechmatics';

const [translation, setTranslation] = useState<TranslatedSentence[]>([]);

rtSessionRef.current.addListener('AddTranslation', (res) => {
  // use a functional update so the listener doesn't append to a stale
  // `translation` captured in its closure
  setTranslation((prev) => [...prev, ...res.results]);
});
..... +
<div>
  <div className="right">
    {translation.map((item, index) => (
      // a key keeps React's list reconciliation stable as results stream in
      <div
        key={index}
        className="fullTrans"
        dangerouslySetInnerHTML={{ __html: item?.content }}
      />
    ))}
  </div>
</div>
nickgerig commented 1 week ago

@Box333 I had a look into the code that we use for the radio demo and, I'm afraid, it's not going to be straightforward to extract. But I can briefly describe what it does:

We maintain 4 arrays: unassigned, transcripts, translations and transcript partials.

When we get a translation final, we take its start and end times, find the matching transcription pieces in the unassigned array, and push them into the transcript array. So the start and end times of the translations are what define the transcript segments.

The UI is then rendered based on this data structure.
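In case it helps, here is a stripped-down sketch of that matching step in TypeScript. The names and shapes (Segment, onTranslationFinal) are illustrative only, not our actual demo code:

interface Segment {
  startTime: number;
  endTime: number;
  content: string;
}

// The four arrays described above.
const unassigned: Segment[] = [];       // transcript finals awaiting a translation
const transcripts: Segment[] = [];      // transcript segments aligned to translations
const translations: Segment[] = [];     // translation finals, in arrival order
let transcriptPartials: Segment[] = []; // latest partials, building the next segment

function onTranslationFinal(translation: Segment) {
  translations.push(translation);
  // every unassigned transcript piece inside the translation's time window
  // belongs to this segment; the translation's times define its boundaries
  const matched = unassigned.filter(
    (t) => t.startTime >= translation.startTime && t.endTime <= translation.endTime,
  );
  transcripts.push({
    startTime: translation.startTime,
    endTime: translation.endTime,
    content: matched.map((t) => t.content).join(' '),
  });
  // remove the matched pieces from the unassigned pool
  for (const m of matched) {
    unassigned.splice(unassigned.indexOf(m), 1);
  }
}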

Hope that helps.