remotion-dev / remotion

🎥 Make videos programmatically with React
https://remotion.dev

Issue: Partial Word in remainingText When Parsing Transcription with openAiWhisperApiToCaptions #4474

Closed brennanmceachran closed 2 weeks ago

brennanmceachran commented 2 weeks ago

Bug Report 🐛

While using openAiWhisperApiToCaptions, an error is thrown:

Error: Unable to parse punctuation from OpenAI Whisper output. Could not find word "massive" in text "ssive step for accessibility in app development. One user was blown away by how quickly they could c".

The error seems to result from remainingText containing a truncated version of the expected word, leading to mismatches during regex matching.

Reproduction

Reproducible with a truncated transcript received from OpenAI:

import React from 'react';
import { openAiWhisperApiToCaptions } from "@remotion/openai-whisper";

const transcript = {
  task: "transcribe",
  text: "which is a massive.",
  words: [
    {
      end: 12.180000305175781,
      word: "which",
      start: 12.079999923706055,
    },
    {
      end: 12.300000190734863,
      word: "is",
      start: 12.180000305175781,
    },
    {
      end: 12.619999885559082,
      word: "a",
      start: 12.300000190734863,
    },
    {
      end: 12.779999732971191,
      word: "massive",
      start: 12.619999885559082,
    },
  ],
  duration: 12.779999732971191,
  language: "english",
};

const TestPage = () => {
  // Throws "Unable to parse punctuation from OpenAI Whisper output" during render.
  const captions = openAiWhisperApiToCaptions({
    transcription: transcript,
  });

  return (
    <div>
      <div className="whitespace-pre-line">
        {JSON.stringify(captions, null, 2)}
      </div>
    </div>
  );
};

export default TestPage;

brennanmceachran commented 2 weeks ago

Two even more reduced transcripts are below.

I believe the {0,4} in the regex pattern is the root cause of the issue. It causes a problem with short, common words like "a". Instead of matching the first instance of "a" in remainingText, the regex can skip over it and match a second instance if that one falls within the first few characters. remainingText is then sliced at the wrong location, which cuts into the following word.

For example, in remainingText = "a man", when looking for the word "a", the regex matches the second "a" (in "man"), leaving remainingText = "n", which breaks the expected sequence and triggers a parsing error in the function.

const transcript = {
  task: "transcribe",
  text: "a man",
  words: [
    {
      start: 1,
      end: 2,
      word: "a",
    },
    {
      start: 2,
      end: 3,
      word: "man",
    },
  ],
  duration: 3,
  language: "english",
};

Or


const transcript = {
  task: "transcribe",
  text: "i mint",
  words: [
    {
      start: 1,
      end: 2,
      word: "i",
    },
    {
      start: 2,
      end: 3,
      word: "mint",
    },
  ],
  duration: 3,
  language: "english",
};
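
The skipping behaviour described above can be sketched in isolation. This is a hypothetical simplification, not the actual @remotion/openai-whisper source: the helper names consumeWord and escapeRegExp are made up for illustration, and the only detail taken from the library is that the pattern allows up to four leading characters ({0,4}) before the word. It assumes the quantifier is greedy, which is consistent with the leftovers reported above, and compares it with a lazy variant ({0,4}?).

```typescript
// Hypothetical sketch; names and exact pattern are assumptions, not library code.

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Try to consume `word` from the start of `remainingText`, allowing up to
// four arbitrary characters (spaces, punctuation) in front of it.
// Returns the text left over after the match, or null if the word is not found.
function consumeWord(
  remainingText: string,
  word: string,
  lazy: boolean
): string | null {
  const quantifier = lazy ? "{0,4}?" : "{0,4}";
  const pattern = new RegExp(`^.${quantifier}${escapeRegExp(word)}`);
  const match = remainingText.match(pattern);
  if (match === null) {
    return null;
  }
  return remainingText.slice(match[0].length);
}

// Greedy: the prefix grabs "a m" and the word matches the "a" inside "man".
const greedyRest = consumeWord("a man", "a", false);
console.log(JSON.stringify(greedyRest)); // "n"

// The next word can no longer be found in what is left.
console.log(consumeWord(greedyRest ?? "", "man", false)); // null

// Lazy: the prefix stays as short as possible, so the first "a" is used.
const lazyRest = consumeWord("a man", "a", true);
console.log(JSON.stringify(lazyRest)); // " man"
console.log(JSON.stringify(consumeWord(lazyRest ?? "", "man", true))); // ""
```

Under the same assumption, consumeWord(" a massive.", "a", false) leaves "ssive.", which lines up with the truncated text in the original error message. Whether a lazy quantifier is the right fix for the library itself is a separate question; the sketch only shows why a greedy prefix can land on the wrong occurrence.
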
JonnyBurger commented 2 weeks ago

Thanks for reporting!

brennanmceachran commented 2 weeks ago

🙏 thanks @JonnyBurger