openai / openai-node

The official Node.js / TypeScript library for the OpenAI API
https://www.npmjs.com/package/openai
Apache License 2.0

[Whisper] Prompting frequently yields hallucinations #150

Closed mosnicholas closed 1 year ago

mosnicholas commented 1 year ago

Describe the bug

I’m using Whisper through Next.js. I’m calling the API directly, because the openai-node package doesn’t have great support for the Whisper API (see issue #77 in openai/openai-node, "[Whisper] cannot call createTranscription function from Node.js due to File API").

I’m calling it like so:

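import fetch, { FormData, File } from 'node-fetch';

// `supabase`, `interviewId`, and `prompt` are assumed to be defined elsewhere;
// the original report does not show where they come from.
// The handler must be async because it uses `await`.
const handler = async (req, res) => {
  // Download the audio file from Supabase storage.
  const { data } = await supabase.storage
    .from('audio-transcripts')
    .download(`${interviewId}.mp3`);

  // Build the multipart form body for the transcription endpoint.
  const formData = new FormData();
  const file = new File([data], `${interviewId}.mp3`);
  formData.append('file', file);
  formData.append('model', 'whisper-1');
  formData.append('prompt', prompt);
  formData.append('response_format', 'json');
  formData.append('temperature', '0.01');
  formData.append('language', 'en');

  const response = await fetch(
    'https://api.openai.com/v1/audio/transcriptions',
    {
      method: 'POST',
      body: formData,
      headers: {
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      },
    }
  );

  // `response.text` is a method, not the transcript. With
  // response_format 'json', the parsed body carries a `text` field.
  const result = await response.json();

  return res.status(200).json({ text: result.text });
};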

I’m running into a lot of issues using the prompt. It is usually a two-sentence prompt that has (1) a high-level overview of the transcript (it is a call about xyz), and (2) some proper nouns and competitor names that I want to make sure are spelled correctly. This follows the example provided in OpenAI’s docs. An example of the type of prompt I'm using is: This transcript is about Bayern Munich and various soccer teams. Common competitors in the league include Real Madrid, Barcelona, Manchester City, Liverpool, Paris Saint-Germain, Juventus, Chelsea, Borussia Dortmund, and AC Milan.
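For illustration, a two-sentence prompt of this shape could be assembled with a small helper like the one below. This is only a sketch: `buildWhisperPrompt` and its wording are hypothetical, not part of openai-node or the OpenAI API, and it makes no claim about which phrasing works best with Whisper.

```javascript
// Hypothetical helper (an assumption for illustration, not a library API):
// builds a short prompt with a topic overview plus a list of proper nouns.
function buildWhisperPrompt(topic, properNouns) {
  const overview = `This transcript is about ${topic}.`;
  if (properNouns.length === 0) {
    return overview;
  }
  // Plain comma-separated list, so the names appear exactly as spelled.
  return `${overview} Names that may come up include ${properNouns.join(', ')}.`;
}

// Example usage with names taken from the prompt quoted above.
const examplePrompt = buildWhisperPrompt('Bayern Munich and various soccer teams', [
  'Real Madrid',
  'Barcelona',
  'Manchester City',
]);
console.log(examplePrompt);
```

Keeping the prompt short and limited to the spellings that matter makes it easier to test whether the prompt itself triggers the looping output.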

However, 80% of the time I use the prompt, I get garbage, hallucinated output. It ends up in a loop, repeating the same thing (e.g. a competitor’s name, or a URL made up from one of the competitor’s names). An example of the output I'm seeing: In the past, the league has been a place of competition for the players. The league has been a place of competition for the players. The league has been a place of competition for the players. The league has been a place of competition for the players. The league has been a place of competition for the players. The league has been a place of competition for the players....
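A looping transcript like this one can at least be detected after the fact, so that the request can be retried or flagged. The sketch below is a simple heuristic, not a fix for the underlying behavior: `longestSentenceRun` and `looksLikeLoop` are hypothetical helpers, and the threshold of 3 is an arbitrary assumption.

```javascript
// Hypothetical helpers (not part of openai-node or the Whisper API).
// Returns the length of the longest run of identical consecutive sentences.
function longestSentenceRun(text) {
  // Split on sentence-ending punctuation, then normalize case and whitespace.
  const sentences = text
    .split(/[.!?]+/)
    .map((s) => s.trim().toLowerCase())
    .filter((s) => s.length > 0);

  let longest = 0;
  let current = 0;
  let previous = null;
  for (const sentence of sentences) {
    current = sentence === previous ? current + 1 : 1;
    longest = Math.max(longest, current);
    previous = sentence;
  }
  return longest;
}

// Flags a transcript when one sentence repeats `threshold` or more times
// in a row. The default threshold is an assumption; tune it for your data.
function looksLikeLoop(text, threshold = 3) {
  return longestSentenceRun(text) >= threshold;
}
```

A caller could, for example, check `looksLikeLoop(result.text)` and retry the transcription without the prompt when it returns true.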

  1. Am I calling it incorrectly?
  2. What is happening, and how can I fix it? Ideally, I’d like to use the prompt to make sure the spelling of competitor names is correct.

Note, this sometimes works better when calling the API directly from the terminal with the prompt.

To Reproduce

Download an audio file, and input a prompt like the one I shared. Example of the output I see:

Code snippets

No response

OS

macOS

Node version

v19.9.0

Library version

3.2.1

poborin commented 1 year ago

I can report a similar issue. The transcript I'm getting is:

It's like a, uh, it's like a, Oblivion Charm Do you know what that is? It's a, uh, um, what's that called? What's that called? It's like a, um, a, um, gum... What's that called? Huh? It's a gum... A gum... Okay, now I'd like to let you know, So actually, I have a feature... So if you go to Amazon,

Meanwhile, the original audio contains a speech about analyzing a customer's skin, determining their skin type, and recommending the appropriate products and treatments.

fullstacklat commented 1 year ago

Even with the latest version (v4.0), I have this problem with audio recorded in Safari (macOS and iOS). The mp4 file is uploaded to the server correctly and can be played back too, but the transcription is always "Subtítulos realizados por la comunidad de Amara.org" (Spanish for "Subtitles made by the Amara.org community"). On Windows (Chrome and Firefox) it works fine, and on macOS with Chrome it's OK too. On macOS with Safari and on iOS (Chrome and Safari), the result is always: "Subtítulos realizados por la comunidad de Amara.org"

rattrayalex commented 1 year ago

Thanks for reporting! Issues on this repo are intended for bugs in the library; please report bugs in the API to https://community.openai.com/c/api/bugs/30