openai / openai-node

The official Node.js / TypeScript library for the OpenAI API
https://www.npmjs.com/package/openai
Apache License 2.0

Support for web ReadableStream without buffering the whole file. #418

Open nicolopadovan opened 8 months ago

nicolopadovan commented 8 months ago

Confirm this is a Node library issue and not an underlying OpenAI API issue

Describe the bug

Whenever a web ReadableStream is passed to the toFile helper function, the full contents of the file are buffered before the request is forwarded to the OpenAI Whisper API endpoint. However, it is possible to avoid buffering the whole file into server memory and instead use the server only as middleware that streams the data from the file's source to the API endpoint. The problem has also been discussed in issue #414.

A workaround using axios and FormData that seems to work:

// `storage` is assumed to come from the Google Cloud Storage client
// (or the equivalent firebase-admin storage handle).
import { Storage } from "@google-cloud/storage";
import FormData = require("form-data");
import axios from "axios";

const storage = new Storage();

async function foo(bucketName: string, fileName: string) {

  const bucket = storage.bucket(bucketName);
  const file = bucket.file(fileName);
  const readStream = file.createReadStream();

  const form = new FormData();

  // Make sure that the file has the proper extension as well.
  // (In this example, we just add the `webm` extension for brevity.)
  if (!fileName.includes(".")) {
      fileName = `${fileName}.webm`;
  }

  form.append("file", readStream, fileName);
  form.append("model", "whisper-1");

  const apiKey = OPENAI_API_KEY; // assumed to be configured elsewhere (e.g. environment/config)

  const response = await axios.post(
      "https://api.openai.com/v1/audio/transcriptions",
      form,
      {
          headers: {
              ...form.getHeaders(),
              Authorization: `Bearer ${apiKey}`,
          },
      }
  );

  const transcription = response.data.text;
  return transcription;
}

To Reproduce

The example uses Google Cloud / Firebase Storage.

Code snippets

import {OpenAI, toFile} from 'openai'

const bucket = storage.bucket(bucketName);
const file = bucket.file(fileName);

const openai = new OpenAI({
    apiKey: apiKey,
});

const transcription = await openai.audio.transcriptions.create({
    // Passing the read stream to toFile buffers the entire file in memory
    file: await toFile(file.createReadStream(), "myfile.mp3"),
    model: "whisper-1",
});

OS

Linux (Google Cloud Functions)

Node version

18

Library version

4.11.1

rattrayalex commented 8 months ago

Thanks for filing, I would like to do this at some point.

Note that this applies to all forms of streams, not just web ReadableStreams (and the GCP libraries return NodeJS ReadableStreams, not web ones).

Unfortunately, I'm not sure this will really be possible in a clean way until OpenAI's backend can infer content-types from the contents instead of from the filenames, since you can't construct a File class with a stream, and you need the filename.
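To illustrate the constraint, here is a minimal sketch (not the SDK's actual internals; it assumes a File implementation such as the one exported from node:buffer): because the File constructor only takes fully materialized BlobParts, wrapping a stream in a File means collecting every chunk in memory first.

import { Readable } from "node:stream";
import { File } from "node:buffer";

// Sketch only: turning a Node.js stream into a web File forces buffering,
// because File accepts concrete chunks (BlobParts), not a stream.
async function bufferStreamIntoFile(stream: Readable, name: string): Promise<File> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  // At this point the whole payload is resident in memory.
  return new File(chunks, name);
}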

I'll leave this as an open TODO for now.

nicolopadovan commented 8 months ago

> Thanks for filing, I would like to do this at some point.
>
> Note that this applies to all forms of streams, not just web ReadableStreams (and the GCP libraries return NodeJS ReadableStreams, not web ones).
>
> Unfortunately, I'm not sure this will really be possible in a clean way until OpenAI's backend can infer content-types from the contents instead of from the filenames, since you can't construct a File class with a stream, and you need the filename.
>
> I'll leave this as an open TODO for now.

Wouldn't it be possible to infer the file type server-side and use that to specify the correct extension for the API to work with? In my proposed workaround I naively add the .webm extension for brevity, but the snippet could be improved to infer the actual file type. Note that I am passing the read stream together with the filename in the form, without ever buffering the file on the server itself. Ideally the API could consume the Google Storage file directly, without passing it through the server at all, but this workaround at least reduces the memory needed at any given moment, since the read stream's data is deallocated as soon as possible.
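As a rough sketch of that inference (a hypothetical helper, not part of the SDK or of the workaround above), one could peek at the stream's first bytes, guess an extension from the magic numbers, and push the bytes back so the upload is unaffected:

import { Readable } from "node:stream";

// Hypothetical helper: guess an audio extension from the stream's leading magic bytes.
async function inferExtension(stream: Readable): Promise<string> {
  const head = await new Promise<Buffer>((resolve, reject) => {
    stream.once("error", reject);
    stream.once("readable", () => resolve(stream.read(4) ?? Buffer.alloc(0)));
  });
  if (head.length > 0) {
    stream.unshift(head); // put the sniffed bytes back so the upload still contains them
  }
  if (head.length >= 4 && head.readUInt32BE(0) === 0x1a45dfa3) return "webm"; // EBML/Matroska header
  if (head.toString("ascii", 0, 4) === "OggS") return "ogg";
  if (head.toString("ascii", 0, 4) === "RIFF") return "wav";
  return "mp3"; // naive fallback
}

The workaround above could then build the filename as `${fileName}.${await inferExtension(readStream)}` instead of hard-coding `.webm`.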

I will be working on a PR to implement this if I have the time :)

rattrayalex commented 8 months ago

Thank you. This relates to https://github.com/openai/openai-node/issues/271 but I think we would approach it differently.

In this case, it seems it may be simplest to allow `params: FormData` in place of `params: TranscriptionCreateParams`, e.g.:

import {OpenAI} from 'openai'

const openai = new OpenAI();

const form = new FormData();
form.append("file", myReadStream, myFileName);
form.append("model", "whisper-1");

const transcription = await openai.audio.transcriptions.create(form);

I'm not sure how trivial this will be for us (it may be relatively simple).
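For illustration only, the overload might look roughly like this (a hypothetical shape, not the current SDK API; the type import path is an assumption):

import type { Transcription, TranscriptionCreateParams } from "openai/resources/audio/transcriptions";

// Hypothetical: transcriptions.create would accept either the typed params or a prebuilt FormData.
interface TranscriptionsAcceptingFormData {
  create(body: TranscriptionCreateParams): Promise<Transcription>;
  create(body: FormData): Promise<Transcription>;
}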