nicolopadovan opened this issue 1 year ago
Thanks for filing, I would like to do this at some point.
Note that this applies to all forms of streams, not just web ReadableStreams (and the GCP libraries return NodeJS ReadableStreams, not web ones).
Unfortunately, I'm not sure this will really be possible in a clean way until OpenAI's backend can infer content types from the contents instead of from the filenames, since you can't construct a `File` from a stream, and you need the filename.
I'll leave this as an open TODO for now.
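For context, a minimal sketch of the current buffering path (the `audio.webm` name and the local `fs` stream are illustrative; the same constraint applies to GCS streams):

```ts
import fs from 'node:fs';
import OpenAI, { toFile } from 'openai';

const openai = new OpenAI();

// toFile buffers the entire stream into memory before the upload, because the
// resulting File needs a known name, which the API uses to infer the content type.
const file = await toFile(fs.createReadStream('audio.webm'), 'audio.webm');

const transcription = await openai.audio.transcriptions.create({
  file,
  model: 'whisper-1',
});
```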
Wouldn't it be possible to infer the file type server-side and use that to specify the correct extension for the API to work with? In my proposed workaround I am naively adding the `.webm` extension for brevity, but the snippet could be improved to infer the actual file type (see the sketch below). Note that I am passing the read stream as well as the filename in the form, without actually buffering the file on the server itself. Ideally the API could consume the Google Storage file directly, without passing it through the server at all, but this workaround at least reduces the memory needed at any given moment, since the read-stream data is deallocated as soon as possible.
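One possible sketch of that improvement (an assumption, not code from the thread): detect the extension from the object's magic bytes with the `file-type` package, reading only the first few kilobytes so nothing is buffered:

```ts
import { fileTypeFromBuffer } from 'file-type';
import { getStorage } from 'firebase-admin/storage';

async function detectExtension(filePath: string): Promise<string> {
  // file-type needs at most ~4 KB of the file to identify it, so only request
  // the first bytes of the object instead of streaming the whole thing.
  const stream = getStorage()
    .bucket()
    .file(filePath)
    .createReadStream({ start: 0, end: 4100 });

  const chunks: Buffer[] = [];
  for await (const chunk of stream) chunks.push(chunk as Buffer);

  const type = await fileTypeFromBuffer(Buffer.concat(chunks));
  return type?.ext ?? 'webm'; // fall back to the naive default from the workaround
}
```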
I will be working on a PR to implement this if I have the time :)
Thank you. This relates to https://github.com/openai/openai-node/issues/271 but I think we would approach it differently.
In this case, it seems it may be simplest to allow `params: FormData` in place of `params: TranscriptionCreateParams`, e.g.:
```ts
import { OpenAI } from 'openai';

const openai = new OpenAI();

const form = new FormData();
form.append("file", myReadStream, myFileName);
form.append("model", "whisper-1");

const transcription = await openai.audio.transcriptions.create(form);
```
I'm not sure how trivial this will be for us (it may be relatively simple).
I am also facing this issue, so it would be really nice to have a solution for it. Can we make a pull request?
@jorgealemangonzalez what solution were you planning on implementing?
Is there any update on the issue?
@rattrayalex @RobertCraigie is it possible for this to be addressed in the upcoming migration to built-in fetch?
Confirm this is a Node library issue and not an underlying OpenAI API issue
Describe the bug
Whenever a web `ReadableStream` is passed to the `toFile` helper function, the full contents of the file are buffered before the request is forwarded to the Whisper OpenAI API endpoint. However, it is possible to avoid buffering the whole file into server memory, and instead use the server as a middleware that streams the data from the source of the file to the API endpoint. The problem has been discussed in issue #414 as well.

A workaround using `axios` and `FormData` that seems to work is sketched under Code snippets below.

To Reproduce
Example uses Cloud / Firebase Storage.
Code snippets
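The original snippet was not captured here; below is a hedged reconstruction of the described workaround, assuming the Firebase Admin SDK, the `form-data` package, and a `transcribeFromStorage` helper name invented for illustration:

```ts
import axios from 'axios';
import FormData from 'form-data';
import { getStorage } from 'firebase-admin/storage';

async function transcribeFromStorage(filePath: string): Promise<string> {
  // createReadStream() returns a Node.js ReadableStream; the file is never
  // buffered on the server, only piped through to the API.
  const readStream = getStorage().bucket().file(filePath).createReadStream();

  const form = new FormData();
  // Naively attach a .webm filename so the API can infer the content type.
  form.append('file', readStream, { filename: 'audio.webm' });
  form.append('model', 'whisper-1');

  const response = await axios.post(
    'https://api.openai.com/v1/audio/transcriptions',
    form,
    {
      headers: {
        ...form.getHeaders(),
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      },
      maxBodyLength: Infinity, // allow large streamed uploads
    },
  );
  return response.data.text;
}
```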
OS
Linux (Google Cloud Functions)
Node version
18
Library version
4.11.1