openai-php / client

⚡️ OpenAI PHP is a supercharged community-maintained PHP API client that allows you to interact with the OpenAI API.
MIT License

How to transcribe remote file stream from Amazon S3 directly instead of local file? #149

bradleybernard closed this issue 1 year ago

bradleybernard commented 1 year ago

Hey folks, I'm storing voice note files (.m4a) in Amazon S3 buckets. I pass the S3 file stream to the transcribe function as the file entry of the parameters array, but instead of transcribing the audio it throws an exception.

Taking a step back, all the code examples I've seen use fopen to create a stream resource for a file on the local filesystem, like so:

OpenAI::audio()->transcribe([
    'model' => 'whisper-1',
    'file' => fopen(storage_path('app/'.$this->transcript->hash), 'r'),
    'temperature' => 0.2,
    'response_format' => 'vtt',
]);
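For comparison with the local-file pattern above, here is a minimal sketch of spooling any readable stream into a local temporary file first, so that fopen can then be used exactly as in that example. The helper name copyStreamToTempFile is hypothetical, and keeping the file extension rests on the assumption that the API may use the filename to detect the audio format. This is a fallback, not a way to stream directly from S3.

```php
<?php
// Hypothetical helper: copies a readable stream into a temp file and
// returns the path. The caller is responsible for deleting the file.
function copyStreamToTempFile($source, string $extension): string
{
    if (!is_resource($source)) {
        throw new InvalidArgumentException('Expected a stream resource.');
    }

    $base = tempnam(sys_get_temp_dir(), 'audio_');
    if ($base === false) {
        throw new RuntimeException('Could not create a temporary file.');
    }

    // Keep the extension (assumption: the API may rely on the filename).
    $path = $base . '.' . ltrim($extension, '.');
    if (!rename($base, $path)) {
        throw new RuntimeException('Could not rename the temporary file.');
    }

    $target = fopen($path, 'wb');
    if ($target === false) {
        throw new RuntimeException('Could not open the temporary file.');
    }

    try {
        if (stream_copy_to_stream($source, $target) === false) {
            throw new RuntimeException('Copying the stream failed.');
        }
    } finally {
        fclose($target);
    }

    return $path;
}

// Demo with an in-memory stream standing in for the remote one.
$source = fopen('php://temp', 'r+');
fwrite($source, 'abc');
rewind($source);
$path = copyStreamToTempFile($source, 'm4a');
```

The returned path can then be passed as 'file' => fopen($path, 'r'), and removed with unlink($path) once the request has completed.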

I'm trying to change the file value above to use a file stream directly from S3 (remote). I saw that the Laravel Storage facade (https://laravel.com/api/6.x/Illuminate/Support/Facades/Storage.html#method_readStream) has a readStream method, so I tried it like this:

public function handle(Attachment $attachment): void
{
    $disk = $attachment->disk;
    $stream = Storage::disk($disk)->readStream($attachment->path);

    $response = OpenAI::audio()->transcribe([
        'model' => 'whisper-1',
        'file' => $stream,
    ]);
}
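Before handing the value to the client, it may be worth confirming that readStream actually returned a stream. Depending on the Laravel version and the disk's error handling settings, a failed read can come back as null or false rather than an exception (an assumption to check against your setup). A minimal sketch with a hypothetical requireStream helper:

```php
<?php
// Hypothetical guard: fails early if the value is not an open resource,
// instead of sending an unusable 'file' value to the API.
function requireStream($stream, string $path)
{
    if (!is_resource($stream)) {
        throw new RuntimeException("Could not open a read stream for {$path}.");
    }

    return $stream;
}

// Demo with an in-memory stream; in the handler above this would wrap
// the result of Storage::disk($disk)->readStream($attachment->path).
$stream = requireStream(fopen('php://temp', 'r+'), 'demo.m4a');
```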

But that throws an exception on OpenAI's end:

OpenAI\Exceptions\ErrorException  1 validation error for Request
body -> file 
Expected UploadFile, received: <class 'str'> (type=value_error).
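The message says the file field arrived as a plain string rather than as an uploaded file. One plausible reading (an assumption, not confirmed in this thread) is that the multipart request was built without a filename for that part, which could happen if the stream's URI is a php:// wrapper instead of a real path or URL. A small sketch for inspecting what kind of stream you are holding; describeStream is a hypothetical helper built on PHP's stream_get_meta_data:

```php
<?php
// Hypothetical diagnostic: summarizes the metadata of an open stream.
function describeStream($stream): array
{
    if (!is_resource($stream) || get_resource_type($stream) !== 'stream') {
        throw new InvalidArgumentException('Expected an open stream resource.');
    }

    $meta = stream_get_meta_data($stream);

    return [
        'uri' => $meta['uri'] ?? null,
        'wrapper_type' => $meta['wrapper_type'] ?? null,
        'stream_type' => $meta['stream_type'] ?? null,
        'seekable' => $meta['seekable'] ?? null,
    ];
}

// Demo with an in-memory stream; pass the S3 stream here to compare.
$info = describeStream(fopen('php://temp', 'r+'));
```

Comparing this output for a local-disk stream and an S3 stream may show whether the two differ in their uri value.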

How can I stream the audio file contents from Amazon S3 directly into the OpenAI client's transcribe call? I saw this article doing it in Node.js: https://dev.to/antondosov/pipe-a-file-stream-from-aws-s3-to-openai-whisper-in-nodejs-3bkb so I'm guessing it should be technically possible in PHP too, although it will look different of course.

Please let me know if I missed anything, I would love some help here! Thanks for this awesome library! 🙌

gehrisandro commented 1 year ago

Hi @bradleybernard

I was not able to reproduce the issue. I've used the Laravel Storage facade (with a local disk instead of S3, but I think this should not matter):

$response = $client->audio()->transcribe([
    'model' => 'whisper-1',
    'file' => Storage::disk('local')->readStream('audio.mp3'),
    'response_format' => 'text',
]);

bradleybernard commented 1 year ago

Hey @gehrisandro, I tried again today and it seems to work with the right read flags to support S3. Closing now, thanks!

petericebear commented 1 year ago

@bradleybernard What flag did you use? I have the same issue.

caiokawasaki commented 1 month ago

I ran into this same problem. For anyone else stuck on it, add the following to the s3 disk configuration in config/filesystems.php:

[
    'disks' => [
        's3' => [
            'stream_reads' => true
        ]
    ]
]

You can read more about it here: https://www.linkedin.com/pulse/solving-laravel-s3-streaming-streamreads-bryan-trudel-pxgic
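For context, here is a sketch of how that option might sit inside a full s3 disk entry. The surrounding keys mirror Laravel's stock filesystems config and are assumptions about your project. The stream_reads option is understood to be read by the Flysystem 3 S3 integration (Laravel 9 and later), so it may have no effect on older versions.

```php
<?php
// config/filesystems.php (excerpt, assumed layout)
return [
    'disks' => [
        's3' => [
            'driver' => 's3',
            'key' => env('AWS_ACCESS_KEY_ID'),
            'secret' => env('AWS_SECRET_ACCESS_KEY'),
            'region' => env('AWS_DEFAULT_REGION'),
            'bucket' => env('AWS_BUCKET'),
            // Ask the adapter to return the remote HTTP stream directly.
            'stream_reads' => true,
        ],
    ],
];
```

With this in place, the original handler that passes Storage::disk($disk)->readStream($attachment->path) as the file value should not need any other change, going by the reports in this thread.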