openai / openai-node

The official Node.js / Typescript library for the OpenAI API
https://www.npmjs.com/package/openai
Apache License 2.0

Memory Leak in vectorStores.fileBatches.uploadAndPoll #1052

Open mezozawahra opened 1 week ago

mezozawahra commented 1 week ago

Confirm this is a Node library issue and not an underlying OpenAI API issue

Describe the bug

I fetch an object from an S3 bucket as a stream with getObject and then upload it to a vector store using vectorStores.fileBatches.uploadAndPoll. Memory usage is expected to rise during the upload and return to its baseline once the upload finishes, but it does not: after uploading a 22 MB file, memory usage went from 300 MB to 360 MB and never returned to baseline.

Note 1: The upload is part of a POST request in Express.js, and memory usage remained high even after the JSON response was returned.

Note 2: I tried saving the file to disk using "fs" instead. Memory usage rose and then returned to baseline, so I think the bug is in the OpenAI SDK.
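To make the before/after numbers easier to compare, the readings could be taken in-process around the upload call. The following is a minimal sketch, not the method used in this report: `sampleMemory` and `measure` are hypothetical helper names, and only `process.memoryUsage()` is a real Node.js API.

```typescript
// Hypothetical helpers (not part of the OpenAI SDK): sample process memory
// before and after an async task so the two readings can be compared.
type MemorySample = {
  rssMb: number;
  heapUsedMb: number;
  externalMb: number;
  arrayBuffersMb: number;
};

const toMb = (bytes: number): number =>
  Math.round((bytes / 1024 / 1024) * 100) / 100;

function sampleMemory(): MemorySample {
  const m = process.memoryUsage();
  return {
    rssMb: toMb(m.rss),
    heapUsedMb: toMb(m.heapUsed),
    externalMb: toMb(m.external),
    arrayBuffersMb: toMb(m.arrayBuffers),
  };
}

async function measure<T>(
  label: string,
  task: () => Promise<T>,
): Promise<{ result: T; before: MemorySample; after: MemorySample }> {
  const before = sampleMemory();
  const result = await task();
  const after = sampleMemory();
  // Log both readings side by side for the given label.
  console.log(label, { before, after });
  return { result, before, after };
}
```

In the handler, the upload call would be passed as the `task` callback, for example `measure("uploadAndPoll", () => openai.beta.vectorStores.fileBatches.uploadAndPoll(...))`. Reporting `heapUsed`, `external`, and `arrayBuffers` separately from `rss` may help show whether the retained memory is JavaScript heap or buffer memory.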

To Reproduce

import { NextFunction, Request, Response } from "express";
import { GetObjectCommand, S3Client } from "@aws-sdk/client-s3";
import OpenAI, { toFile } from "openai";

export async function post(req: Request, res: Response, next: NextFunction) {
    try {
        // S3-compatible client configured from environment variables
        const client = new S3Client({
            forcePathStyle: true,
            credentials: {
                accessKeyId: process.env.S3_ACCESS_KEY_ID!,
                secretAccessKey: process.env.S3_SECRET_ACCESS_KEY!,
            },
            endpoint: process.env.S3_ENDPOINT!,
            region: "us-east-1",
        });

        const params = {
            Bucket: "first-bucket",
            Key: req.params.id,
        };

        // Fetch the object; response.Body is a stream
        const command = new GetObjectCommand(params);
        const response = await client.send(command);

        const openai = new OpenAI({
            apiKey: process.env["OPENAI_API_KEY"],
        });

        // Upload the stream to the vector store and wait for the batch to finish
        await openai.beta.vectorStores.fileBatches.uploadAndPoll(<insert vector store ID here>, {
            files: [await toFile(response.Body!, req.params.id)],
        });

        return res.json({ success: true });
    } catch (error) {
        next(error);
    }
}

// Route registration (separate file)
import { post } from "./controller/files";
router.get("/:id", post);
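For comparison, the disk-based control experiment mentioned in Note 2 might look like the sketch below. The report does not include that code, so this is an assumption about its shape: `saveStreamToDisk` and the byte counter are hypothetical, while `pipeline` from `node:stream/promises` and `createWriteStream` are standard Node.js APIs.

```typescript
import { createWriteStream } from "node:fs";
import { Readable, Transform } from "node:stream";
import { pipeline } from "node:stream/promises";

// Hypothetical helper: streams a readable body to a file chunk by chunk
// and resolves with the number of bytes written.
async function saveStreamToDisk(
  body: Readable,
  destination: string,
): Promise<number> {
  let bytesWritten = 0;

  // Pass-through transform that only counts the bytes flowing through it.
  const counter = new Transform({
    transform(chunk, _encoding, callback) {
      bytesWritten += chunk.length;
      callback(null, chunk);
    },
  });

  // pipeline applies backpressure and destroys all streams on error.
  await pipeline(body, counter, createWriteStream(destination));
  return bytesWritten;
}
```

Because the data is written chunk by chunk, this path never needs to hold the whole object in memory, which would be consistent with memory returning to baseline in that experiment.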



Code snippets

No response

OS

Windows 11

Node version

v19.8.1

Library version

4.56.0

RobertCraigie commented 1 week ago

Thanks for the report. Could you try again with the latest SDK version? We recently fixed a separate bug with file uploads, so this might have been fixed as well.

mezozawahra commented 1 week ago

@RobertCraigie I tried the latest version (^4.58.1). Memory consumption is reduced: instead of using 60 MB, the upload used 44 MB. I sent the POST request to the Express.js app twice (so uploadAndPoll ran twice), and the web app's memory usage reached 390 MB and stayed at that level.

Update: I tried it again. Here is the memory usage before and after the upload:

Before: [image: before]

After: [image: after]
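One way to tell a leak apart from memory that has simply not been reclaimed yet is to repeat the operation several times and sample memory after a forced garbage collection. The sketch below assumes the process is started with `node --expose-gc`, which makes `global.gc` available. `repeatAndSample` is a hypothetical helper, not part of the SDK.

```typescript
type Sample = { iteration: number; heapUsedMb: number; rssMb: number };

// Hypothetical helper: runs `task` repeatedly and records memory after
// each run, forcing a garbage collection first when `gc` is exposed.
async function repeatAndSample(
  task: () => Promise<unknown>,
  iterations: number,
): Promise<Sample[]> {
  const samples: Sample[] = [];
  // `gc` is only defined when Node is started with --expose-gc.
  const gc = (globalThis as unknown as { gc?: () => void }).gc;

  for (let i = 1; i <= iterations; i++) {
    await task();
    gc?.();
    const { heapUsed, rss } = process.memoryUsage();
    samples.push({
      iteration: i,
      heapUsedMb: heapUsed / (1024 * 1024),
      rssMb: rss / (1024 * 1024),
    });
  }
  return samples;
}
```

If `heapUsedMb` keeps growing across iterations even after forced collections, that points toward retained objects. If it plateaus while `rssMb` stays high, the memory may have been freed by the runtime without being returned to the operating system.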

RobertCraigie commented 1 week ago

Thanks, we'll investigate.