openai / openai-node

Official JavaScript / TypeScript library for the OpenAI API
https://www.npmjs.com/package/openai
Apache License 2.0

Bug Report: `openai.files.create` Method Hangs and Does Not Throw Errors in Docker Containers #910

Closed simonorzel26 closed 5 months ago

simonorzel26 commented 5 months ago

Confirm this is a Node library issue and not an underlying OpenAI API issue

Describe the bug

Bug Report: openai.files.create Method Hangs and Does Not Throw Errors in Docker Containers

Bug Report ID: BR-20240622-001

Title: openai.files.create Method Hangs and Does Not Throw Errors in Docker Containers

Reporter: simonorzel26

Date: June 22, 2024

Environment:

Description: The openai.files.create method hangs indefinitely and does not throw any errors when executed within a Docker container. The same code works as expected when run locally on the host machine by building the JS scripts and running them. This behavior prevents the file upload from completing and stalls the application; because the only feedback is an eventual timeout error, the issue is very hard to diagnose.
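A hang with no error can at least be surfaced quickly with an explicit timeout guard around the call. A minimal sketch; the withTimeout helper below is hypothetical and not part of openai-node (which has its own timeout option, used later in this report):

```typescript
// Hypothetical diagnostic helper, not part of openai-node: reject if the
// wrapped promise does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms,
    );
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// A call that never settles (like the hanging upload) now fails fast:
const hung = new Promise<string>(() => { /* never settles */ });
const result = await withTimeout(hung, 100, "files.create").catch((e: Error) => e.message);
console.log(result); // "files.create timed out after 100ms"
```

Wrapping the upload this way does not fix the hang, but it turns a silent stall into an immediate, labeled error, which narrows down where the request is getting stuck.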

Steps to Reproduce:

  1. Set up a Docker container using oven/bun:latest-debian, bun:debian, or similar variants.
  2. Implement and call the openai.files.create method within the container.
  3. Observe that the method hangs and does not throw any errors.

Expected Behavior: The openai.files.create method should either successfully upload the file and return the file ID or throw an error if the upload fails.

Actual Behavior: The method hangs indefinitely and does not throw any errors, causing the application to stall and timeout the request.

Additional Information:

Forgive the mess; this is the result of 8+ hours of debugging. Code:

FROM imbios/bun-node:20-slim AS deps
ARG DEBIAN_FRONTEND=noninteractive

# Install necessary packages
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
    openssl \
    git \
    ca-certificates \
    tzdata && \
    ln -fs /usr/share/zoneinfo/Europe/Berlin /etc/localtime && \
    echo "Europe/Berlin" > /etc/timezone && \
    dpkg-reconfigure -f noninteractive tzdata && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install dependencies based on the preferred package manager
COPY package.json bun.lockb ./
RUN mkdir -p packages/scraper
RUN mkdir -p packages/shared
COPY ./.env ./.env
COPY ./packages/scraper/package.json ./packages/scraper/package.json
COPY ./packages/shared/package.json ./packages/shared/package.json
COPY ./loadEnv.ts ./loadEnv.ts
COPY ./env.js ./env.js
COPY ./packages/scraper/dbscraper/ ./packages/scraper/dbscraper/

COPY ./.env ./.env
RUN bun install 

# Build the app
FROM deps AS builder
WORKDIR /app
COPY --from=deps /app .
COPY ./packages/scraper/ ./packages/scraper/
COPY ./packages/shared/ ./packages/shared/

RUN bun scraper:build

RUN ls -la .

# Production image, copy all the files and run next
FROM imbios/bun-node:20-slim AS runner
WORKDIR /app

# Create a system group and user for running the application
RUN addgroup --system --gid 1002 bunjs
RUN adduser --system --uid 1002 --ingroup bunjs bunuser

# Set environment variables
ARG CONFIG_FILE=.env
COPY $CONFIG_FILE /app/.env
ENV NODE_ENV production
# Uncomment the following line in case you want to disable telemetry during runtime.
# ENV NEXT_TELEMETRY_DISABLED 1

# Copy necessary files and set correct permissions
COPY --from=builder /app/packages/scraper/generated ./generated
COPY --from=builder /app/package.json /app/bun.lockb ./
COPY --from=builder /app/node_modules/.prisma /app/node_modules/.prisma
COPY --from=builder /app/node_modules/@prisma /app/node_modules/@prisma

RUN ls -la .
RUN ls -la ./generated

# Change ownership of all necessary files and directories to the non-root user
RUN chown -R bunuser:bunjs /app

# Ensure that bunuser has access to necessary directories
RUN chmod -R 755 /app

# Switch to the non-root user
USER bunuser

# Expose the application port
EXPOSE 3000

# Set additional environment variables
ENV PORT 3000
ENV HOSTNAME "0.0.0.0"

# Mount a volume for temporary files
VOLUME ["/tmp"]

# Run the application
CMD ["bun", "generated/htmlConsumer.js"]

import { scraperPrompt } from "@shared";
import { File } from "node-fetch";
import OpenAI from "openai";
import type { GetManyRequestsByIdReturnType } from "./dbscraper";

// Define the structure of the batch request
interface BatchRequest {
    custom_id: string;
    method: string;
    url: string;
    body: {
        model: string;
        messages: Array<{ role: string; content: string }>;
        max_tokens: number;
    };
}

// Define the structure of the file object returned by OpenAI
interface FileObject {
    id: string;
    purpose: string;
    filename: string;
    bytes: number;
    created_at: number;
    status: string;
    status_details?: string;
}

// Define the OpenAI client initialization
const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY, // replace with your actual API key
    timeout: 120000, // 120 seconds, in milliseconds
});

// Prepare the batch data in-memory and upload directly
async function prepareAndUploadBatchFile(
    batchRequests: BatchRequest[],
): Promise<string> {
    try {
        console.log(`Preparing batch file with ${batchRequests.length} requests`);
        const batchData = batchRequests
            .map((req) => `${JSON.stringify(req)}\n`)
            .join("");

        const buffer = Buffer.from(batchData, "utf-8");

        console.log(`Created batch file with ${batchRequests.length} requests`);
        // Create a File object from the buffer
        const file = new File([buffer], `batch-${Date.now()}.txt`, {
            type: "text/plain",
        });

        return await uploadBatchFile(file);
    } catch (error) {
        console.error("Error in prepareAndUploadBatchFile:", error);
        throw error;
    }
}

// Upload the batch file and return the file ID
async function uploadBatchFile(file: File): Promise<string> {
    try {
        console.log(`Uploading batch file ${file.name}`);
        const fileObject: FileObject = await openai.files.create({
            file: file,
            purpose: "batch",
        });
        return fileObject.id;
    } catch (error) {
        console.error("Error in uploadBatchFile:", error);
        throw error;
    }
}

// Create a batch and return the batch ID
async function createBatch(inputFileId: string): Promise<string> {
    try {
        console.log(`Creating batch with input file ID ${inputFileId}`);
        const batch: OpenAI.Batch = await openai.batches.create({
            input_file_id: inputFileId,
            endpoint: "/v1/chat/completions",
            completion_window: "24h",
        });
        return batch.id;
    } catch (error) {
        console.error("Error in createBatch:", error);
        throw error;
    }
}

export const createBatchFromRequests = async (
    requests: GetManyRequestsByIdReturnType,
): Promise<string> => {
    try {
        const batchRequests: BatchRequest[] = requests.map((request) => {
            return {
                custom_id: request.id,
                method: "POST",
                url: "/v1/chat/completions",
                body: {
                    model: process.env.GPT_MODEL as string,
                    response_format: { type: "json_object" },
                    messages: [
                        {
                            role: "system",
                            content: scraperPrompt,
                        },
                        {
                            role: "user",
                            content: `${request.prompt}\n\n${request?.Html?.html}`,
                        },
                    ],
                    max_tokens: 4096,
                },
            };
        });

        const inputFileId = await prepareAndUploadBatchFile(batchRequests);

        const batchId: string = await createBatch(inputFileId);
        console.log(`Batch created with ID ${batchId}`);

        return batchId;
    } catch (error) {
        console.error("Error in createBatchFromRequests:", error);
        throw error;
    }
};
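The JSONL payload that prepareAndUploadBatchFile builds can be sanity-checked without touching the network, which helps rule out a malformed batch file as the cause of the stall. A minimal sketch; validateJsonl is a hypothetical helper, not part of the code above:

```typescript
// Hypothetical helper: check that every non-empty line of the payload is a
// standalone JSON object, as the batch input format requires, and return the
// number of requests found.
function validateJsonl(batchData: string): number {
  const lines = batchData.split("\n").filter((line) => line.length > 0);
  for (const line of lines) {
    const parsed: unknown = JSON.parse(line); // throws on malformed JSON
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      throw new Error("JSONL line is not a JSON object");
    }
  }
  return lines.length;
}

// Built the same way as in prepareAndUploadBatchFile:
const sample = [
  { custom_id: "a", method: "POST", url: "/v1/chat/completions" },
  { custom_id: "b", method: "POST", url: "/v1/chat/completions" },
]
  .map((req) => `${JSON.stringify(req)}\n`)
  .join("");

console.log(validateJsonl(sample)); // 2
```

Since this check passes for the payload above, the failure is more likely in the HTTP layer than in the file contents, which matches the timeout stack trace that follows.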

Error Log:

Preparing batch file with 1 requests
Created batch file with 1 requests
Uploading batch file batch-1719043483257.txt
Error in uploadBatchFile: 32522 | class OpenAIError extends Error {
32523 | }
32524 | 
32525 | class APIError extends OpenAIError {
32526 |   constructor(status, error, message, headers) {
32527 |     super(`${APIError.makeMessage(status, error, message)}`);
                              ^
error: Request timed out.
      at new OpenAIError (:1:23)
      at new APIError (/app/generated/htmlConsumer.js:32527:5)
      at new APIConnectionError (/app/generated/htmlConsumer.js:32592:5)
      at new APIConnectionTimeoutError (/app/generated/htmlConsumer.js:32601:5)
      at /app/generated/htmlConsumer.js:33381:15

Error in prepareAndUploadBatchFile: 32522 | class OpenAIError extends Error {
32523 | }
32524 | 
32525 | class APIError extends OpenAIError {
32526 |   constructor(status, error, message, headers) {
32527 |     super(`${APIError.makeMessage(status, error, message)}`);
                              ^
error: Request timed out.
      at new OpenAIError (:1:23)
      at new APIError (/app/generated/htmlConsumer.js:32527:5)
      at new APIConnectionError (/app/generated/htmlConsumer.js:32592:5)
      at new APIConnectionTimeoutError (/app/generated/htmlConsumer.js:32601:5)
      at /app/generated/htmlConsumer.js:33381:15

Error in createBatchFromRequests: 32522 | class OpenAIError extends Error {
32523 | }
32524 | 
32525 | class APIError extends OpenAIError {
32526 |   constructor(status, error, message, headers) {
32527 |     super(`${APIError.makeMessage(status, error, message)}`);
                              ^
error: Request timed out.
      at new OpenAIError (:1:23)
      at new APIError (/app/generated/htmlConsumer.js:32527:5)
      at new APIConnectionError (/app/generated/htmlConsumer.js:32592:5)
      at new APIConnectionTimeoutError (/app/generated/htmlConsumer.js:32601:5)
      at /app/generated/htmlConsumer.js:33381:15

Error creating batch from requests: Request timed out.
Error processing batch: Request timed out.

Severity: Medium

Priority: P2

Status: Open

Notes: This issue appears to be specific to running the OpenAI file upload method inside a Docker container. Further investigation is required to determine if this is a configuration issue, a problem with the Docker environment, or an issue with the OpenAI SDK/API when used in this context.

To Reproduce

Steps to Reproduce:

  1. Set up a Docker container using oven/bun:latest-debian, bun:debian, or similar variants.
  2. Implement and call the openai.files.create method within the container.
  3. Observe that the method hangs and does not throw any errors.

Code snippets

No response

OS

macOS

Node version

Node v20.12, Bun 1.1.15

Library version

4.51.0

Jarred-Sumner commented 5 months ago

This might be caused by a bug in Bun and not in openai-node. We don't implement support for sending streaming request bodies yet via node:http clients (not the server)
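If the bundled output pulled in a node:http-based fetch shim (such as node-fetch), one possible mitigation, untested against this exact setup, is to hand openai-node the runtime's native fetch through the client's documented fetch option, so Bun's own HTTP stack handles the upload instead of node:http:

```typescript
import OpenAI from "openai";

// Sketch of a possible workaround, not verified here: pass the runtime's
// built-in fetch (Bun's, in this case) so the client does not route the
// multipart upload through a node:http-based shim, which Bun could not
// stream request bodies through at the time of this report.
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 120_000, // milliseconds
  fetch: globalThis.fetch,
});
```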

simonorzel26 commented 5 months ago

> This might be caused by a bug in Bun and not in openai-node. We don't implement support for sending streaming request bodies yet via node:http clients (not the server)

@Jarred-Sumner Amazing, thank you! This was the problem: my ./generated Bun build files were the ones being run in my container, but if I just run the scripts via bun script.ts it works fine.

Thank you!