openai / openai-node

Official JavaScript / TypeScript library for the OpenAI API
https://www.npmjs.com/package/openai
Apache License 2.0

Model can generate enormous amounts of whitespace/newlines and then the structured output is truncated #1185

Open uriva opened 2 weeks ago

uriva commented 2 weeks ago

Confirm this is a Node library issue and not an underlying OpenAI API issue

Describe the bug

When using json_schema, sometimes the model outputs thousands of newlines.

At some point it reaches the max output length and stops, which sometimes leaves an invalid JSON object (the model didn't finish writing it).

Because of how this library is structured, the developer can't see the problem (the raw output is never exposed or logged).

I've found that including

logit_bias: { "1734": -100 }, // Prevent model from generating newlines.

will solve this.
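
For reference, here is a minimal sketch of where that parameter goes when combined with a json_schema response format. The model name, schema, and prompt are placeholders rather than details from this report; the only part taken from the issue is the logit_bias entry with the token ID 1734 quoted above.

    // Minimal sketch: model, schema, and prompt are placeholders; the
    // logit_bias workaround from this issue is the only point of interest.
    import OpenAI from "openai";

    const client = new OpenAI();

    const completion = await client.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: "Summarize the text below as JSON: ..." }],
      response_format: {
        type: "json_schema",
        json_schema: {
          name: "summary",
          schema: {
            type: "object",
            properties: { summary: { type: "string" } },
            required: ["summary"],
            additionalProperties: false,
          },
        },
      },
      // Heavily penalize the token reported above so the model can't pad the
      // output with newlines until it hits the max output length.
      logit_bias: { "1734": -100 },
    });

    console.log(completion.choices[0].message.content);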

To Reproduce

Happens randomly when using json_schema.

Code snippets

No response

OS

irrelevant

Node version

irrelevant

Library version

4.71.1

uriva commented 2 weeks ago

I suggest including this by default, and exposing the raw content to the user when JSON.parse fails.
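
To illustrate the second part of the suggestion: as the stack trace in the next comment shows, the parse currently happens inside lib/parser, so the raw string is not available to the caller when it throws. A rough sketch of what a caller has to do manually today (variable names are illustrative, not library API):

    // Rough sketch of the suggested behaviour, done by hand at the call site.
    // `completion` is the result of the chat.completions.create call sketched
    // in the first comment.
    const raw = completion.choices[0].message.content ?? "";

    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch (err) {
      // Attaching `raw` to the error (or logging it) would make truncated
      // structured outputs debuggable instead of silently lost.
      console.error("Structured output failed to parse; raw content:", raw);
      throw err;
    }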

uriva commented 2 weeks ago

the exception looks like this:

error: Uncaught (in promise) SyntaxError: Expected double-quoted property name in JSON at position 18943 (line 1899 column 9)

    at JSON.parse (<anonymous>)
    at Object.<anonymous> (file:///home/uri/.cache/deno/npm/registry.npmjs.org/openai/4.67.3/helpers/zod.mjs:58:42)
    at parseResponseFormat (file:///home/uri/.cache/deno/npm/registry.npmjs.org/openai/4.71.1/lib/parser.mjs:80:36)
    at file:///home/uri/.cache/deno/npm/registry.npmjs.org/openai/4.71.1/lib/parser.mjs:66:21
    at Array.map (<anonymous>)
    at parseChatCompletion (file:///home/uri/.cache/deno/npm/registry.npmjs.org/openai/4.71.1/lib/parser.mjs:53:40)
    at file:///home/uri/.cache/deno/npm/registry.npmjs.org/openai/4.71.1/resources/beta/chat/completions.mjs:22:42
    at file:///home/uri/.cache/deno/npm/registry.npmjs.org/openai/4.71.1/core.mjs:74:84
uriva commented 2 weeks ago

A better fix is adding to the prompt: "make the output json as short as possible, no redundant whitespace."
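
For completeness, a sketch of that prompt-level mitigation; the system message text is the one quoted above, everything else is a placeholder:

    // Prompt-level mitigation: ask the model not to pad the JSON.
    const messages = [
      {
        role: "system" as const,
        content: "make the output json as short as possible, no redundant whitespace.",
      },
      { role: "user" as const, content: "Summarize the text below as JSON: ..." },
    ];
    // Pass `messages` to chat.completions.create together with the same
    // json_schema response_format shown in the first sketch.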

RobertCraigie commented 2 weeks ago

Thanks for the report. Which model are you using, and do you have a request ID you could share?

uriva commented 2 weeks ago

id: chatcmpl-ASoSnbTtNnpIcQnEZ09NBz5y7t4f2

system fingerprint: fp_159d8341cc

uriva commented 6 hours ago

Happened again in another project of mine.

Btw, this was also reported in https://github.com/openai/openai-node/issues/596.

Wondering if there's any news?