gptscript-ai / desktop

MIT License

Knowledge - Ingestion of PDF files sometimes fails #275

Closed: sangee2004 closed this 3 days ago

sangee2004 commented 3 weeks ago

Electron build from - b4977d084ee

Steps to reproduce the problem:

  1. Start a chat with Tildy
  2. Use Add Knowledge to add this file: 1000-Ways-to-Make-1000-Dollars.pdf

Uploading the file fails with the following error in the UI:

Error uploading knowledge 1000-Ways-to-Make-1000-Dollars.pdf: Error: An error occurred in the Server Components render. The specific message is omitted in production builds to avoid leaking sensitive details. A digest property is included on this error instance which may provide additional details about the nature of the error.

The following errors appear in the logs:

2024-08-23T21:59:28.602Z [server] [ERROR] Error: Command failed: /Users/sangeethahariharan/acorn/desktop/bin/knowledge ingest --dataset tb0z3n /Users/sangeethahariharan/Library/Application\ Support/acorn/Acorn/threads/tb0z3n/workspace/knowledge
2024/08/23 14:59:23 INFO Using embedding model provider provider=openai config="{BaseURL:https://gateway-api.gptscript.ai/llm APIKey:REDACTED Model:gpt-4 EmbeddingModel:text-embedding-ada-002 EmbeddingEndpoint:/embeddings APIVersion:2024-02-01 APIType:OPEN_AI AzureOpenAIConfig:{Deployment:}}"
2024/08/23 14:59:23 INFO Created dataset id=tb0z3n
warning: expected name after CMapName in cmap
2024/08/23 14:59:28 ERROR Failed to add documents error="couldn't add document '643caa0d-5dd9-4f18-8ac5-65f12247157a': couldn't create embedding of document: error response from the embedding API: 500 Internal Server Error"
2024/08/23 14:59:28 failed to add documents: couldn't add document '643caa0d-5dd9-4f18-8ac5-65f12247157a': couldn't create embedding of document: error response from the embedding API: 500 Internal Server Error

    at genericNodeError (node:internal/errors:984:15)
    at wrappedFn (node:internal/errors:538:14)
    at ChildProcess.exithandler (node:child_process:422:12)
    at ChildProcess.emit (node:events:519:28)
    at maybeClose (node:internal/child_process:1105:16)
    at Socket.<anonymous> (node:internal/child_process:457:11)
    at Socket.emit (node:events:519:28)
    at Pipe.<anonymous> (node:net:338:12)
    at Pipe.callbackTrampoline (node:internal/async_hooks:130:17) {
  code: 1,
  killed: false,
  signal: null,
  cmd: '/Users/sangeethahariharan/acorn/desktop/bin/knowledge ingest --dataset tb0z3n /Users/sangeethahariharan/Library/Application\\ Support/acorn/Acorn/threads/tb0z3n/workspace/knowledge',
  stdout: '',
  stderr: '2024/08/23 14:59:23 INFO Using embedding model provider provider=openai config="{BaseURL:https://gateway-api.gptscript.ai/llm APIKey:REDACTED Model:gpt-4 EmbeddingModel:text-embedding-ada-002 EmbeddingEndpoint:/embeddings APIVersion:2024-02-01 APIType:OPEN_AI AzureOpenAIConfig:{Deployment:}}"\n' +
    '2024/08/23 14:59:23 INFO Created dataset id=tb0z3n\n' +
    'warning: expected name after CMapName in cmap\n' +
    `2024/08/23 14:59:28 ERROR Failed to add documents error="couldn't add document '643caa0d-5dd9-4f18-8ac5-65f12247157a': couldn't create embedding of document: error response from the embedding API: 500 Internal Server Error"\n` +
    "2024/08/23 14:59:28 failed to add documents: couldn't add document '643caa0d-5dd9-4f18-8ac5-65f12247157a': couldn't create embedding of document: error response from the embedding API: 500 Internal Server Error\n",
  digest: '1456971663'
}
sangee2004 commented 3 weeks ago

When I tried to add the same knowledge file in a different thread, it succeeded.

sangee2004 commented 3 weeks ago

This error also occurs intermittently when ingesting a smaller, 200-page PDF file: Insurance_Handbook_20103.pdf

I saw the file ingestion fail, retried, and it succeeded on the retry.

[Screenshot 2024-08-23 at 4 39 17 PM]
cjellick commented 2 weeks ago

As mentioned in chat, I'd like to:

  1. verify whether this is OpenAI's fault or not
  2. add some basic retry logic
StrongMonkey commented 1 week ago

Not super easy to reproduce, as this appears to be random. I took a brief look at it, and adding retry logic requires some hacking on a vendored dependency (cc @iwilltry42). But the ingestion error should already be showing up in the UI today.

iwilltry42 commented 1 week ago

The embeddings call is in chromem-go, and we're using my fork of it anyway, so I guess I can add all that. Though a 500 is surely OpenAI's fault, some retry logic would be nice to have 👍

iwilltry42 commented 1 week ago

Retry support landed here: https://github.com/gptscript-ai/knowledge/releases/tag/v0.4.13
Now waiting for https://github.com/gptscript-ai/desktop/pull/450 to be merged.

sangee2004 commented 3 days ago

I have not seen this issue happen for some time now.

I will log a new issue if I encounter this error again.