mayooear / gpt4-pdf-chatbot-langchain

GPT4 & LangChain Chatbot for large PDF docs
https://www.youtube.com/watch?v=ih9PBGVVOO4

Error: Failed to ingest your data #326

Closed: dengweixing1996 closed this issue 1 year ago

dengweixing1996 commented 1 year ago

I have installed the required dependencies in the PyCharm IDE and configured my Pinecone and OpenAI 3.5 API keys. However, when I run the command "npm run ingest", I still get an error. I am currently in China and access the internet through a proxy server. I have configured the proxy server in cmd, but it still doesn't work. What could be causing this?
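A proxy configured in cmd usually means proxy environment variables were set in that shell. These only matter to a Node process if it was started from the same shell and if its HTTP client actually reads them. The sketch below is a hypothetical diagnostic helper (`describeProxyEnv` is not part of this repository) that lists which of the conventional proxy variables are visible to the running process. The variable names are the common conventions, and nothing here asserts that the OpenAI client used by the ingest script honors them.

```typescript
// Hypothetical diagnostic helper; not part of gpt4-pdf-chatbot-langchain.
// It only reports what the process can see and does not configure any proxy.
type Env = Record<string, string | undefined>;

// Conventional proxy variable names (an assumption about the user's setup).
const PROXY_VARS = [
  'HTTPS_PROXY',
  'https_proxy',
  'HTTP_PROXY',
  'http_proxy',
  'ALL_PROXY',
  'all_proxy',
];

function describeProxyEnv(env: Env): { name: string; value: string }[] {
  return PROXY_VARS
    .filter((name) => typeof env[name] === 'string' && env[name]!.trim() !== '')
    .map((name) => ({ name, value: env[name]!.trim() }));
}

// Usage: run this at the top of the ingest script, or in a scratch file
// started from the same terminal as "npm run ingest".
const visible = describeProxyEnv(process.env);
if (visible.length === 0) {
  console.log('No proxy environment variables are visible to this Node process.');
} else {
  for (const { name, value } of visible) {
    console.log(`${name}=${value}`);
  }
}
```

If nothing is printed for the proxy that was set in cmd, the ingest process was probably started from a different environment (for example, the IDE's own terminal or run configuration).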

The following is the error message text:

creating vector store...
error Error: connect ETIMEDOUT 103.200.31.172:443
    at __node_internal_captureLargerStackTrace (node:internal/errors:490:5)
    at __node_internal_exceptionWithHostPort (node:internal/errors:665:12)
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1571:16) {
errno: -4039,
code: 'ETIMEDOUT',
syscall: 'connect',
address: '103.200.31.172',
port: 443,
config: {
  transitional: {
    silentJSONParsing: true,
    forcedJSONParsing: true,
    clarifyTimeoutError: false
  },
  adapter: [Function: httpAdapter],
  transformRequest: [ [Function: transformRequest] ],
  transformResponse: [ [Function: transformResponse] ],
  timeout: 0,
  xsrfCookieName: 'XSRF-TOKEN',
  xsrfHeaderName: 'X-XSRF-TOKEN',
  maxContentLength: -1,
  maxBodyLength: -1,
  validateStatus: [Function: validateStatus],
  headers: {
    Accept: 'application/json, text/plain, */*',
    'Content-Type': 'application/json',
    'User-Agent': 'OpenAI/NodeJS/3.2.1',
    Authorization: 'Bearer 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',
    'Content-Length': 11054
  },
  method: 'post',

// data: '{"model":"text-embedding-ada-002","input":............................

  url: 'https://api.openai.com/v1/embeddings'
},
request: <ref *1> Writable {
_writableState: WritableState {
  objectMode: false,
  highWaterMark: 16384,
  finalCalled: false,
  needDrain: false,
  ending: false,
  ended: false,
  finished: false,
  destroyed: false,
  decodeStrings: true,
  defaultEncoding: 'utf8',
  length: 0,
  writing: false,
  corked: 0,
  sync: true,
  bufferProcessing: false,
  onwrite: [Function: bound onwrite],
  writecb: null,
  writelen: 0,
  afterWriteTickInfo: null,
  buffered: [],
  bufferedIndex: 0,
  allBuffers: true,
  allNoop: true,
  pendingcb: 0,
  constructed: true,
  prefinished: false,
  errorEmitted: false,
  emitClose: true,
  autoDestroy: true,
  errored: null,
  closed: false,
  closeEmitted: false,

},
_events: [Object: null prototype] {
  response: [Function: handleResponse],
  error: [Function: handleRequestError],
  socket: [Function: handleRequestSocket]
},
_eventsCount: 3,
_maxListeners: undefined,
_options: {
  maxRedirects: 21,
  maxBodyLength: 10485760,
  protocol: 'https:',
  path: '/v1/embeddings',
  method: 'POST',
  headers: [Object],
  agent: undefined,
  agents: [Object],
  auth: undefined,
  hostname: 'api.openai.com',
  port: null,
  nativeProtocols: [Object],
  pathname: '/v1/embeddings'
},
_ended: true,
_ending: true,
_redirectCount: 0,
_redirects: [],
_requestBodyLength: 11054,
_requestBodyBuffers: [ [Object] ],
_onNativeResponse: [Function (anonymous)],
_currentRequest: ClientRequest {
  _events: [Object: null prototype],
  _eventsCount: 7,
  _maxListeners: undefined,
  outputData: [],
  outputSize: 0,
  writable: true,
  destroyed: true,
  _last: false,
  chunkedEncoding: false,
  shouldKeepAlive: true,
  maxRequestsOnConnectionReached: false,
  _defaultKeepAlive: true,
  useChunkedEncodingByDefault: true,
  sendDate: false,
  _removedConnection: false,
  _removedContLen: false,
  _removedTE: false,
  strictContentLength: false,
  _contentLength: 11054,
  _hasBody: true,
  _trailer: '',
  finished: true,
  _headerSent: true,
  _closed: true,
  socket: [TLSSocket],
  _header: 'POST /v1/embeddings HTTP/1.1\r\n' +
    'Accept: application/json, text/plain, */*\r\n' +
    'Content-Type: application/json\r\n' +
    'User-Agent: OpenAI/NodeJS/3.2.1\r\n' +
    'Authorization: Bearer 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\r\n' +
    'Content-Length: 11054\r\n' +
    'Host: api.openai.com\r\n' +
    'Connection: keep-alive\r\n' +
    '\r\n',
  _keepAliveTimeout: 0,
  _onPendingData: [Function: nop],
  agent: [Agent],
  socketPath: undefined,
  method: 'POST',
  maxHeaderSize: undefined,
  insecureHTTPParser: undefined,
  joinDuplicateHeaders: undefined,
  path: '/v1/embeddings',
  _ended: false,
  res: null,
  aborted: false,
  timeoutCb: [Function: emitRequestTimeout],
  upgradeOrConnect: false,
  parser: null,
  maxHeadersCount: null,
  reusedSocket: false,
  host: 'api.openai.com',
  protocol: 'https:',
  _redirectable: [Circular *1],
  [Symbol(kCapture)]: false,
  [Symbol(kBytesWritten)]: 0,
  [Symbol(kNeedDrain)]: false,
  [Symbol(corked)]: 0,
  [Symbol(kOutHeaders)]: [Object: null prototype],
  [Symbol(errored)]: null,
  [Symbol(kHighWaterMark)]: 16384,
  [Symbol(kRejectNonStandardBodyWrites)]: false,
  [Symbol(kUniqueHeaders)]: null
},
_currentUrl: 'https://api.openai.com/v1/embeddings',
[Symbol(kCapture)]: false

},
response: undefined,
isAxiosError: true,
toJSON: [Function: toJSON],
attemptNumber: 5,
retriesLeft: 2
}
c:\Users\smith\PycharmProjects\gpt4-pdf-chatbot-langchain\scripts\ingest-data.ts:47
    throw new Error('Failed to ingest your data');
          ^

Error: Failed to ingest your data
    at run (c:\Users\smith\PycharmProjects\gpt4-pdf-chatbot-langchain\scripts\ingest-data.ts:47:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at (c:\Users\smith\PycharmProjects\gpt4-pdf-chatbot-langchain\scripts\ingest-data.ts:52:3)
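The log above reports `code: 'ETIMEDOUT'`, `syscall: 'connect'`, and `response: undefined`, which suggests the TCP connection to api.openai.com never completed, so no HTTP response (and therefore no verdict on the API key) was ever received. A minimal sketch of how such an error object could be triaged follows. The `diagnose` function and its `Diagnosis` labels are hypothetical; the field names are taken from the log, and the status-code mapping is an assumption based on standard HTTP semantics.

```typescript
// Hypothetical triage helper; not part of gpt4-pdf-chatbot-langchain.
// Field names (code, isAxiosError, response) mirror the error dump above.
interface AxiosLikeError {
  code?: string;
  isAxiosError?: boolean;
  response?: { status?: number };
}

type Diagnosis = 'network' | 'auth' | 'rate-limit' | 'other';

// Node system error codes that indicate the request never got a reply.
const NETWORK_CODES = new Set(['ETIMEDOUT', 'ECONNREFUSED', 'ECONNRESET', 'ENOTFOUND']);

function diagnose(err: AxiosLikeError): Diagnosis {
  // No response plus a socket-level code: the server was never reached.
  if (err.response === undefined && err.code !== undefined && NETWORK_CODES.has(err.code)) {
    return 'network';
  }
  // A response arrived; interpret the HTTP status (assumed mapping).
  if (err.response?.status === 401) return 'auth';
  if (err.response?.status === 429) return 'rate-limit';
  return 'other';
}

// Usage with the fields reported in this issue:
console.log(diagnose({ code: 'ETIMEDOUT', isAxiosError: true, response: undefined }));
// prints "network"
```

Under this reading, the failure points at connectivity (proxy, firewall, or DNS) rather than at the key itself or at Pinecone.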

snowsky commented 1 year ago

Better to renew your OpenAI key :) I cannot reproduce this, since Pinecone doesn't have a free tier now...
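The advice to renew the key presumably refers to the `Authorization: 'Bearer sk-...'` header that appears in the pasted log. As a rough sketch, a log could be scrubbed before posting with something like the following. The `redactSecrets` helper is hypothetical, and the two patterns are assumptions about what a bearer token and an `sk-` style key look like, so they may miss other secret formats.

```typescript
// Hypothetical helper; not part of gpt4-pdf-chatbot-langchain.
// Replaces bearer tokens and sk-style keys with a placeholder.
function redactSecrets(log: string): string {
  return log
    // Anything following "Bearer " up to the next quote or whitespace.
    .replace(/(Bearer\s+)[A-Za-z0-9._-]+/g, '$1[REDACTED]')
    // Bare keys that start with "sk-" (assumed format, 20+ characters).
    .replace(/\bsk-[A-Za-z0-9_-]{20,}\b/g, '[REDACTED]');
}

// Usage with a made-up key:
console.log(redactSecrets("Authorization: 'Bearer sk-EXAMPLEEXAMPLEEXAMPLE1234'"));
// prints "Authorization: 'Bearer [REDACTED]'"
```

Redaction only helps future posts; a key that has already been published should still be revoked and replaced.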

YongjiangXu commented 1 year ago

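I successfully ingested the PDF using a different open-source program, but when I actually ran the app it reported this ETIMEDOUT error again. For the PDF ingestion, see: https://github.com/ucl98/pinecone_ingest_python_implementation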

dosubot[bot] commented 1 year ago

Hi, @dengweixing1996! I'm Dosu, and I'm here to help the gpt4-pdf-chatbot-langchain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

Based on my understanding, you are experiencing a connection timeout issue with the OpenAI API when running "npm run ingest" in Pycharm IDE. It seems that you suspect this issue may be related to accessing the internet through a proxy server in China. snowsky suggested renewing the OpenAI key, while YongjiangXu shared a successful experience with a different program but encountered the same ETIMEDOUT error.

Before we proceed, we would like to confirm if this issue is still relevant to the latest version of the gpt4-pdf-chatbot-langchain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you for your understanding and cooperation. We look forward to hearing from you soon.