mayooear / gpt4-pdf-chatbot-langchain

GPT4 & LangChain Chatbot for large PDF docs
https://www.youtube.com/watch?v=ih9PBGVVOO4

Error: Failed to ingest your data #60

Closed yaya6630170 closed 11 months ago

yaya6630170 commented 1 year ago

I've extracted parts of the code, leaving out the PDF content. Please help, I just can't find the bug.

yaya6630170 commented 1 year ago

error [Error: Network Error] { config: { transitional: { silentJSONParsing: true, forcedJSONParsing: true, clarifyTimeoutError: false }, adapter: [AsyncFunction: fetchAdapter], transformRequest: [ [Function: transformRequest] ], transformResponse: [ [Function: transformResponse] ], timeout: 0, xsrfCookieName: 'XSRF-TOKEN', xsrfHeaderName: 'X-XSRF-TOKEN', maxContentLength: -1, maxBodyLength: -1, validateStatus: [Function: validateStatus], headers: { Accept: 'application/json, text/plain, */*', 'Content-Type': 'application/json', 'User-Agent': 'OpenAI/NodeJS/3.2.1', Authorization: 'Bearer sk-XXXXX'(here is the openai api key) }, method: 'post',

url: 'https://api.openai.com/v1/embeddings' }, code: 'ERR_NETWORK', request: Request { [Symbol(realm)]: { settingsObject: [Object] }, [Symbol(state)]: {

  method: 'POST',
  localURLsOnly: false,
  unsafeRequest: false,
  body: [Object],
  client: [Object],
  reservedClient: null,
  replacesClientId: '',
  window: 'client',
  keepalive: false,
  serviceWorkers: 'all',
  initiator: '',
  destination: '',
  priority: null,
  origin: 'client',
  policyContainer: 'client',
  referrer: 'client',
  referrerPolicy: '',
  mode: 'cors',
  useCORSPreflightFlag: false,
  credentials: 'same-origin',
  useCredentials: false,
  cache: 'default',
  redirect: 'follow',
  integrity: '',
  cryptoGraphicsNonceMetadata: '',
  parserMetadata: '',
  reloadNavigation: false,
  historyNavigation: false,
  userActivation: false,
  taintedOrigin: false,
  redirectCount: 0,
  responseTainting: 'basic',
  preventNoCacheCacheControlHeaderModification: false,
  done: false,
  timingAllowFailed: false,
  headersList: [HeadersList],
  urlList: [Array],
  url: [URL]
},
[Symbol(signal)]: AbortSignal { aborted: false },
[Symbol(headers)]: HeadersList {
  cookies: null,
  [Symbol(headers map)]: [Map],
  [Symbol(headers map sorted)]: null
}

}, response: undefined, isAxiosError: true, toJSON: [Function: toJSON] }
d:\Documents\GitHub\gpt4-pdf-chatbot-langchain\scripts\ingest-data.ts:53
throw new Error('Failed to ingest your data');
^

[Error: Failed to ingest your data]

Node.js v18.14.2
ELIFECYCLE  Command failed with exit code 1.

text2sql commented 1 year ago

fixed it already. thanks

louis-sanna-eki commented 1 year ago

@TextToSQL how did you fix it?

text2sql commented 1 year ago

I've spent hours 1) trying to understand where the problem is, 2) consulting ChatGPT :-), and 3) tailoring the code here and there. I've also updated several packages. To be very honest, I am not 100% sure what exactly helped. As a first step, I'd recommend installing VS Code and looking at the Problems tab (it shows which part of the code has an issue), and making sure all packages are installed as per Mayo's instructions. I think the most important thing is to make sure the Pinecone client is up to date. I hope this helps.

text2sql commented 1 year ago

Sorry, another first step is to look at the console and check whether the text is actually being split and whether embedding works. If yes, then the issue is likely with the Pinecone client.
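To make the "which step actually fails" check concrete, here is a minimal sketch of wrapping each ingest stage so the console names the one that broke. The `runStage` helper and the stage names are hypothetical (they are not part of the repo), and the two stages below are stand-ins rather than real calls to the text splitter or Pinecone. The real steps are asynchronous, so an actual version would need to await them.

```typescript
// Hypothetical helper: runs one stage and records whether it succeeded.
type StageOutcome<T> =
  | { ok: true; stage: string; value: T }
  | { ok: false; stage: string; message: string };

function runStage<T>(stage: string, step: () => T): StageOutcome<T> {
  try {
    return { ok: true, stage, value: step() };
  } catch (err) {
    // Keep only the message so the console output stays readable.
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, stage, message };
  }
}

// Stand-in for the text-splitting step (a real run would call the splitter).
const split = runStage('split text', () => 'some pdf text'.split(' '));

// Stand-in for the Pinecone step, simulating the failure seen in this thread.
const upsert = runStage('upsert to Pinecone', (): number => {
  throw new Error('Project name not set');
});

for (const outcome of [split, upsert]) {
  console.log(
    outcome.ok
      ? `${outcome.stage}: ok`
      : `${outcome.stage}: FAILED (${outcome.message})`,
  );
}
```

If splitting reports ok and only the last stage fails, that points at the Pinecone side rather than at the PDF or OpenAI.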

text2sql commented 1 year ago

Another thing I forgot to mention: make sure to change the GPT model if you don't have access to GPT-4 yet. I've changed mine to gpt-3.5-turbo.

louis-sanna-eki commented 1 year ago

@TextToSQL thx! I have a very strange "PineconeClient: Project name not set. Call init() first." error, despite the official doc not requiring it.

https://www.npmjs.com/package/@pinecone-database/pinecone

Screenshot 2023-03-26 at 16 21 02

text2sql commented 1 year ago

I thought we were initializing it somewhere. My best advice: ask ChatGPT about this particular error, and also look at the 'Problems' tab in VS Code.

text2sql commented 1 year ago

did pinecone-client install ok?

louis-sanna-eki commented 1 year ago

The error is known, but none of the fixes work for me (updating Node, pinning the lib to 0.0.10).

https://github.com/pinecone-io/pinecone-ts-client/issues/12

EDIT: I managed to get a new error by adding the projectName directly on the object

Screenshot 2023-03-26 at 16 35 00

EDIT2: new error

Screenshot 2023-03-26 at 16 37 00

louis-sanna-eki commented 1 year ago

So I finally managed to make it work.

The Pinecone lib uses the projectName to build the URL, so you have to set it on the object. The projectName can be found in the Pinecone web UI.

// Pinecone lib

Screenshot 2023-03-26 at 17 07 21

// Pinecone interface with name

Screenshot 2023-03-26 at 17 07 53

// Your code where you set the projectName

Screenshot 2023-03-26 at 17 07 32

I have no idea why it works for everyone else.
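Since the screenshots don't survive here, a rough sketch of why a missing projectName breaks things. This is not the library's code: `buildIndexBasePath` and `PineconeTarget` are hypothetical names, and the host pattern below is an assumption about how the older client composed the index URL. The index name, project name, and environment values are made up.

```typescript
// Hypothetical stand-in for the URL-building step inside the Pinecone lib.
interface PineconeTarget {
  indexName: string;
  projectName?: string; // found in the Pinecone web UI
  environment: string; // e.g. the environment shown next to your API key
}

function buildIndexBasePath(target: PineconeTarget): string {
  const { indexName, projectName, environment } = target;
  if (!projectName) {
    // Same message as reported in this thread.
    throw new Error('PineconeClient: Project name not set. Call init() first.');
  }
  // Assumed host pattern: <index>-<project>.svc.<environment>.pinecone.io
  return `https://${indexName}-${projectName}.svc.${environment}.pinecone.io`;
}

// Example values only; substitute your own.
const basePath = buildIndexBasePath({
  indexName: 'my-index',
  projectName: 'abc1234',
  environment: 'us-east1-gcp',
});
console.log(basePath);
```

If the client fails to fetch the project name itself (for example because of a network problem), setting it by hand as described above gives the URL builder what it needs.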

mayooear commented 1 year ago

another one i forgot to mention - make sure to change the gpt model if you don't have access to gpt4 yet. i've changed mine to gpt3.5turbo

This is a major cause of issues. Many people attempt to use gpt-4 when they don't yet have access.
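One way to avoid hardcoding gpt-4 is to fall back to gpt-3.5-turbo unless a model is explicitly requested. A minimal sketch, assuming a hypothetical `OPENAI_MODEL` environment variable (the repo does not define one) and a hypothetical `resolveModelName` helper:

```typescript
// Default for accounts that don't have GPT-4 access yet.
const DEFAULT_MODEL = 'gpt-3.5-turbo';

// `env` is passed in explicitly so the function is easy to test;
// in a real script you would pass process.env.
function resolveModelName(env: Record<string, string | undefined>): string {
  const requested = env.OPENAI_MODEL?.trim(); // hypothetical variable name
  return requested ? requested : DEFAULT_MODEL;
}

console.log(resolveModelName({})); // falls back to the default
console.log(resolveModelName({ OPENAI_MODEL: 'gpt-4' })); // explicit opt-in
```

The returned string would then be passed wherever the chain currently sets its model name.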

mayooear commented 1 year ago

Strange that it doesn't work for you without setting a projectName.

yaya6630170 commented 1 year ago

I'm already on gpt-3.5-turbo T_T, and the text is being split as well. I really don't understand what the problem is. Does the Pinecone code need to be changed? I see pinecone-client mentioned above; do I need to install that?

felipeotarola commented 1 year ago

Awesome, thanks for this solution; I was experiencing this too. Adding pinecone.projectName in the initPinecone function got me to the next problem, which was caused by the basePath in the Pinecone library: the concatenation didn't work for my URL, so I just hardcoded it and it worked.

okmike commented 1 year ago

Exact same problem here (same Network Error output as in the original post). I keep getting the Network Error message. I do have the Clash proxy running at the same time, though, and I'm not sure whether Clash is causing network conflicts.

okmike commented 1 year ago

I tried switching the Clash proxy mode between Global, Rule, and Direct multiple times, and the Network Error message keeps appearing.

yudidina commented 1 year ago

Me, too, damn

oashua commented 1 year ago

I ran into a similar "Failed to ingest your data" problem, but the original error is error [Error: Network Error], raised after "creating vector store..." (see image). I'm sure the proxy settings are right (both for bash and npm).

okmike commented 1 year ago

Problem solved. For those who use Clash and VS Code at the same time, try the following and check the results. I am not sure which steps are necessary, but it works for me anyway.

  1. Add pinecone.projectName to your file. (image)
  2. Turn on Clash TUN mode. (image)
  3. Change Clash to Global, and make sure the proxy address has access to the OpenAI website.
  4. Final result on the Pinecone website: (image)
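A possible explanation for why TUN mode helps while shell/npm proxy settings do not: as far as I know, the built-in fetch in Node 18 does not read HTTP_PROXY / HTTPS_PROXY on its own, whereas TUN mode routes traffic at the system level. That is an assumption, not something confirmed in this thread. The sketch below only lists which proxy variables are visible to the process, so you can see what your script would inherit; `findProxySettings` is a hypothetical helper and the address is an example value.

```typescript
// Common proxy-related environment variable names.
const PROXY_VARS = [
  'HTTPS_PROXY',
  'https_proxy',
  'HTTP_PROXY',
  'http_proxy',
  'ALL_PROXY',
  'all_proxy',
];

// `env` is passed in explicitly; in a real script you would pass process.env.
function findProxySettings(
  env: Record<string, string | undefined>,
): Record<string, string> {
  const found: Record<string, string> = {};
  for (const name of PROXY_VARS) {
    const value = env[name];
    if (value) found[name] = value;
  }
  return found;
}

// Example value only; use whatever local port your proxy actually listens on.
const settings = findProxySettings({ HTTPS_PROXY: 'http://127.0.0.1:7890' });
console.log(settings);
```

Seeing a variable listed here does not mean the HTTP client honors it; it only tells you what is set.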
okmike commented 1 year ago

If you're using Clash to get around the firewall, take a look at my reply above.

oashua commented 1 year ago

I've seen people online suggest enabling TUN mode, but after I enabled it my node list disappeared, and I had to uninstall service mode to get it back. The ways people set up TUN mode are all over the place...

yaya6630170 commented 1 year ago

Awesome, it seems I can reach Pinecone now, but there's a new problem:

creating vector store...
error [TypeError: documents.map is not a function]
d:\Documents\GitHub\gpt4-pdf-chatbot-langchain\scripts\ingest-data.ts:44
throw new Error('Failed to ingest your data');
^

[Error: Failed to ingest your data]
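For what it's worth, "documents.map is not a function" generally means the value passed as `documents` is not an array (for example a single document object, or undefined). A minimal sketch of a guard that makes this visible before the vector store call; `DocLike` and `toDocumentArray` are hypothetical names, with `DocLike` a local stand-in for a LangChain-style document:

```typescript
// Local stand-in for a document with text and metadata.
interface DocLike {
  pageContent: string;
  metadata: Record<string, unknown>;
}

function toDocumentArray(value: unknown): DocLike[] {
  // Already an array: pass it through unchanged.
  if (Array.isArray(value)) return value as DocLike[];
  // A single document object: wrap it so .map() works downstream.
  if (value !== null && typeof value === 'object' && 'pageContent' in value) {
    return [value as DocLike];
  }
  throw new TypeError(`Expected an array of documents, got ${typeof value}`);
}

const docs = toDocumentArray({ pageContent: 'hello', metadata: {} });
console.log(docs.length);
```

Logging the value right before the failing line in ingest-data.ts is the quicker check; the guard just turns the same finding into a clearer error.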

zina0 commented 1 year ago

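I'm running into this problem too. I'm also using a proxy, but it still doesn't work, and evidently no data is making it into Pinecone either. If anyone has solved this, please share.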

okmike commented 1 year ago

I'm not sure about that. For me, simply switching on TUN mode was enough. I then ticked everything that could be selected under UWP Loopback.

okmike commented 1 year ago

Try installing pico-client?

GrantRomero commented 1 year ago

I added the project name and am still getting this error (screenshot: Capture).

hipnologo commented 1 year ago

I am facing a similar problem. After running npm run ingest and getting a success message, I checked the Pinecone index info and found a namespace "books" with 309 total vectors. However, after starting the app with pnpm run dev and typing a question, I see the error below in the logs and an empty response on the client page.

PineconeClient: Error getting project name: TypeError: fetch failed
error - [Error: PineconeClient: Project name not set. Call init() first.] {
  page: '/api/chat'
}
wait  - compiling /_error (client and server)...

The Pinecone and config files match the README instructions.

naticio commented 1 year ago

same error for me...

image

naticio commented 1 year ago

Thanks, I did this (adding pinecone.projectName in initPinecone), but now I get a new error :( [Error: PineconeClient: Error calling upsertRaw: FetchError: The request failed and the interceptors did not return an alternative response]

stephanmingoes commented 1 year ago

image

Is anyone else getting status code 429 aka "too many requests"?

Dasheverless commented 1 year ago

You can use the Python ChatGPT API demo to check your problem. I fixed this issue by setting up a payment method on my OpenAI API page.

cklingspor commented 1 year ago

With regard to the 429, check out here.

The reason was that I created my API key BEFORE converting my OpenAI account to paid (adding credit card). Doesn't matter if you only upgrade, you also need to create a new api key entirely. I created another API key AFTER I added my credit card and it worked fine!

This helped me as well
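To tie the status codes in this thread to the fixes people reported, here is a small lookup sketch. `explainOpenAIStatus` is a hypothetical helper, and the hints are just a summary of what commenters above found, not official guidance.

```typescript
// Maps an HTTP status from the OpenAI API to a hint based on this thread.
function explainOpenAIStatus(status: number): string {
  switch (status) {
    case 401:
      return 'Unauthorized: check that the API key is set and valid.';
    case 429:
      return (
        'Too many requests: check rate limits and billing. ' +
        'Commenters here fixed it by adding a payment method and then creating a new API key.'
      );
    default:
      return status >= 500
        ? 'Server error: likely on the API side, try again later.'
        : `No specific hint for status ${status}.`;
  }
}

console.log(explainOpenAIStatus(429));
```

A "Network Error" with `response: undefined`, as in the original post, never reaches this point: no status code came back at all, which points at connectivity rather than at the key.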

larri-eng commented 1 year ago

I am getting the same error. Any insights?

request: Request { [Symbol(realm)]: { settingsObject: [Object] },

  method: 'POST',
  localURLsOnly: false,
  unsafeRequest: false,
  body: [Object],
  client: [Object],
  reservedClient: null,
  replacesClientId: '',
  window: 'client',
  keepalive: false,
  serviceWorkers: 'all',
  initiator: '',
  destination: '',
  priority: null,
  origin: 'client',
  policyContainer: 'client',
  referrer: 'client',
  referrerPolicy: '',
  mode: 'cors',
  useCORSPreflightFlag: false,
  credentials: 'same-origin',
  useCredentials: false,
  cache: 'default',
  redirect: 'follow',
  integrity: '',
  cryptoGraphicsNonceMetadata: '',
  parserMetadata: '',
  reloadNavigation: false,
  historyNavigation: false,
  userActivation: false,
  taintedOrigin: false,
  redirectCount: 0,
  responseTainting: 'basic',
  preventNoCacheCacheControlHeaderModification: false,
  done: false,
  timingAllowFailed: false,
  headersList: [HeadersList],
  urlList: [Array],
  url: [URL]
},
[Symbol(signal)]: AbortSignal { aborted: false },
[Symbol(headers)]: HeadersList {
  cookies: null,
  [Symbol(headers map)]: [Map],
  [Symbol(headers map sorted)]: null
}

}, response: { ok: false, status: 401, statusText: 'Unauthorized', headers: HeadersList { cookies: null,

  [Symbol(headers map sorted)]: null
},
config: {
  transitional: [Object],
  adapter: [AsyncFunction: fetchAdapter],
  transformRequest: [Array],
  transformResponse: [Array],
  timeout: 0,
  xsrfCookieName: 'XSRF-TOKEN',
  xsrfHeaderName: 'X-XSRF-TOKEN',
  maxContentLength: -1,
  maxBodyLength: -1,
  validateStatus: [Function: validateStatus],
  headers: [Object],
  method: 'post',

dosubot[bot] commented 12 months ago

Hi, @yaya6630170! I'm Dosu, and I'm helping the gpt4-pdf-chatbot-langchain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you opened the issue titled "Error: Failed to ingest your data" because you were experiencing an error while trying to extract code from a PDF. Other users have provided suggestions and solutions, such as updating packages, installing VS Studio, and setting the projectName in the pinecone library. There was also a discussion about receiving a status code 429 (too many requests), with suggestions to check the OpenAI API key and payment method.

Before we close this issue, we wanted to check if it is still relevant to the latest version of the gpt4-pdf-chatbot-langchain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding, and please don't hesitate to reach out if you have any further questions or concerns.
