run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: May I know why there is BadRequestError:400 while I run "npm run generate" #13174

Open EuniceFoo533 opened 2 months ago

EuniceFoo533 commented 2 months ago

Question Validation

Question

I get this output when I run `npm run generate`:

BadRequestError: 400 This model's maximum context length is 8192 tokens, however you requested 23869 tokens (23869 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.
    at APIError.generate (file:///C:/chat-llama/chat-llamaindex/node_modules/.pnpm/openai@4.37.1_encoding@0.1.13/node_modules/openai/error.mjs:41:20)
    at OpenAI.makeStatusError (file:///C:/chat-llama/chat-llamaindex/node_modules/.pnpm/openai@4.37.1_encoding@0.1.13/node_modules/openai/core.mjs:256:25)
    at OpenAI.makeRequest (file:///C:/chat-llama/chat-llamaindex/node_modules/.pnpm/openai@4.37.1_encoding@0.1.13/node_modules/openai/core.mjs:299:30)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async OpenAIEmbedding.getOpenAIEmbedding (file:///C:/chat-llama/chat-llamaindex/node_modules/.pnpm/llamaindex@0.1.18_@google+generative-ai@0.1.3_encoding@0.1.13_typescript@5.1.6/node_modules/llamaindex/dist/embeddings/OpenAIEmbedding.js:82:26)
    at async OpenAIEmbedding.getTextEmbeddings (file:///C:/chat-llama/chat-llamaindex/node_modules/.pnpm/llamaindex@0.1.18_@google+generative-ai@0.1.3_encoding@0.1.13_typescript@5.1.6/node_modules/llamaindex/dist/embeddings/OpenAIEmbedding.js:93:16)
    at async OpenAIEmbedding.getTextEmbeddingsBatch (file:///C:/chat-llama/chat-llamaindex/node_modules/.pnpm/llamaindex@0.1.18_@google+generative-ai@0.1.3_encoding@0.1.13_typescript@5.1.6/node_modules/llamaindex/dist/embeddings/types.js:32:36)
    at async VectorStoreIndex.getNodeEmbeddingResults (file:///C:/chat-llama/chat-llamaindex/node_modules/.pnpm/llamaindex@0.1.18_@google+generative-ai@0.1.3_encoding@0.1.13_typescript@5.1.6/node_modules/llamaindex/dist/indices/vectorStore/index.js:89:28)
    at async VectorStoreIndex.insertNodes (file:///C:/chat-llama/chat-llamaindex/node_modules/.pnpm/llamaindex@0.1.18_@google+generative-ai@0.1.3_encoding@0.1.13_typescript@5.1.6/node_modules/llamaindex/dist/indices/vectorStore/index.js:189:34)
    at async VectorStoreIndex.buildIndexFromNodes (file:///C:/chat-llama/chat-llamaindex/node_modules/.pnpm/llamaindex@0.1.18_@google+generative-ai@0.1.3_encoding@0.1.13_typescript@5.1.6/node_modules/llamaindex/dist/indices/vectorStore/index.js:109:9)
    at async VectorStoreIndex.init (file:///C:/chat-llama/chat-llamaindex/node_modules/.pnpm/llamaindex@0.1.18_@google+generative-ai@0.1.3_encoding@0.1.13_typescript@5.1.6/node_modules/llamaindex/dist/indices/vectorStore/index.js:55:13)
    at async VectorStoreIndex.fromDocuments (file:///C:/chat-llama/chat-llamaindex/node_modules/.pnpm/llamaindex@0.1.18_@google+generative-ai@0.1.3_encoding@0.1.13_typescript@5.1.6/node_modules/llamaindex/dist/indices/vectorStore/index.js:132:16)
    at async file:///C:/chat-llama/chat-llamaindex/scripts/generate.mjs:37:5
    at async getRuntime (file:///C:/chat-llama/chat-llamaindex/scripts/generate.mjs:22:3)
    at async generateDatasource (file:///C:/chat-llama/chat-llamaindex/scripts/generate.mjs:30:14)
    at async file:///C:/chat-llama/chat-llamaindex/scripts/generate.mjs:86:3

dosubot[bot] commented 2 months ago

To address the BadRequestError: 400 due to exceeding the OpenAI model's maximum context length of 8192 tokens with a request of 23869 tokens, consider the following steps:

  1. Switch to a Model with a Higher Token Limit: Use a model that supports a larger context window, such as gpt-4-32k (32,768 tokens) or a gpt-4-turbo model (up to 128,000 tokens). This adjustment matters whenever your requests exceed the standard 8,192-token limit.

  2. Adjust Your Code for the New Model: In your project, specify the new model name in your API call. For example, with the (pre-1.0) Python OpenAI SDK, the call would look like this:

import openai

openai.api_key = "your-api-key-here"

# gpt-4-32k is a chat model, so it goes through the ChatCompletion endpoint
response = openai.ChatCompletion.create(
    model="gpt-4-32k",  # model adjusted for a higher context limit
    messages=[{"role": "user", "content": "Your prompt here"}],
    max_tokens=1024,  # budget for the completion; the prompt counts against the context separately
)

Make sure to replace "gpt-4-32k" with the actual model name you're using, and fill in your OpenAI API key and prompt accordingly.

This approach should help you bypass the token limit issue by leveraging a model that accommodates your token requirements.
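Since chat-llamaindex talks to OpenAI through the llamaindex TypeScript package rather than the Python SDK, the equivalent change there is to pass a larger-context chat model into the service context. A minimal sketch, assuming the llamaindex 0.1.x exports OpenAI and serviceContextFromDefaults; the model name and temperature are illustrative, not taken from the project:

import { OpenAI, serviceContextFromDefaults } from "llamaindex";

// Chat model with a larger context window (model name is illustrative)
const llm = new OpenAI({ model: "gpt-4-turbo", temperature: 0 });

// The service context carries this LLM into index construction and querying
const serviceContext = serviceContextFromDefaults({ llm });

Note, however, that the stack trace above originates in OpenAIEmbedding, so the 8192-token limit being hit is the embedding model's; swapping the chat model alone may not clear the error (see the chunking suggestion further down).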

#### Details

**Sources**
- [llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/utils.py](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/utils.py)
- [llama-index-legacy/llama_index/legacy/llms/openai.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/llms/openai.py)
- [llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/base.py)

**About Dosu**
This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

logan-markewich commented 2 months ago

Seems like you are trying to embed files without actually chunking them properly?
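For context, chunking for this kind of project is configured in scripts/generate.mjs before the index is built, so oversized documents are split into pieces that stay below the 8192-token embedding limit before any embedding request goes out. A minimal sketch of that step, assuming the llamaindex 0.1.x TypeScript API (SimpleDirectoryReader, serviceContextFromDefaults, storageContextFromDefaults, VectorStoreIndex); the directory paths, chunk size, and overlap are illustrative values, not the project's defaults:

import {
  SimpleDirectoryReader,
  VectorStoreIndex,
  serviceContextFromDefaults,
  storageContextFromDefaults,
} from "llamaindex";

// Load the source documents (directory path is illustrative)
const documents = await new SimpleDirectoryReader().loadData({
  directoryPath: "./datasources/my-bot",
});

// Split documents into chunks small enough for the 8192-token embedding limit
const serviceContext = serviceContextFromDefaults({
  chunkSize: 512,
  chunkOverlap: 20,
});

// Persist the resulting index so the app can load it at runtime
const storageContext = await storageContextFromDefaults({
  persistDir: "./cache/my-bot",
});

await VectorStoreIndex.fromDocuments(documents, {
  serviceContext,
  storageContext,
});

With a small chunkSize, each embedding request stays well under the embedding model's context limit even for long source files.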