Closed: khaoss85 closed this issue 1 year ago
Hi @khaoss85!
Did you change anything in create_embeddings.py?
It's normally set to chunk the content into pieces of 1500 tokens, so this should not happen.
It looks like the splitting might not be happening correctly on your side?
@mpaepper I get the same error, without modifying anything in the create_embeddings.py
python3 create_embeddings.py -s https://www.phonepe.com/sitemap.xml -f https://www.phonepe.com/
The culprit is your page https://www.phonepe.com/press, which gets parsed into one long text that cannot be chunked up (the code splits on the linebreak "\n" for that).
So you either need to look into how to handle that particular page and why it fails, or exclude it in create_embeddings.py, for example:
for info in raw['urlset']['url']:
    url = info['loc']
    if args.filter in url and 'https://www.phonepe.com/press' not in url:
        pages.append({'text': extract_text_from(url), 'source': url})
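A more general fix, instead of excluding the page, is to fall back to fixed-size chunks when a page has no newlines to split on. This is a hypothetical sketch (the `chunk_text` helper and the 6000-character budget are assumptions, not code from the repo); it prefers newline splits and only hard-slices blocks that are too long:

```python
def chunk_text(text, max_chars=6000):
    """Split text into chunks of at most max_chars characters.

    Prefers splitting on newlines, but falls back to hard character
    slicing for pages (like /press) that contain no line breaks.
    ~6000 characters stays safely under the 8191-token embedding
    limit for typical English text (roughly 4 characters per token).
    """
    chunks = []
    for block in text.split("\n"):
        if not block.strip():
            continue  # skip empty lines
        if len(block) <= max_chars:
            chunks.append(block)
        else:
            # Fallback: slice the oversized block into fixed-size pieces.
            for i in range(0, len(block), max_chars):
                chunks.append(block[i:i + max_chars])
    return chunks
```

Hard character slices can cut mid-word, but for embeddings that is usually an acceptable trade-off compared to failing the whole run on one unsplittable page.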
Whoa, thanks for the lightning reply. That makes sense.
Traceback (most recent call last):
  File "/Users/pelleri/Desktop/dahu/content-chatbot-main/create_embeddings.py", line 49, in <module>
    store = FAISS.from_texts(docs, OpenAIEmbeddings(), metadatas=metadatas)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/langchain/vectorstores/faiss.py", line 250, in from_texts
    embeddings = embedding.embed_documents(texts)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/langchain/embeddings/openai.py", line 254, in embed_documents
    response = embed_with_retry(
    ^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/langchain/embeddings/openai.py", line 53, in embed_with_retry
    return _completion_with_retry(**kwargs)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
    ^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
    ^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
    ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/opt/homebrew/lib/python3.11/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/langchain/embeddings/openai.py", line 51, in _completion_with_retry
    return embeddings.client.create(**kwargs)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/openai/api_resources/embedding.py", line 33, in create
    response = super().create(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
    ^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/openai/api_requestor.py", line 226, in request
    resp, got_stream = self._interpret_response(result, stream)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/openai/api_requestor.py", line 619, in _interpret_response
    self._interpret_response_line(
  File "/opt/homebrew/lib/python3.11/site-packages/openai/api_requestor.py", line 679, in _interpret_response_line
    raise self.handle_error_response(
openai.error.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 8951 tokens (8951 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.
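The error itself is explicit: a single chunk came to 8951 tokens against the embedding model's 8191-token context limit. A cheap pre-check can flag oversized chunks before they ever reach the API. This is a hypothetical helper (not part of the repo); it uses a rough 4-characters-per-token estimate, so treat it as a heuristic guard, not an exact count:

```python
MAX_TOKENS = 8191  # context limit reported in the error message

def rough_token_count(text):
    # Crude estimate: ~4 characters per token for English text.
    # For exact counts you would tokenize with the model's encoding.
    return max(1, len(text) // 4)

def filter_oversized(docs):
    """Split docs into (ok, too_big) by estimated token count.

    Call this before FAISS.from_texts(...) so that oversized pages
    can be logged, re-chunked, or skipped instead of crashing the run.
    """
    ok, too_big = [], []
    for doc in docs:
        (ok if rough_token_count(doc) <= MAX_TOKENS else too_big).append(doc)
    return ok, too_big
```

Anything landing in `too_big` is a page that needs further splitting, which in this thread turned out to be https://www.phonepe.com/press.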