ZeroDivisionError: Weights sum to zero, can't be normalized

First off, thanks for taking the time to publish this package. I get the error below when I ask a question after uploading a PDF.

Using embedded DuckDB without persistence: data will be transient
Traceback (most recent call last):
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/gradio/routes.py", line 401, in run_predict
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/gradio/blocks.py", line 1302, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/gradio/blocks.py", line 1039, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/gradio/utils.py", line 491, in async_iteration
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/app.py", line 80, in get_response
    chain = app(file)
            ^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/app.py", line 46, in __call__
    self.chain = self.build_chain(file)
                 ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/app.py", line 69, in build_chain
    pdfsearch = Chroma.from_documents(documents, embeddings, collection_name= file_name,)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/langchain/vectorstores/chroma.py", line 347, in from_documents
    return cls.from_texts(
           ^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/langchain/vectorstores/chroma.py", line 315, in from_texts
    chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/langchain/vectorstores/chroma.py", line 121, in add_texts
    embeddings = self._embedding_function.embed_documents(list(texts))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/langchain/embeddings/openai.py", line 228, in embed_documents
    return self._get_len_safe_embeddings(texts, engine=self.deployment)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/langchain/embeddings/openai.py", line 189, in _get_len_safe_embeddings
    average = np.average(results[i], axis=0, weights=lens[i])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/numpy/lib/function_base.py", line 550, in average
    raise ZeroDivisionError(
ZeroDivisionError: Weights sum to zero, can't be normalized
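
A possible cause, not confirmed anywhere in this thread: the traceback ends in `np.average(results[i], axis=0, weights=lens[i])`, which raises exactly this error when the weights sum to zero. That could happen if some chunk extracted from the PDF has no text at all (for example, a scanned page with only images). A minimal sketch of filtering out such chunks before they reach `Chroma.from_documents`, where `Page` and `drop_empty_pages` are hypothetical names used only for illustration:

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical stand-in for a loaded document chunk. The only
    # assumption is that the real objects expose a `page_content` string.
    page_content: str


def drop_empty_pages(pages):
    """Keep only the pages whose text has at least one non-whitespace character."""
    return [p for p in pages if p.page_content and p.page_content.strip()]


pages = [Page("Intro text"), Page(""), Page("  \n")]
kept = drop_empty_pages(pages)

if not kept:
    # Nothing to embed: fail with a readable message instead of a ZeroDivisionError.
    raise ValueError("No extractable text found in the PDF")
```

If every page turns out to be empty, the PDF probably needs OCR before it can be embedded.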

Hello Sunil Kumar ji,

Thanks for this excellent git repo. While testing your code I am getting the error below. What could be the possible reason?

Using embedded DuckDB without persistence: data will be transient
Traceback (most recent call last):
  File "/home/rtx/akm/lib/python3.8/site-packages/gradio/routes.py", line 401, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/rtx/akm/lib/python3.8/site-packages/gradio/blocks.py", line 1302, in process_api
    result = await self.call_function(
  File "/home/rtx/akm/lib/python3.8/site-packages/gradio/blocks.py", line 1039, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/rtx/akm/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/rtx/akm/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/rtx/akm/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/rtx/akm/lib/python3.8/site-packages/gradio/utils.py", line 491, in async_iteration
    return next(iterator)
  File "/tmp/ipykernel_29949/1995911808.py", line 85, in get_response
    chain = app(file)
  File "/tmp/ipykernel_29949/1995911808.py", line 44, in __call__
    self.chain = self.build_chain(file)
  File "/tmp/ipykernel_29949/1995911808.py", line 74, in build_chain
    pdfsearch = Chroma.from_documents(documents, embeddings, collection_name= file_name,)
  File "/home/rtx/akm/lib/python3.8/site-packages/langchain/vectorstores/chroma.py", line 613, in from_documents
    return cls.from_texts(
  File "/home/rtx/akm/lib/python3.8/site-packages/langchain/vectorstores/chroma.py", line 568, in from_texts
    chroma_collection = cls(
  File "/home/rtx/akm/lib/python3.8/site-packages/langchain/vectorstores/chroma.py", line 126, in __init__
    self._collection = self._client.get_or_create_collection(
  File "/home/rtx/akm/lib/python3.8/site-packages/chromadb/api/local.py", line 79, in get_or_create_collection
    return self.create_collection(name, metadata, embedding_function, get_or_create=True)
  File "/home/rtx/akm/lib/python3.8/site-packages/chromadb/api/local.py", line 66, in create_collection
    check_index_name(name)
  File "/home/rtx/akm/lib/python3.8/site-packages/chromadb/api/local.py", line 41, in check_index_name
    raise ValueError(msg)
ValueError: Expected collection name that (1) contains 3-63 characters, (2) starts and ends with an alphanumeric character, (3) otherwise contains only alphanumeric characters, underscores or hyphens (-), (4) contains no two consecutive periods (..) and (5) is not a valid IPv4 address

Please note that the OpenAI API key was provided when running the code. Also, the file name is only 10 characters long.
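
Length is only the first of the rules listed in that `ValueError`. A 10-character file name can still fail the others, for instance if it contains a space, a period, or does not start and end with an alphanumeric character. A rough sketch of normalizing a file name before passing it as `collection_name`, following the rules quoted in the error message. `sanitize_collection_name` is a hypothetical helper, and lowercasing is a conservative assumption in case the installed Chroma version only accepts lowercase names:

```python
import re


def sanitize_collection_name(raw: str) -> str:
    """Turn an arbitrary file name into a string that satisfies the quoted rules."""
    # Lowercase (conservative assumption) and replace every character
    # other than letters, digits, underscores and hyphens with a hyphen.
    name = re.sub(r"[^a-z0-9_-]", "-", raw.lower())
    # The name must start and end with an alphanumeric character.
    name = re.sub(r"^[^a-z0-9]+|[^a-z0-9]+$", "", name)
    # Enforce the 63-character upper bound, then re-trim the end.
    name = re.sub(r"[^a-z0-9]+$", "", name[:63])
    # Enforce the 3-character lower bound by padding with zeros.
    return name.ljust(3, "0")


print(sanitize_collection_name("My Report.pdf"))
```

Because periods are replaced, the result can never contain two consecutive periods or look like an IPv4 address.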

sunilkumardash9 / Pdf-GPT

ZeroDivisionError: Weights sum to zero, can't be normalized #3
