sunilkumardash9 / Pdf-GPT

A Gradio app for chatting with PDFs
MIT License
49 stars, 20 forks

ZeroDivisionError: Weights sum to zero, can't be normalized #3

Open mrwadepro opened 1 year ago

mrwadepro commented 1 year ago

First off, thanks for taking the time to post this package. I am getting this error when asking a question after I uploaded the PDF.

Using embedded DuckDB without persistence: data will be transient
Traceback (most recent call last):
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/gradio/routes.py", line 401, in run_predict
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/gradio/blocks.py", line 1302, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/gradio/blocks.py", line 1039, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/gradio/utils.py", line 491, in async_iteration
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/app.py", line 80, in get_response
    chain = app(file)
            ^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/app.py", line 46, in __call__
    self.chain = self.build_chain(file)
                 ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/app.py", line 69, in build_chain
    pdfsearch = Chroma.from_documents(documents, embeddings, collection_name= file_name,)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/langchain/vectorstores/chroma.py", line 347, in from_documents
    return cls.from_texts(
           ^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/langchain/vectorstores/chroma.py", line 315, in from_texts
    chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/langchain/vectorstores/chroma.py", line 121, in add_texts
    embeddings = self._embedding_function.embed_documents(list(texts))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/langchain/embeddings/openai.py", line 228, in embed_documents
    return self._get_len_safe_embeddings(texts, engine=self.deployment)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/langchain/embeddings/openai.py", line 189, in _get_len_safe_embeddings
    average = np.average(results[i], axis=0, weights=lens[i])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/numpy/lib/function_base.py", line 550, in average
    raise ZeroDivisionError(
ZeroDivisionError: Weights sum to zero, can't be normalized
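
Note on the traceback above: the final frames show `np.average` being called with `weights=lens[i]`, and NumPy raises this error when the weights sum to zero. One plausible cause, not confirmed anywhere in this thread, is that at least one text extracted from the PDF is empty (for example a blank or image-only page), so it produces no tokens and therefore no weights. A minimal sketch of a guard under that assumption follows; `drop_empty_texts` is a hypothetical helper, not part of Pdf-GPT or langchain.

```python
from typing import List


def drop_empty_texts(texts: List[str]) -> List[str]:
    """Hypothetical helper: keep only texts with non-whitespace content.

    Assumption: an empty text is what leads to the zero-sum weights in the
    traceback. This is a guess from the last frames, not a confirmed diagnosis.
    """
    return [text for text in texts if text and text.strip()]


# Example: two of the four "pages" carry no usable text.
pages = ["Some extracted text", "", "   \n", "More text"]
kept = drop_empty_texts(pages)
print(f"kept {len(kept)} of {len(pages)} pages")
```

If the app holds langchain `Document` objects rather than plain strings, the same filter could be applied to each document's `page_content` before calling `Chroma.from_documents`.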

akmcax commented 1 year ago

Hello Sunil Kumar ji,

Thanks for this excellent git repo. While testing your code I get the error below. What could be the cause?

Using embedded DuckDB without persistence: data will be transient
Traceback (most recent call last):
  File "/home/rtx/akm/lib/python3.8/site-packages/gradio/routes.py", line 401, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/rtx/akm/lib/python3.8/site-packages/gradio/blocks.py", line 1302, in process_api
    result = await self.call_function(
  File "/home/rtx/akm/lib/python3.8/site-packages/gradio/blocks.py", line 1039, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/rtx/akm/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/rtx/akm/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/rtx/akm/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/rtx/akm/lib/python3.8/site-packages/gradio/utils.py", line 491, in async_iteration
    return next(iterator)
  File "/tmp/ipykernel_29949/1995911808.py", line 85, in get_response
    chain = app(file)
  File "/tmp/ipykernel_29949/1995911808.py", line 44, in __call__
    self.chain = self.build_chain(file)
  File "/tmp/ipykernel_29949/1995911808.py", line 74, in build_chain
    pdfsearch = Chroma.from_documents(documents, embeddings, collection_name= file_name,)
  File "/home/rtx/akm/lib/python3.8/site-packages/langchain/vectorstores/chroma.py", line 613, in from_documents
    return cls.from_texts(
  File "/home/rtx/akm/lib/python3.8/site-packages/langchain/vectorstores/chroma.py", line 568, in from_texts
    chroma_collection = cls(
  File "/home/rtx/akm/lib/python3.8/site-packages/langchain/vectorstores/chroma.py", line 126, in __init__
    self._collection = self._client.get_or_create_collection(
  File "/home/rtx/akm/lib/python3.8/site-packages/chromadb/api/local.py", line 79, in get_or_create_collection
    return self.create_collection(name, metadata, embedding_function, get_or_create=True)
  File "/home/rtx/akm/lib/python3.8/site-packages/chromadb/api/local.py", line 66, in create_collection
    check_index_name(name)
  File "/home/rtx/akm/lib/python3.8/site-packages/chromadb/api/local.py", line 41, in check_index_name
    raise ValueError(msg)
ValueError: Expected collection name that (1) contains 3-63 characters, (2) starts and ends with an alphanumeric character, (3) otherwise contains only alphanumeric characters, underscores or hyphens (-), (4) contains no two consecutive periods (..) and (5) is not a valid IPv4 address

Please note that an OpenAI API key was set when running the code. Also, the file name is only 10 characters long.

sunilkumardash9 commented 1 year ago

hi @akmcax, it probably has to do with the name of the Chroma collection. Check if it complies with the naming convention. Your collection name might have an underscore or hyphen at the end.
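
To illustrate the naming rules quoted in the `ValueError`, here is a minimal sketch that coerces an arbitrary file name into a compliant collection name. `sanitize_collection_name` and its `"pdf"` padding are hypothetical, not part of Pdf-GPT, langchain, or chromadb. The sketch is stricter than the quoted rules, since it removes periods entirely, which also rules out consecutive periods and IPv4-style names.

```python
import re


def sanitize_collection_name(raw_name: str) -> str:
    """Hypothetical helper: make a name satisfy the rules in the ValueError.

    Rules taken from the error message: 3-63 characters, alphanumeric at both
    ends, only alphanumerics, underscores or hyphens in between, no "..",
    and not an IPv4 address.
    """
    # Replace every character outside [A-Za-z0-9_-] (spaces, dots, slashes) with "_".
    name = re.sub(r"[^A-Za-z0-9_-]", "_", raw_name)
    # Enforce the 63-character maximum.
    name = name[:63]
    # Strip underscores and hyphens from both ends so the name starts and
    # ends with an alphanumeric character.
    name = name.strip("_-")
    # Pad very short (or empty) results up to the 3-character minimum.
    if len(name) < 3:
        name = (name + "pdf")[:3]
    return name


print(sanitize_collection_name("my report_.pdf"))
print(sanitize_collection_name("report_"))
```

In `build_chain`, the result could then be passed as `collection_name=sanitize_collection_name(file_name)` in the `Chroma.from_documents` call shown in the traceback, assuming `file_name` is a plain string.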