zylon-ai / private-gpt

Interact with your documents using the power of GPT, 100% privately, no data leaks
https://privategpt.dev
Apache License 2.0

Ingest.py: list index out of range #259

Closed zacherylzy closed 1 year ago

zacherylzy commented 1 year ago

Hi. Running Mac OS Monterey 12.0.1 and Python3.11 in Terminal. Downloaded all the latest files.

I've been encountering "list index out of range" regardless of what I try. I have no idea what the issue is, and I've only seen one other person post about it here.

Loading documents from source_documents
Loaded 0 documents from source_documents
Split into 0 chunks of text (max. 500 tokens each)
llama.cpp: loading model from /Users/*/Desktop/**/privateGPT-main.nosync/models/ggml-model-q4_0.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 10000
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113748.20 KB
llama_model_load_internal: mem required = 5809.33 MB (+ 2052.00 MB per state)
...................................................................................................
llama_init_from_file: kv self size = 10000.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
Using embedded DuckDB with persistence: data will be stored in: db
Traceback (most recent call last):
  File "/Users/**/Desktop/*/privateGPT-main.nosync/ingest.py", line 96, in <module>
    main()
  File "/Users/**/Desktop/*****/privateGPT-main.nosync/ingest.py", line 90, in main
    db = Chroma.from_documents(texts, llama, persist_directory=persist_directory, client_settings=CHROMA_SETTINGS)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain/vectorstores/chroma.py", line 413, in from_documents
    return cls.from_texts(
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain/vectorstores/chroma.py", line 381, in from_texts
    chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain/vectorstores/chroma.py", line 159, in add_texts
    self._collection.add(
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/chromadb/api/models/Collection.py", line 97, in add
    ids, embeddings, metadatas, documents = self._validate_embedding_set(
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/chromadb/api/models/Collection.py", line 340, in _validate_embedding_set
    ids = validate_ids(maybe_cast_one_to_many(ids))
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/chromadb/api/types.py", line 75, in maybe_cast_one_to_many
    if isinstance(target[0], (int, float)):
IndexError: list index out of range
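[Editor's note] The log above points at the likely root cause: "Loaded 0 documents ... Split into 0 chunks" means an empty list reaches chromadb, whose `maybe_cast_one_to_many` indexes `target[0]` without checking for emptiness. The function body below is a simplified sketch of `chromadb/api/types.py` from that era, not the exact source:

```python
def maybe_cast_one_to_many(target):
    # Simplified sketch of chromadb's helper: it assumes target is non-empty,
    # so an empty ids list raises IndexError before any clearer error can fire.
    if isinstance(target[0], (int, float)):  # IndexError when target == []
        return [target]
    return target

texts = []  # what ingest.py produces when it finds 0 documents / 0 chunks
try:
    maybe_cast_one_to_many(texts)
except IndexError as e:
    print(e)  # list index out of range
```

So the "list index out of range" is a symptom; the real problem is that ingestion produced nothing to embed.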
aadivyaraushan commented 1 year ago

Hi

I ran into the same error just now. I'm trying to train the model on one PDF file and I'm getting this error.

My output logs:

Traceback (most recent call last):
  File "/Users/aadivyaraushan/Documents/GitHub/chat-icse/privateGPT-main/ingest.py", line 167, in <module>
    main()
  File "/Users/aadivyaraushan/Documents/GitHub/chat-icse/privateGPT-main/ingest.py", line 153, in main
    db.add_documents(texts)
  File "/Users/aadivyaraushan/.pyenv/versions/3.11.3/lib/python3.11/site-packages/langchain/vectorstores/base.py", line 62, in add_documents
    return self.add_texts(texts, metadatas, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/aadivyaraushan/.pyenv/versions/3.11.3/lib/python3.11/site-packages/langchain/vectorstores/chroma.py", line 160, in add_texts
    self._collection.add(
  File "/Users/aadivyaraushan/.pyenv/versions/3.11.3/lib/python3.11/site-packages/chromadb/api/models/Collection.py", line 101, in add
    ids, embeddings, metadatas, documents = self._validate_embedding_set(
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/aadivyaraushan/.pyenv/versions/3.11.3/lib/python3.11/site-packages/chromadb/api/models/Collection.py", line 348, in _validate_embedding_set
    ids = validate_ids(maybe_cast_one_to_many(ids))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/aadivyaraushan/.pyenv/versions/3.11.3/lib/python3.11/site-packages/chromadb/api/types.py", line 77, in maybe_cast_one_to_many
    if isinstance(target[0], (int, float)):
                  ~~~~~~^^^
IndexError: list index out of range
viniciusbuscacio commented 1 year ago

I got the same error, running on Ubuntu Server 22.04.2 LTS.

vini@linux:~/privateGPT$ python ingest.py
Appending to existing vectorstore at db
Using embedded DuckDB with persistence: data will be stored in: db
Loading documents from source_documents
Loading new documents: 100%|██████████████████████| 1/1 [00:00<00:00,  1.89it/s]
Loaded 1 new documents from source_documents
Split into 0 chunks of text (max. 500 tokens each)
Creating embeddings. May take some minutes...
Traceback (most recent call last):
  File "/home/vini/privateGPT/ingest.py", line 166, in <module>
    main()
  File "/home/vini/privateGPT/ingest.py", line 152, in main
    db.add_documents(texts)
  File "/home/vini/.local/lib/python3.10/site-packages/langchain/vectorstores/base.py", line 62, in add_documents
    return self.add_texts(texts, metadatas, **kwargs)
  File "/home/vini/.local/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 160, in add_texts
    self._collection.add(
  File "/home/vini/.local/lib/python3.10/site-packages/chromadb/api/models/Collection.py", line 101, in add
    ids, embeddings, metadatas, documents = self._validate_embedding_set(
  File "/home/vini/.local/lib/python3.10/site-packages/chromadb/api/models/Collection.py", line 348, in _validate_embedding_set
    ids = validate_ids(maybe_cast_one_to_many(ids))
  File "/home/vini/.local/lib/python3.10/site-packages/chromadb/api/types.py", line 77, in maybe_cast_one_to_many
    if isinstance(target[0], (int, float)):
IndexError: list index out of range

Exagram commented 1 year ago

Why was this closed? I'm getting the same issue due to a .docx file.
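[Editor's note] One way a document can be "loaded" yet produce 0 chunks is an extension that has no registered loader: ingest.py only processes files whose extensions appear in its loader mapping, so anything else is silently skipped. The sketch below illustrates that skipping behavior; the names `SUPPORTED_EXTENSIONS` and `filter_loadable` are illustrative, not the actual identifiers in ingest.py:

```python
import os

# Illustrative subset of extensions with loaders; ingest.py keeps a larger map.
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".docx", ".csv", ".md"}

def filter_loadable(paths):
    """Keep only files whose extension has a registered loader.

    A mistyped extension like '.docxc' falls through silently, which is how
    a non-empty source_documents folder can still yield 0 chunks.
    """
    return [p for p in paths
            if os.path.splitext(p)[1].lower() in SUPPORTED_EXTENSIONS]

print(filter_loadable(["notes.docxc", "report.docx"]))  # ['report.docx']
```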

zacherylzy commented 1 year ago

My solution was that I had not cd'd into the right folder. It was my mistake, not privateGPT's.
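[Editor's note] All the reports in this thread share one trigger: zero chunks reach the vector store (wrong working directory, empty source_documents, or unsupported file types). A hypothetical guard one could add to ingest.py before `db.add_documents(texts)`, so the script fails fast with a readable message instead of chromadb's IndexError (the function name is an assumption, not part of the repo):

```python
def ensure_chunks(texts, source_directory="source_documents"):
    """Raise a clear error instead of letting Chroma hit IndexError on []."""
    if not texts:
        raise ValueError(
            f"No text chunks were produced from '{source_directory}'. "
            "Check that you are running from the repository root and that "
            "the directory contains supported file types."
        )
    return texts
```

Usage would be `db.add_documents(ensure_chunks(texts))` in main().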

azizanhakim commented 1 year ago

Can you specify your solution? I have the same problem and no idea what's wrong. Thanks.

Exagram commented 1 year ago

Why was this issue closed? There's no solution.

nocturneatfiftyhz commented 1 year ago

Same problem here on Windows 11, Python 3.11.4.

It doesn't make sense to close the issue, though.

(.venv) C:\Projects\privateGPT>python ingest.py source_documents\acunetix.pdf
Appending to existing vectorstore at db
Using embedded DuckDB with persistence: data will be stored in: db
Unable to connect optimized C data functions [No module named '_testbuffer'], falling back to pure Python
Loading documents from source_documents
Loading new documents: 100%|██████████████████████| 2/2 [00:04<00:00,  2.43s/it]
Loaded 11 new documents from source_documents
Split into 0 chunks of text (max. 500 tokens each)
Creating embeddings. May take some minutes...
Traceback (most recent call last):
  File "C:\Projects\privateGPT\ingest.py", line 166, in <module>
    main()
  File "C:\Projects\privateGPT\ingest.py", line 152, in main
    db.add_documents(texts)
  File "c:\Projects\privateGPT\.venv\Lib\site-packages\langchain\vectorstores\base.py", line 72, in add_documents
    return self.add_texts(texts, metadatas, **kwargs)
  File "c:\Projects\privateGPT\.venv\Lib\site-packages\langchain\vectorstores\chroma.py", line 160, in add_texts
    self._collection.add(
  File "c:\Projects\privateGPT\.venv\Lib\site-packages\chromadb\api\models\Collection.py", line 101, in add
    ids, embeddings, metadatas, documents = self._validate_embedding_set(
  File "c:\Projects\privateGPT\.venv\Lib\site-packages\chromadb\api\models\Collection.py", line 348, in _validate_embedding_set
    ids = validate_ids(maybe_cast_one_to_many(ids))
  File "c:\Projects\privateGPT\.venv\Lib\site-packages\chromadb\api\types.py", line 77, in maybe_cast_one_to_many
    if isinstance(target[0], (int, float)):
IndexError: list index out of range
MengyuanSu commented 1 year ago

Guys, this error is very weird.

I can't tell if this is a real solution, but it let my code run: I'm using PyCharm, and the workaround is to create a new project, rebuild the same environment, and paste your code there. That solved my problem and it worked for three days before the same error occurred again.

I searched on Google and found no exact solutions, so I would just do the same thing again.