Closed: zacherylzy closed this issue 1 year ago
Hi, I just ran into the same error. I'm trying to ingest a single PDF file and I'm getting this:
Traceback (most recent call last):
File "/Users/aadivyaraushan/Documents/GitHub/chat-icse/privateGPT-main/ingest.py", line 167, in <module>
main()
File "/Users/aadivyaraushan/Documents/GitHub/chat-icse/privateGPT-main/ingest.py", line 153, in main
db.add_documents(texts)
File "/Users/aadivyaraushan/.pyenv/versions/3.11.3/lib/python3.11/site-packages/langchain/vectorstores/base.py", line 62, in add_documents
return self.add_texts(texts, metadatas, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/aadivyaraushan/.pyenv/versions/3.11.3/lib/python3.11/site-packages/langchain/vectorstores/chroma.py", line 160, in add_texts
self._collection.add(
File "/Users/aadivyaraushan/.pyenv/versions/3.11.3/lib/python3.11/site-packages/chromadb/api/models/Collection.py", line 101, in add
ids, embeddings, metadatas, documents = self._validate_embedding_set(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/aadivyaraushan/.pyenv/versions/3.11.3/lib/python3.11/site-packages/chromadb/api/models/Collection.py", line 348, in _validate_embedding_set
ids = validate_ids(maybe_cast_one_to_many(ids))
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/aadivyaraushan/.pyenv/versions/3.11.3/lib/python3.11/site-packages/chromadb/api/types.py", line 77, in maybe_cast_one_to_many
if isinstance(target[0], (int, float)):
~~~~~~^^^
IndexError: list index out of range
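The failing frame in the traceback is chromadb's `maybe_cast_one_to_many`, which indexes `target[0]` without first checking for an empty list. A minimal sketch of that helper (simplified from the traceback above, not chromadb's full implementation) shows why an empty `ids` list raises exactly this error:

```python
def maybe_cast_one_to_many(target):
    """Simplified sketch of the chromadb helper from the traceback:
    wraps a single scalar into a list, passes lists through unchanged."""
    if isinstance(target[0], (int, float)):  # IndexError when target == []
        return [target]
    return target

# An empty ids list (produced when 0 chunks were split) hits target[0]:
try:
    maybe_cast_one_to_many([])
except IndexError as exc:
    print(exc)  # list index out of range
```

So the IndexError is a symptom: the real problem is that nothing was passed to Chroma in the first place.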
I got the same error, running on Ubuntu Server 22.04.2 LTS.
vini@linux:~/privateGPT$ python ingest.py
Appending to existing vectorstore at db
Using embedded DuckDB with persistence: data will be stored in: db
Loading documents from source_documents
Loading new documents: 100%|██████████████████████| 1/1 [00:00<00:00, 1.89it/s]
Loaded 1 new documents from source_documents
Split into 0 chunks of text (max. 500 tokens each)
Creating embeddings. May take some minutes...
Traceback (most recent call last):
File "/home/vini/privateGPT/ingest.py", line 166, in
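Note the `Split into 0 chunks of text` line above: whenever that appears, the `texts` list handed to Chroma is empty and the add call is guaranteed to crash as shown. A defensive guard (hypothetical placement, around where ingest.py calls `db.add_documents(texts)`) would turn the crash into a readable message; `db` is assumed to expose `add_documents()` like a langchain Chroma store:

```python
def add_texts_safely(db, texts):
    """Skip the vectorstore call when the splitter produced nothing,
    so an empty list never reaches chromadb's maybe_cast_one_to_many.

    `db` is assumed to expose add_documents() like langchain's Chroma."""
    if not texts:
        print("No text chunks to embed; check that source_documents "
              "contains supported, non-empty files.")
        return False
    db.add_documents(texts)
    return True
```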
Why was this closed? I'm getting the same issue with a .docxc file.
My problem was that I hadn't cd'd into the right folder. It was my mistake, not privateGPT's.
Can you elaborate on your solution? I have the same problem and no idea what causes it. Thanks.
Why was this issue closed? There's no solution here.
Same problem here, on Windows 11 with Python 3.11.4. It doesn't make sense to close the issue, though.
(.venv) C:\Projects\privateGPT>python ingest.py source_documents\acunetix.pdf
Appending to existing vectorstore at db
Using embedded DuckDB with persistence: data will be stored in: db
Unable to connect optimized C data functions [No module named '_testbuffer'], falling back to pure Python
Loading documents from source_documents
Loading new documents: 100%|██████████████████████| 2/2 [00:04<00:00, 2.43s/it]
Loaded 11 new documents from source_documents
Split into 0 chunks of text (max. 500 tokens each)
Creating embeddings. May take some minutes...
Traceback (most recent call last):
File "C:\Projects\privateGPT\ingest.py", line 166, in
IndexError: list index out of range
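A recurring pattern in these reports is that documents load and then `Split into 0 chunks` follows, often because files in `source_documents` have extensions no loader handles and are silently skipped. A quick stdlib sanity check is sketched below; the extension set is an assumption copied from the privateGPT README, so verify it against the `LOADER_MAPPING` in your copy of ingest.py:

```python
import pathlib

# Assumed supported extensions (taken from the privateGPT README;
# confirm against LOADER_MAPPING in your ingest.py).
SUPPORTED = {".csv", ".doc", ".docx", ".enex", ".eml", ".epub", ".html",
             ".md", ".odt", ".pdf", ".ppt", ".pptx", ".txt"}

def unsupported_files(source_dir):
    """Return names of files in source_dir that no loader would handle."""
    return [p.name for p in pathlib.Path(source_dir).iterdir()
            if p.is_file() and p.suffix.lower() not in SUPPORTED]
```

Running this on `source_documents` before ingesting flags files (like a misnamed `.docxc`) that would otherwise contribute zero chunks.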
This error is very weird. I can't tell whether this counts as a solution, but it let my code run: I'm using PyCharm, and what worked was creating a new project, building the same environment, and pasting my code there. That solved the problem, and it worked for three days before the same error occurred again. I searched Google and found no exact solution, so I would just do the same thing again.
Hi. Running macOS Monterey 12.0.1 and Python 3.11 in Terminal, with all the latest files downloaded. I've been encountering "list index out of range" regardless of what I try; I have no idea what the issue is, and I've only seen one other person post about it here.
Loading documents from source_documents
Loaded 0 documents from source_documents
Split into 0 chunks of text (max. 500 tokens each)
llama.cpp: loading model from /Users/*/Desktop/**/privateGPT-main.nosync/models/ggml-model-q4_0.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 10000
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113748.20 KB
llama_model_load_internal: mem required = 5809.33 MB (+ 2052.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 10000.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
Using embedded DuckDB with persistence: data will be stored in: db
Traceback (most recent call last):
File "/Users/**/Desktop/*/privateGPT-main.nosync/ingest.py", line 96, in
main()
File "/Users/**/Desktop/*****/privateGPT-main.nosync/ingest.py", line 90, in main
db = Chroma.from_documents(texts, llama, persist_directory=persist_directory, client_settings=CHROMA_SETTINGS)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain/vectorstores/chroma.py", line 413, in from_documents
return cls.from_texts(
^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain/vectorstores/chroma.py", line 381, in from_texts
chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain/vectorstores/chroma.py", line 159, in add_texts
self._collection.add(
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/chromadb/api/models/Collection.py", line 97, in add
ids, embeddings, metadatas, documents = self._validate_embedding_set(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/chromadb/api/models/Collection.py", line 340, in _validate_embedding_set
ids = validate_ids(maybe_cast_one_to_many(ids))
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/chromadb/api/types.py", line 75, in maybe_cast_one_to_many
if isinstance(target[0], (int, float)):
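The macOS run above reports `Loaded 0 documents`, while other runs show documents loading but still splitting into 0 chunks; the latter usually means the loaded documents had no extractable text (e.g. scanned, image-only PDF pages). A hedged sketch of a pre-filter one could run before splitting is below; the `Document` class here is a minimal stand-in for `langchain.schema.Document`, not the real class:

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Minimal stand-in for langchain.schema.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def drop_empty_documents(documents):
    """Remove documents with no extractable text (e.g. image-only PDF
    pages); such documents split into 0 chunks and later crash the
    Chroma add with 'list index out of range'."""
    kept = [d for d in documents if d.page_content.strip()]
    dropped = len(documents) - len(kept)
    if dropped:
        print(f"Dropped {dropped} empty document(s)")
    return kept
```

If everything gets dropped (or nothing loads at all), the fix is upstream: OCR the PDFs or supply files with real text, rather than patching the vectorstore call.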