zylon-ai / private-gpt

Interact with your documents using the power of GPT, 100% privately, no data leaks
https://privategpt.dev
Apache License 2.0

Error when loading a csv file #1670

Open · Vivek-C-Shah opened this issue 9 months ago

Vivek-C-Shah commented 9 months ago

Hey, first of all, thank you for all this wonderful work; it is helping me a lot! Could you help me add CSV file support? I get the following error when trying to create embeddings for a CSV file:

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "  Path\to\project\venv\Lib\site-packages\gradio\queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\gradio\route_utils.py", line 231, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\gradio\blocks.py", line 1594, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\gradio\blocks.py", line 1176, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\gradio\utils.py", line 689, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\private_gpt\ui\ui.py", line 243, in _upload_file
    self._ingest_service.bulk_ingest([(str(path.name), path) for path in paths])
  File "  Path\to\project\private_gpt\server\ingest\ingest_service.py", line 92, in bulk_ingest
    documents = self.ingest_component.bulk_ingest(files)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\private_gpt\components\ingest\ingest_component.py", line 130, in bulk_ingest
    saved_documents.extend(self._save_docs(documents))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\private_gpt\components\ingest\ingest_component.py", line 137, in _save_docs
    self._index.insert(document, show_progress=True)
  File "  Path\to\project\venv\Lib\site-packages\llama_index\indices\base.py", line 191, in insert
    nodes = run_transformations(
            ^^^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\llama_index\ingestion\pipeline.py", line 70, in run_transformations
    nodes = transform(nodes, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\llama_index\embeddings\base.py", line 334, in __call__
    embeddings = self.get_text_embedding_batch(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\llama_index\embeddings\base.py", line 255, in get_text_embedding_batch
    embeddings = self._get_text_embeddings(cur_batch)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\llama_index\embeddings\huggingface.py", line 199, in _get_text_embeddings
    return self._embed(texts)
           ^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\llama_index\embeddings\huggingface.py", line 158, in _embed
    model_output = self._model(**encoded_input)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\transformers\models\xlm_roberta\modeling_xlm_roberta.py", line 830, in forward
    embedding_output = self.embeddings(
                       ^^^^^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\transformers\models\xlm_roberta\modeling_xlm_roberta.py", line 131, in forward
    position_embeddings = self.position_embeddings(position_ids)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\torch\nn\modules\sparse.py", line 162, in forward
    return F.embedding(
           ^^^^^^^^^^^^
  File "  Path\to\project\venv\Lib\site-packages\torch\nn\functional.py", line 2233, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index out of range in self
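The error is raised inside XLM-RoBERTa's position-embedding lookup (`self.position_embeddings(position_ids)`). That suggests, although nobody in this thread confirms it, that a chunk produced from the CSV was longer than the embedding model can encode. One likely detail: XLM-RoBERTa offsets its position ids by the padding index, so a table of `max_position_embeddings` = 514 rows only fits 512 real tokens, and truncating input to 514 tokens would still overflow. The pure-Python sketch below mirrors that arithmetic; the constants are assumptions taken from the public xlm-roberta-base/large configs, not from this issue, and the helper names are hypothetical.

```python
# Minimal sketch, assuming the position-id scheme XLM-RoBERTa uses in Hugging Face
# transformers (padding_idx = 1, position table of 514 rows). Pure Python, no torch.
from typing import List

PADDING_IDX = 1                 # id of <pad>; real positions start at PADDING_IDX + 1
MAX_POSITION_EMBEDDINGS = 514   # rows in the position-embedding table (assumed config value)


def position_ids(input_ids: List[int], padding_idx: int = PADDING_IDX) -> List[int]:
    """Non-pad tokens get padding_idx + 1, padding_idx + 2, ...; pad tokens get padding_idx."""
    result = []
    seen = 0
    for tok in input_ids:
        if tok == padding_idx:
            result.append(padding_idx)
        else:
            seen += 1
            result.append(seen + padding_idx)
    return result


def fits_position_table(input_ids: List[int], table_size: int = MAX_POSITION_EMBEDDINGS) -> bool:
    """True if every position id is a valid row index, i.e. the lookup cannot go out of range."""
    if not input_ids:
        return True
    return max(position_ids(input_ids)) < table_size


def max_usable_tokens(table_size: int = MAX_POSITION_EMBEDDINGS,
                      padding_idx: int = PADDING_IDX) -> int:
    """Longest unpadded sequence whose position ids stay inside the table."""
    return table_size - padding_idx - 1


if __name__ == "__main__":
    print(max_usable_tokens())             # 512
    print(fits_position_table([5] * 512))  # True
    print(fits_position_table([5] * 514))  # False: truncating to 514 tokens is not enough
```

If this diagnosis holds, the tokens fed to the model need to be capped at 512 rather than at the raw `max_position_embeddings` value; whether the embedding wrapper in use exposes such a limit should be checked against the installed llama_index version.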
javierluraschi commented 8 months ago

To process CSV files, you can try hal9.com online. Cheers.

TobiasJu commented 8 months ago

Yeah, I have the same issue. How can this be fixed?
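One workaround to try (an assumption, not a fix confirmed in this thread) is to pre-process the CSV so that every chunk stays well under the embedding model's 512-token limit before ingesting it. The sketch below renders each row as `column: value` pairs and groups rows into chunks under a token budget. `csv_to_chunks` and `whitespace_tokens` are hypothetical helpers; a whitespace word count undercounts subword tokens, so the default budget of 256 deliberately leaves a margin.

```python
# Hypothetical pre-processing sketch: turn CSV rows into "column: value" lines and
# group them into chunks that stay under a token budget.
import csv
import io
from typing import Callable, List


def whitespace_tokens(text: str) -> int:
    """Crude stand-in for the model tokenizer (assumption): counts whitespace-separated words.
    Subword tokenizers usually produce MORE tokens than this, hence the low default budget."""
    return len(text.split())


def csv_to_chunks(csv_text: str,
                  budget: int = 256,
                  count_tokens: Callable[[str], int] = whitespace_tokens) -> List[str]:
    """Group rows into newline-joined chunks whose estimated token count stays within budget."""
    reader = csv.DictReader(io.StringIO(csv_text))
    chunks: List[str] = []
    current: List[str] = []
    for row in reader:
        # Repeat the column names on every row so each chunk is self-describing.
        line = "; ".join(f"{key}: {value}" for key, value in row.items())
        if current and count_tokens("\n".join(current + [line])) > budget:
            chunks.append("\n".join(current))  # flush the chunk before it overflows
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


if __name__ == "__main__":
    data = "name,city\nAnn,Oslo\nBob,Rome\nCy,Lima\n"
    for chunk in csv_to_chunks(data, budget=8):
        print(chunk)
        print("---")
```

A single row longer than the budget still becomes its own oversized chunk, so very wide rows would additionally need truncation. For an exact count, `count_tokens` can be replaced with a function built on the embedding model's own tokenizer.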