PierrickLozach closed this issue 1 year ago
Try replacing the CSV loader entry in LOADER_MAPPING in ingest.py with:
".csv": (CSVLoader, {"csv_args": {"delimiter": ";"}})
No change unfortunately. Still getting the same error.
Updated code:
LOADER_MAPPING = {
    ".csv": (CSVLoader, {"csv_args": {"delimiter": ";"}}),
    # ... remaining loader entries unchanged
}
Sample csv (I modified some of the content to remove anything sensitive):
question;answer
"Confirm that user privileges are/can be reviewed for toxic combinations";"Customers control user access, roles and permissions within the \nCloud CX application. The platform will display roles that any user have access\nto and all the permissions for a user can be viewed from the user\nprofile. User permissions are controlled by the roles that are\nassigned. Full detail here: https://link-here"
"Do we use any external cyber intelligence service to gather intelligence on latest vulnerabilities?";"We do use intelligence services and teams are in various industry standard groups where threat knowledge is shared. We do however not publish details on these."
"How and when are call recordings decrypted.";"When an authenticated and authorised request for the replay or download of a recoridng is recieved the recording file is copied to temporary storage and decrypted on demand and made available."
error:
Creating new vectorstore
Loading documents from source_documents
Loading new documents: 0%| | 0/1 [00:01<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/pierrick.lozach/anaconda3/lib/python3.10/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/Users/pierrick.lozach/Documents/privateGPT/ingest.py", line 89, in load_single_document
return loader.load()[0]
File "/Users/pierrick.lozach/anaconda3/lib/python3.10/site-packages/langchain/document_loaders/csv_loader.py", line 48, in load
content = "\n".join(f"{k.strip()}: {v.strip()}" for k, v in row.items())
File "/Users/pierrick.lozach/anaconda3/lib/python3.10/site-packages/langchain/document_loaders/csv_loader.py", line 48, in <genexpr>
content = "\n".join(f"{k.strip()}: {v.strip()}" for k, v in row.items())
AttributeError: 'NoneType' object has no attribute 'strip'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/pierrick.lozach/Documents/privateGPT/ingest.py", line 178, in <module>
main()
File "/Users/pierrick.lozach/Documents/privateGPT/ingest.py", line 167, in main
texts = process_documents()
File "/Users/pierrick.lozach/Documents/privateGPT/ingest.py", line 121, in process_documents
documents = load_documents(source_directory, ignored_files)
File "/Users/pierrick.lozach/Documents/privateGPT/ingest.py", line 109, in load_documents
for i, doc in enumerate(pool.imap_unordered(load_single_document, filtered_files)):
File "/Users/pierrick.lozach/anaconda3/lib/python3.10/multiprocessing/pool.py", line 873, in next
raise value
AttributeError: 'NoneType' object has no attribute 'strip'
Updating to Python 3.11 solved it for me.
No luck for me.
Python version:
Python 3.11.3
Error:
Creating new vectorstore
Loading documents from source_documents
Loading new documents: 0%| | 0/1 [00:01<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/pierrick.lozach/anaconda3/envs/privateGPT/lib/python3.11/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
^^^^^^^^^^^^^^^^^^^
File "/Users/pierrick.lozach/Documents/privateGPT/ingest.py", line 89, in load_single_document
return loader.load()[0]
^^^^^^^^^^^^^
File "/Users/pierrick.lozach/anaconda3/envs/privateGPT/lib/python3.11/site-packages/langchain/document_loaders/csv_loader.py", line 48, in load
content = "\n".join(f"{k.strip()}: {v.strip()}" for k, v in row.items())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pierrick.lozach/anaconda3/envs/privateGPT/lib/python3.11/site-packages/langchain/document_loaders/csv_loader.py", line 48, in <genexpr>
content = "\n".join(f"{k.strip()}: {v.strip()}" for k, v in row.items())
^^^^^^^
AttributeError: 'NoneType' object has no attribute 'strip'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/pierrick.lozach/Documents/privateGPT/ingest.py", line 178, in <module>
main()
File "/Users/pierrick.lozach/Documents/privateGPT/ingest.py", line 167, in main
texts = process_documents()
^^^^^^^^^^^^^^^^^^^
File "/Users/pierrick.lozach/Documents/privateGPT/ingest.py", line 121, in process_documents
documents = load_documents(source_directory, ignored_files)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pierrick.lozach/Documents/privateGPT/ingest.py", line 109, in load_documents
for i, doc in enumerate(pool.imap_unordered(load_single_document, filtered_files)):
File "/Users/pierrick.lozach/anaconda3/envs/privateGPT/lib/python3.11/multiprocessing/pool.py", line 873, in next
raise value
AttributeError: 'NoneType' object has no attribute 'strip'
Strange! Your file (stored as csv) works for me when I use the delimiter option. (Python 3.10.11 btw)
Thanks for that. I reduced the number of entries in my csv and it works indeed. I guess some items must be incorrect. I will work on that.
FYI, I just hit that issue again, and it seems to be caused by invalid characters (escaped quotes in my case).
This issue seems to come from CSVLoader itself, as it is referenced in this issue: https://github.com/hwchase17/langchain/issues/2074
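For anyone wondering how a None ends up at .strip(): here is a minimal sketch, assuming CSVLoader reads rows with Python's csv.DictReader (the row.items() call in the traceback suggests this, but it is an assumption here). DictReader fills a missing field with None, so one short row is enough to trigger the error. SAMPLE and format_row are hypothetical names used only for illustration.

```python
import csv
import io

# Hypothetical sample: the second data row is missing its "answer" field.
SAMPLE = 'question;answer\n"Q1";"A1"\n"Q2"\n'


def format_row(row):
    # Same expression as line 48 of csv_loader.py in the traceback.
    return "\n".join(f"{k.strip()}: {v.strip()}" for k, v in row.items())


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))

# The complete row formats fine.
print(format_row(rows[0]))

# The short row has answer=None, so v.strip() raises AttributeError.
try:
    format_row(rows[1])
except AttributeError as exc:
    print(f"row 2 failed: {exc}")
```

A row with too many fields fails in a similar way: DictReader stores the surplus values under a None key, and k.strip() then raises the same AttributeError.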
@PierrickI3 do you know of a tool I can use to check my docs and find what the issue is?
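One way to check a file is a small script built on the standard csv module. This is a sketch, not an official tool: find_bad_rows and check_file are hypothetical helpers, and the UTF-8 encoding and ";" delimiter are assumptions to adjust for your data. It reports every row whose field count differs from the header, which is the situation that leaves a None key or value in the row.

```python
import csv


def find_bad_rows(lines, delimiter=";"):
    """Return (line_number, reason) pairs for rows that do not match the header.

    `lines` is any iterable of text lines, for example an open file.
    """
    problems = []
    reader = csv.DictReader(lines, delimiter=delimiter)
    for row in reader:
        # line_num is the last physical line read for this record.
        line_num = reader.line_num
        if None in row:
            # More fields than header columns: extras are stored under None.
            problems.append((line_num, f"extra fields: {row[None]!r}"))
        missing = [k for k, v in row.items() if k is not None and v is None]
        if missing:
            # Fewer fields than header columns: the gaps are filled with None.
            problems.append((line_num, f"missing fields: {missing!r}"))
    return problems


def check_file(path, delimiter=";", encoding="utf-8"):
    """Open a CSV file and print every problem row found by find_bad_rows."""
    with open(path, newline="", encoding=encoding) as handle:
        for line_num, reason in find_bad_rows(handle, delimiter):
            print(f"line {line_num}: {reason}")
```

Calling check_file with the path to the CSV in source_documents prints one line per suspicious row. Rows with stray or badly escaped quotes often show up as "extra fields" because the parser splits them in unexpected places.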
Describe the bug and how to reproduce it
ingest.py fails with a single csv file
my .env:
I cannot share the csv file, but it is semicolon (;) separated.