Traceback (most recent call last):
  File "bugCSV.py", line 8, in <module>
    loader.load()
  File "base.py", line 30, in load
    return list(self.lazy_load())
           ^^^^^^^^^^^^^^^^^^^^^^
  File "csv_loader.py", line 147, in lazy_load
    raise RuntimeError(f"Error loading {self.file_path}") from e
RuntimeError: Error loading ./demo_bug.csv
Expected Behavior
When a metadata column specified in metadata_columns does not exist in the CSV file, I expected the loader to raise a ValueError with a message like:
ValueError: Metadata column 'MISSING_HEADER' not found in CSV file.
Instead, the current implementation raises a generic RuntimeError, making it harder to debug the specific cause of the issue.
Error Message and Stack Trace (if applicable)
No response
Description
In the current implementation of CSVLoader within langchain_community.document_loaders.csv_loader, a generic RuntimeError is raised when an error occurs while loading the CSV file, even when the underlying issue is due to missing metadata columns. This masks the actual problem, making debugging more difficult for users.
Specifically, when a column specified in the metadata_columns parameter is not present in the CSV file, a more appropriate ValueError should be raised, indicating the missing column. However, due to broad exception handling in the lazy_load() method, this specific error is hidden behind a RuntimeError.
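The masking described here can be illustrated without LangChain installed. This is only a minimal sketch, not the library's code: `read_rows` and `lazy_load_masked` are hypothetical stand-ins for the loader's private row reader and for `lazy_load()`, and the CSV content is invented for illustration.

```python
import csv
import io
from typing import Iterator, List

CSV_TEXT = "name,age\nAlice,30\n"  # invented sample data, not from the report


def read_rows(csvfile, metadata_columns: List[str]) -> Iterator[dict]:
    """Hypothetical stand-in for the loader's row reader."""
    for row in csv.DictReader(csvfile):
        for col in metadata_columns:
            if col not in row:
                raise ValueError(f"Metadata column '{col}' not found in CSV file.")
        yield row


def lazy_load_masked(metadata_columns: List[str]) -> Iterator[dict]:
    """Mimics a broad handler: every exception is re-raised as RuntimeError."""
    try:
        yield from read_rows(io.StringIO(CSV_TEXT), metadata_columns)
    except Exception as e:
        raise RuntimeError("Error loading <in-memory csv>") from e


masked_error = None
try:
    list(lazy_load_masked(["MISSING_HEADER"]))
except RuntimeError as err:
    masked_error = err

# The message the user sees no longer names the missing column.
print(f"{type(masked_error).__name__}: {masked_error}")
```

The caller receives a `RuntimeError` whose message says nothing about `MISSING_HEADER`, which is the behavior this report describes.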
Expected Behavior
When a metadata column specified by the user is missing from the CSV file, the loader should raise a ValueError, providing a clear message about the missing column, instead of the generic RuntimeError.
Actual Behavior
A generic RuntimeError is raised, which does not specify that the issue stems from a missing column in the CSV file. This makes it difficult for users to identify the root cause of the problem.
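As a stopgap, the root cause can likely still be recovered on the caller's side, because the `raise ... from e` visible in the traceback keeps the original exception in `__cause__`. A minimal sketch, assuming the underlying error is the `ValueError` described above; `root_cause` is a hypothetical helper and the exception chain is simulated rather than produced by the real loader.

```python
def root_cause(exc: BaseException) -> BaseException:
    """Follow explicit `raise ... from ...` links back to the first exception."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc


original = None
try:
    # Simulated chain; the messages mirror the ones quoted in this report.
    try:
        raise ValueError("Metadata column 'MISSING_HEADER' not found in CSV file.")
    except Exception as e:
        raise RuntimeError("Error loading ./demo_bug.csv") from e
except RuntimeError as wrapped:
    original = root_cause(wrapped)

print(f"{type(original).__name__}: {original}")
```

This only eases debugging. It does not change which exception type callers have to catch.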
Proposed Solution
The error handling in the lazy_load() method should be adjusted to allow more specific exceptions, such as ValueError, to propagate. This will ensure that the appropriate error is raised and presented to the user when metadata columns are missing.
def lazy_load(self) -> Iterator[Document]:
    try:
        with open(self.file_path, newline="", encoding=self.encoding) as csvfile:
            yield from self.__read_file(csvfile)
    except UnicodeDecodeError as e:
        if self.autodetect_encoding:
            detected_encodings = detect_file_encodings(self.file_path)
            for encoding in detected_encodings:
                try:
                    with open(
                        self.file_path, newline="", encoding=encoding.encoding
                    ) as csvfile:
                        yield from self.__read_file(csvfile)
                        break
                except UnicodeDecodeError:
                    continue
        else:
            raise RuntimeError(f"Error loading {self.file_path}") from e
    except ValueError as ve:  # Allow ValueError to propagate
        raise ve
    except Exception as e:
        raise RuntimeError(f"Error loading {self.file_path}") from e
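One detail about the handler order in this proposal: `UnicodeDecodeError` is a subclass of `ValueError`, so the `except UnicodeDecodeError` clause has to stay ahead of `except ValueError`, otherwise the encoding fallback would never run. The sketch below illustrates only that ordering; `classify` is a hypothetical helper, not part of the loader.

```python
def classify(exc: Exception) -> str:
    """Report which handler would catch `exc` under the proposed ordering."""
    try:
        raise exc
    except UnicodeDecodeError:
        return "encoding fallback"
    except ValueError:
        return "propagated as ValueError"
    except Exception:
        return "wrapped in RuntimeError"


# UnicodeDecodeError takes (encoding, object, start, end, reason).
decode_error = UnicodeDecodeError("utf-8", b"\xff", 0, 1, "invalid start byte")
missing_column = ValueError("Metadata column 'MISSING_HEADER' not found in CSV file.")

print(issubclass(UnicodeDecodeError, ValueError))
print(classify(decode_error))
print(classify(missing_column))
print(classify(OSError("disk error")))
```

Python picks the first matching `except` clause, so swapping the first two clauses would send decode errors down the `ValueError` path.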
Example Code
Steps to Reproduce
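Create a CSV file (e.g., demo_bug.csv).
Load it with CSVLoader, passing a metadata_columns entry (such as MISSING_HEADER) that is not in the file's header.
Calling loader.load() then produces the traceback shown at the top of this report.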
System Info
Environment
langchain==0.2.12
langchain-community==0.2.11
langchain-core==0.2.38
langchain-text-splitters==0.2.2
langchain-unstructured==0.1.2
Python 3.12.3