Detected a JSON file that does not conform to the Unstructured schema. partition_json currently only processes serialized Unstructured output while using langchain S3DirectoryLoader #24588
[X] I added a very descriptive title to this issue.
[X] I searched the LangChain documentation with the integrated search.
[X] I used the GitHub search to find a similar question and didn't find it.
[X] I am sure that this is a bug in LangChain rather than my code.
[X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
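import logging

from langchain_community.document_loaders import S3DirectoryLoader

s3_bucket_name, s3_prefix = "abc-bc-name", "output"  # values from the Description
loader = S3DirectoryLoader(bucket=s3_bucket_name, prefix=s3_prefix)
try:
    documents = loader.load()
    logging.info(f"size of the loaded documents {len(documents)}")
except Exception as e:
    logging.info(f"error loading documents: {e}")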
Error Message and Stack Trace (if applicable)
Detected a JSON file that does not conform to the Unstructured schema. partition_json currently only processes serialized Unstructured output.
    doc = loader.load()
          ^^^^^^^^^^^^^
  File "/prj/.venv/lib/python3.12/site-packages/langchain_community/document_loaders/s3_directory.py", line 139, in load
    docs.extend(loader.load())
                ^^^^^^^^^^^^^
  File "/prj/.venv/lib/python3.12/site-packages/langchain_core/document_loaders/base.py", line 30, in load
    return list(self.lazy_load())
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/prj/.venv/lib/python3.12/site-packages/langchain_community/document_loaders/unstructured.py", line 89, in lazy_load
    elements = self._get_elements()
               ^^^^^^^^^^^^^^^^^^^^
  File "/prj/.venv/lib/python3.12/site-packages/langchain_community/document_loaders/s3_file.py", line 135, in _get_elements
    return partition(filename=file_path, **self.unstructured_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/prj/.venv/lib/python3.12/site-packages/unstructured/partition/auto.py", line 389, in partition
    raise ValueError(
ValueError: Detected a JSON file that does not conform to the Unstructured schema. partition_json currently only processes serialized Unstructured output.
Description
My S3 bucket has a single folder that contains JSON files.
Bucket name: "abc-bc-name"
Prefix: "output"
Each file contains plain JSON, for example:
{
  "abc": "This is a text json file",
  "source": "https://asf.test/4865422_f4866011606d84f50d10e60e0b513b7",
  "correlation_id": "4865422_f4866011606d84f50d10e60e0b513b7"
}
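For anyone hitting the same error, here is a minimal stdlib-only sketch of why a file like the one above might be rejected and how its content could be extracted by hand. It assumes (this is my reading of the error message, not something confirmed in this report) that serialized Unstructured output is a JSON array of element objects, whereas the file above is a single top-level object. The helper names `is_list_of_dicts` and `json_object_to_record`, and the choice of `"abc"` as the text field, are hypothetical and only for illustration.

```python
import json


def is_list_of_dicts(raw: str) -> bool:
    """Return True if raw parses as a JSON array whose items are all objects."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, list) and all(isinstance(item, dict) for item in data)


def json_object_to_record(raw: str, text_key: str = "abc") -> tuple[str, dict]:
    """Split a flat JSON object into (text, metadata).

    text_key names the field holding the main text; every other field
    is kept as metadata. Both choices are assumptions for this sketch.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a top-level JSON object")
    text = str(data[text_key])
    metadata = {key: value for key, value in data.items() if key != text_key}
    return text, metadata


# The sample file content from this report.
sample = (
    '{ "abc": "This is a text json file", '
    '"source": "https://asf.test/4865422_f4866011606d84f50d10e60e0b513b7", '
    '"correlation_id": "4865422_f4866011606d84f50d10e60e0b513b7" }'
)

# The sample is a top-level object, not an array of objects.
print(is_list_of_dicts(sample))

text, metadata = json_object_to_record(sample)
print(text)
print(sorted(metadata))
```

If this matches your situation, the resulting `text` and `metadata` could be wrapped in a LangChain `Document(page_content=text, metadata=metadata)` after downloading each object from S3 yourself, instead of going through `S3DirectoryLoader`. Treat this as a possible workaround rather than a fix for the loader.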
System Info
langchain==0.2.10
langchain-cli==0.0.25
langchain-community==0.2.9
langchain-core==0.2.22
langchain-openai==0.1.17
langchain-text-splitters==0.2.2
macOS
Python 3.12.0