Hey @SmileLollipop! 🍭 Great to see you back in the wild world of coding challenges. Hope you're ready for another adventure! 🚀
Based on the traceback you've provided, it seems like the error is related to the parsing of the JSON output. The error message `Expecting ',' delimiter: line 151 column 6 (char 3019)` suggests that there is a syntax error in your JSON output.
To handle this, you could consider validating your JSON output before attempting to parse it. This could be done using a JSON validator to ensure that the output is correctly formatted. If the JSON is not valid, you can then handle the error appropriately, for example by logging an error message and skipping the invalid data.
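For instance, here is a minimal sketch of that validate-then-skip step (the `safe_load` helper name and logging setup are illustrative, not part of llama_index):

```python
import json
import logging

logger = logging.getLogger(__name__)

def safe_load(raw_output: str):
    """Return the parsed JSON, or None if the output is malformed."""
    try:
        return json.loads(raw_output)
    except json.JSONDecodeError as exc:
        # Log the syntax error and skip the invalid data
        logger.error("Invalid JSON output, skipping: %s", exc)
        return None
```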
In terms of handling large JSON outputs, there are several strategies you could consider:
1. **Streaming the JSON Output**: If the JSON output is very large, consider streaming the output instead of loading it all into memory at once. This approach can significantly reduce memory usage (see the sketch after this list).
2. **Increasing Memory Limits**: If the issue is related to memory limits, increasing the available memory to the Python process could help. This is more of a workaround than a solution and might not be applicable depending on the environment.
3. **Optimizing the Output Parser**: Review and optimize the `self._output_parser.parse(raw_output)` method in the `LLMTextCompletionProgram` class to ensure it can efficiently handle large strings. This might involve using more efficient parsing libraries or techniques that are optimized for large data.
4. **Chunking the Output**: Modify the logic to process and output the JSON in chunks rather than as a single large document. This could involve modifying the `parse` method of the output parser to yield chunks of the parsed output incrementally.
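To illustrate the streaming idea, here is a minimal sketch using the third-party `ijson` library (an assumption on my part; it is not part of llama_index, and the filename is a placeholder), which iterates over a large JSON array without loading the whole document into memory:

```python
import ijson  # pip install ijson

# Stream elements of a top-level JSON array one at a time, keeping
# memory usage roughly constant regardless of file size.
with open("large_output.json", "rb") as f:
    for item in ijson.items(f, "item"):  # the "item" prefix selects each array element
        print(item)  # replace with your per-record processing
```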
However, without specific details on the error or the requirements for handling long JSON outputs, these suggestions are general best practices for dealing with large data outputs in Python.
As for the changes between version 0.10.18 and the latest version 0.10.25, I wasn't able to find specific changes that address the issue of handling long JSON outputs. It's possible that such changes were made, but without access to the specific commit history or release notes for these versions, I can't provide a definitive answer.
Finally, regarding the maximum size of JSON output that the `LLMTextCompletionProgram` can handle without encountering errors in version 0.10.18: the context suggests a limit of 4096 tokens. However, that limit is mentioned with respect to the length of the data being processed, and it's unclear whether it also applies to the size of the JSON output.
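If you want to check how close your input is to such a token limit, one option is the third-party `tiktoken` library (an assumption on my part; pick the encoding that matches your model). Using `raw_nodes` from your snippet:

```python
import tiktoken  # pip install tiktoken

def token_count(text: str) -> int:
    # cl100k_base is the encoding used by many recent OpenAI models;
    # substitute the encoding appropriate for your LLM.
    enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(text))

if token_count(raw_nodes[1].text) > 4096:
    print("Warning: input exceeds the 4096-token limit")
```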
I hope this helps! If you have any further questions or if there's anything else you'd like me to look into, please let me know.
```python
program = LLMTextCompletionProgram.from_defaults(
    output_parser=PydanticOutputParser(output_cls=TableInfo),
    prompt_template_str=prompt_template_str,
    verbose=True,
)
print(str(raw_nodes[1].text))
print('-----------------------------')
output = program(input=raw_nodes[1].text)
```
How can I update the code above to handle this?
To handle long JSON outputs more effectively and avoid parsing errors, you can update your code to include chunking of the JSON data. Here is an updated version of your code that incorporates chunking:
```python
from llama_index.core.program.llm_program import LLMTextCompletionProgram
from llama_index.core.output_parsers.pydantic import PydanticOutputParser
from pydantic import BaseModel
import json

# Define your TableInfo model here
class TableInfo(BaseModel):
    # Define your fields here
    pass

# Initialize the program with the PydanticOutputParser
program = LLMTextCompletionProgram.from_defaults(
    output_parser=PydanticOutputParser(output_cls=TableInfo),
    prompt_template_str=prompt_template_str,
    verbose=True,
)

# Print the raw text for debugging
print(str(raw_nodes[1].text))
print('-----------------------------')

# Function to handle chunking of JSON data
def process_large_json(input_text):
    try:
        data = json.loads(input_text)
    except json.JSONDecodeError:
        # Handle invalid JSON input here
        return []

    chunk_size = 100  # Define a chunk size
    results = []
    if isinstance(data, list):
        # Split large arrays into chunks and run the program on each chunk
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            chunk_text = json.dumps(chunk)
            result = program(input=chunk_text)
            results.append(result)
    else:
        # Non-list JSON is processed in a single call
        result = program(input=input_text)
        results.append(result)
    return results

# Process the input text with chunking
output = process_large_json(raw_nodes[1].text)
print(output)
```
In this updated code:

- The `process_large_json` function handles chunking of the JSON data.
- Large JSON arrays are split into chunks of 100 items (adjust `chunk_size` as needed).
- Each chunk is passed to the `program` for processing, and the results are collected.

This approach helps manage memory usage and avoid potential parsing errors with very large JSON arrays.
Bug Description
`LLMTextCompletionProgram`: when the output is a long JSON document, it raises an error. Is there a good way to deal with this?
Version
0.10.18
Steps to Reproduce
```
Traceback (most recent call last):
  File "d:\Baoxin\Program\Demo3\module_ty\data_module.py", line 243, in <module>
    print(cc)
    ^^^^^^^
  File "d:\Baoxin\Program\Demo3\module_ty\data_module.py", line 112, in gen_data
    output2 = program2(text=text)
              ^^^^^^^^^^^^^^^^^^^
  File "D:\software\anaconda3\envs\llm_env\Lib\site-packages\llama_index\core\program\llm_program.py", line 104, in __call__
    output = self._output_parser.parse(raw_output)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\anaconda3\envs\llm_env\Lib\site-packages\llama_index\core\output_parsers\pydantic.py", line 63, in parse
    return self._output_cls.parse_raw(json_str)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\anaconda3\envs\llm_env\Lib\site-packages\pydantic\main.py", line 1124, in parse_raw
    raise pydantic_core.ValidationError.from_exception_data(cls.__name__, [error])
pydantic_core._pydantic_core.ValidationError: 1 validation error for EmergencyExercise
__root__
  Expecting ',' delimiter: line 151 column 6 (char 3019) [type=value_error.jsondecode, input_value='{\n "exercise_name": "... }\n ]\n }', input_type=str]
```
Relevant Logs/Tracebacks
No response