ggjx22 opened this issue 6 months ago.
@ggjx22 How did you call llama-parse? Can you share some code?
Also, what version are you on? I would try updating with pip install -U llama_parse
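To answer the version question from inside Python, one option is the standard library's `importlib.metadata`. This is a minimal sketch, not part of llama-parse: `installed_version` and `is_at_least` are hypothetical helpers, the distribution name `llama-parse` is an assumption, and the comparison is naive, handling only purely numeric versions such as 0.4.1.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    # Return the installed version string, or None if the distribution is missing.
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def is_at_least(current: str, minimum: str) -> bool:
    # Naive numeric comparison ("0.4.1" style only); pre-release tags
    # such as "0.4.1rc1" would raise ValueError here.
    a = [int(p) for p in current.split(".")]
    b = [int(p) for p in minimum.split(".")]
    # Pad with zeros so "0.4" compares equal to "0.4.0".
    width = max(len(a), len(b))
    a += [0] * (width - len(a))
    b += [0] * (width - len(b))
    return a >= b


if __name__ == "__main__":
    current = installed_version("llama-parse")  # distribution name assumed
    if current is None:
        print("llama-parse is not installed")
    else:
        print(current, is_at_least(current, "0.4.1"))
```

Running `pip show llama-parse` in a shell gives the same information without any code.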
I am reading the .pdf files directly from Azure Blob Storage.
from llama_index.readers.azstorage_blob import AzStorageBlobReader
from llama_parse import LlamaParse

# LLAMA_CLOUD_API_KEY and get_parser_instructions() are defined elsewhere (not shown).

def get_file_extractor(llama_cloud_api_key):
    # instantiate the parser
    parser = LlamaParse(
        api_key=llama_cloud_api_key,
        result_type='markdown',
        parsing_instruction=get_parser_instructions()
    )
    # route every .pdf file to LlamaParse
    file_extractor = {'.pdf': parser}
    return file_extractor

def get_azure_blob_reader(container_name, container_blob_name, connection_string, blob_name, file_extractor):
    return AzStorageBlobReader(
        container_name=f'{container_name}/{container_blob_name}',
        connection_string=connection_string,
        blob=blob_name,
        file_extractor=file_extractor
    )

def parse_file_from_az_blob(container_name, container_blob_name, connection_string, blob_name):
    # create file extractor
    file_extractor = get_file_extractor(llama_cloud_api_key=LLAMA_CLOUD_API_KEY)
    # get the azure blob reader
    az_blob_reader = get_azure_blob_reader(
        container_name=container_name,
        container_blob_name=container_blob_name,
        connection_string=connection_string,
        blob_name=blob_name,
        file_extractor=file_extractor
    )
    # load data from blob; load_data() returns a list of Document objects
    try:
        document = az_blob_reader.load_data()
        return document
    except Exception as error:
        # the error is only printed, so the function returns None on failure
        print(f'ERROR: An error has occurred while parsing the file. {error}')
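The `file_extractor` dict maps a file extension to the reader that should handle it, so `{'.pdf': parser}` sends PDF blobs to LlamaParse. The sketch below only illustrates that routing idea and is not LlamaIndex's actual code: `pick_reader`, `fake_pdf_parser`, and `plain_text_reader` are hypothetical names, and lower-casing the suffix and falling back to a default reader are assumptions of this sketch.

```python
from pathlib import PurePosixPath
from typing import Callable, Mapping, Optional

# Hypothetical reader type: takes a path, returns extracted text.
Reader = Callable[[str], str]


def pick_reader(
    blob_name: str,
    file_extractor: Mapping[str, Reader],
    default: Optional[Reader] = None,
) -> Optional[Reader]:
    # Look up the reader by the file suffix (e.g. ".pdf"); lower-casing is assumed.
    suffix = PurePosixPath(blob_name).suffix.lower()
    return file_extractor.get(suffix, default)


def fake_pdf_parser(path: str) -> str:
    # Stand-in for a PDF parser such as LlamaParse.
    return f"markdown for {path}"


def plain_text_reader(path: str) -> str:
    # Stand-in for a default reader used when no extension matches.
    return f"raw text of {path}"


if __name__ == "__main__":
    extractor = {".pdf": fake_pdf_parser}
    for name in ["reports/q1.PDF", "notes/readme.txt"]:
        reader = pick_reader(name, extractor, default=plain_text_reader)
        print(reader(name))
```

With this mapping, `reports/q1.PDF` goes to the PDF stand-in and `notes/readme.txt` to the default reader.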
This is how I am calling the functions:
document = llama.parse_file_from_az_blob(
    container_name=storage_container_name,
    container_blob_name=container_blob_name,
    connection_string=connection_string,
    blob_name=blob_name
)
I am on version 0.4.1, but the RuntimeError still happens. The job ID of my latest parsing job that hits the error is a943c84f-4d98-4ec5-aa8a-794c031a4c37.
I have a feeling it may be failing to load from Azure and killing the async loop.
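As a generic illustration of that suspicion (not LlamaParse's actual internals, which are not shown in this thread): when several coroutines run under `asyncio.gather`, the first exception propagates to the caller by default, while `return_exceptions=True` returns each exception alongside the successful results. `fetch`, `parse_all`, and `summarize` are hypothetical names.

```python
import asyncio


async def fetch(name: str) -> str:
    # Hypothetical stand-in for one async download-and-parse step.
    await asyncio.sleep(0)
    if name == "bad.pdf":
        raise RuntimeError(f"failed to load {name}")
    return f"parsed {name}"


async def parse_all(names: list[str]) -> list[object]:
    # return_exceptions=True collects each failure as a value instead of
    # letting the first exception propagate out of gather().
    return await asyncio.gather(*(fetch(n) for n in names), return_exceptions=True)


def summarize(results: list[object]) -> tuple[int, int]:
    # Count (successes, failures) in the gathered results.
    failures = sum(isinstance(r, BaseException) for r in results)
    return len(results) - failures, failures


if __name__ == "__main__":
    results = asyncio.run(parse_all(["a.pdf", "bad.pdf", "c.pdf"]))
    print(summarize(results))  # (2, 1)
```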
Does it work if you remove the file extractor?
I am currently testing with 9 PDF files in my blob storage. The RuntimeError does not appear if file_extractor is not passed to AzStorageBlobReader. When LlamaParse is passed as the file extractor, as in the code above, I encounter around 2 to 3 RuntimeErrors in the loop.
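Since only some of the 9 files fail, one way to narrow it down is to parse each blob separately and record which ones raise. This is a hedged sketch: `load_each` and `fake_loader` are hypothetical. In practice `load_one` might build an AzStorageBlobReader with `blob=name` and call `load_data()`, much like `parse_file_from_az_blob` above, but that wiring is an assumption; note that the original function catches and prints errors, so it would have to re-raise for failures to be recorded here.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical loader: downloads and parses one blob, returning its texts.
Loader = Callable[[str], List[str]]


def load_each(
    blob_names: Iterable[str], load_one: Loader
) -> Tuple[Dict[str, List[str]], Dict[str, Exception]]:
    # Parse blobs one at a time so each failure is tied to a specific file.
    loaded: Dict[str, List[str]] = {}
    failed: Dict[str, Exception] = {}
    for name in blob_names:
        try:
            loaded[name] = load_one(name)
        except Exception as error:  # record the failure and keep going
            failed[name] = error
    return loaded, failed


def fake_loader(name: str) -> List[str]:
    # Stand-in loader that fails for any blob whose name starts with "broken".
    if name.startswith("broken"):
        raise RuntimeError(f"could not parse {name}")
    return [f"text of {name}"]


if __name__ == "__main__":
    ok, bad = load_each(["a.pdf", "broken.pdf", "c.pdf"], fake_loader)
    print(sorted(ok))  # ['a.pdf', 'c.pdf']
    print({k: type(v).__name__ for k, v in bad.items()})  # {'broken.pdf': 'RuntimeError'}
```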
Recently, I have been running into a RuntimeError while using LlamaParse. This is my job ID:
job_id ee9624b6-9a4a-4b4c-8229-05735fe807a2
Has anyone encountered the same error and knows what exactly is going on?
Error traceback messages: