run-llama / llama_extract


"noisebridge_receipt.pdf" seems somehow broken? #17

Open kun432 opened 3 months ago

kun432 commented 3 months ago

Inferring a schema from the example "noisebridge_receipt.pdf" always fails, both via the GUI and via Python.

from llama_extract import LlamaExtract  # assumes LLAMA_CLOUD_API_KEY is set in the environment

extractor = LlamaExtract()
extraction_schema = await extractor.ainfer_schema(
    "Test Schema", ["./llama_extract/examples/data/noisebridge_receipt.pdf"]
)
---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/llama_extract/base.py in ainfer_schema(self, name, seed_files, schema_id, project_id)
    220         try:
--> 221             _response_json = _response.json()
    222         except JSONDecodeError:

5 frames
JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

ApiError                                  Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/llama_extract/base.py in ainfer_schema(self, name, seed_files, schema_id, project_id)
    221             _response_json = _response.json()
    222         except JSONDecodeError:
--> 223             raise ApiError(status_code=_response.status_code, body=_response.text)
    224         raise ApiError(status_code=_response.status_code, body=_response_json)
    225 

ApiError: status_code: 500, body: Internal Server Error

for "parallels_invoice.pdf", extracting works correctly.