run-llama / llama_parse

Parse files for optimal RAG
https://www.llamaindex.ai
MIT License

Missing pages in PDF extraction #301

Open ChrisPF123 opened 1 month ago

ChrisPF123 commented 1 month ago

Describe the bug I have a 6-page PDF containing tables within images. LlamaParse extracts only 2 of the 6 pages, without any insight into why the other pages are missing.

Also, when I parse a PDF that contains only one of the missing pages to see what happens, LlamaParse responds with "Result not found. Check job status to see if it has completed."

Here is the output from Python.

Page 1 content:
Doc ID: 0df659d7-f8d0-42e2-bb11-85f543a2152c
Text:
Page 2 content:
Doc ID: 1b091a16-8d7b-4dd1-a0e5-ba5e58078916
Text:
Page 3 content:
Doc ID: 5ae4b9ea-3635-430e-bd76-ebd9bf5457e7
Text: |Strata Plan LMS 2744|Alameda Park| | |---|---|---| |Balance
Sheet As at 10/31/2022| | | |ASSETS| | | | |---|---|---|---| |Current
Assets|Petty Cash| | | | |150.00| |Bank: Operating Account| | | | |
|19,737.38| |Total| | | | | |19,887.38| |Bank CRF|Bank: Contingency
Reserve| | | | |177,297.38|72,472.94| |Total Bank CRF| | | | |
|249,770.32| |Ba...
Page 4 content:
Doc ID: 5d708152-0b32-470d-906d-0af422925c47
Text: Strata Plan LMS 2744 Alameda Park # Comparative Income
Statement | |Actual|Budget|Actual|Budget| | | |
|---|---|---|---|---|---|---|---| | |10/01/2022 to
10/31/2022|10/01/2022 to 10/31/2022|Difference|10/31/2022 to
10/31/2022|10/31/2022 to 10/31/2022|Difference| | |REVENUE| | | | | |
| | |General RevenueStrata Fees|17,383.48|17,383.41|0.07|104...
Page 5 content:
Doc ID: 5945476c-b51a-4053-8da7-b94d161fc79c
Text:
Page 6 content:
Doc ID: 78f04cea-b0dc-4b3f-8aa4-67dcd6bda6fe
Text:
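A minimal sketch for spotting the empty pages programmatically rather than by eye. It assumes the per-page text has already been collected into a plain list of strings; the `empty_pages` helper and the `texts` list are hypothetical names for illustration, and the non-empty entries are shortened stand-ins for the page 3 and page 4 text above.

```python
def empty_pages(page_texts):
    """Return 1-based page numbers whose extracted text is empty or whitespace."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if not (text or "").strip()
    ]


# Shortened stand-ins for the six pages printed above (hypothetical list).
texts = [
    "",
    "",
    "|Strata Plan LMS 2744|Alameda Park| |",
    "Strata Plan LMS 2744 Alameda Park # Comparative Income Statement",
    "",
    "",
]

print(empty_pages(texts))  # [1, 2, 5, 6]
```

Run against the output above, this flags pages 1, 2, 5 and 6, which matches the four pages that came back with no text.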

Files 9c8b93ef-bb8a-4c93-a7c3-a95772ad5a0a_3235_W_4th_Ave_Vancouver_BC_5a0a_corrected.pdf

Job ID 3802690c-cfea-4fd2-9ee8-d01559fa147a

8adb66e5-672b-4a68-b696-653be95880f4

Client:

Options Default options

Additional context I'd love to switch from Textract to LlamaParse, but these issues are stopping me.

tkcoding commented 1 month ago

@ChrisPF123

I tried running it and inspecting the JSON result. It seems like it can't recognise the page as a PDF.

{'page': 5, 'text': '', 'md': 'The provided input is missing. In order to provide the requested markdown output, please upload the file or provide the text content.', 'images': [{'name': 'img_p4_1.png', 'height': 1237, 'width': 1600, 'x': 0, 'y': 0, 'original_width': 1700, 'original_height': 2200}

Most probably you can enable gpt4o_mode=True until the fix is pushed (please be aware of the cost).
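A rough sketch of how the affected pages could be detected in the JSON result, so it is clear which ones need the workaround. It assumes each page is a dict with `page`, `text` and `md` keys as in the output above; `pages_needing_retry` and `parse_with_gpt4o` are hypothetical helper names, and the placeholder string is copied from the `md` field shown earlier. The second function assumes the `llama_parse` package is installed and that the API key is supplied through the `LLAMA_CLOUD_API_KEY` environment variable.

```python
# Start of the placeholder message seen in the 'md' field of a failed page.
PLACEHOLDER = "The provided input is missing"


def pages_needing_retry(pages):
    """Return page numbers with no text and an empty or placeholder 'md'."""
    flagged = []
    for page in pages:
        text = (page.get("text") or "").strip()
        md = (page.get("md") or "").strip()
        if not text and (not md or PLACEHOLDER in md):
            flagged.append(page["page"])
    return flagged


def parse_with_gpt4o(file_path):
    """Re-parse a file with gpt4o_mode enabled (makes a billable API call)."""
    # Imported here so the check above works without llama_parse installed.
    from llama_parse import LlamaParse

    parser = LlamaParse(result_type="markdown", gpt4o_mode=True)
    return parser.get_json_result(file_path)
```

Because gpt4o_mode costs more, checking `pages_needing_retry` first makes it possible to re-run only the documents that actually contain failed pages.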

Here's with gpt4o_mode = True extraction. {'page': 5, 'md': '# STRATA PLAN LMS 2744\n## 2021/2022 ACTUAL 2022/2023 PROPOSED BUDGET\n\n### INCOME:\n\n| Description | 2021/2022 ACTUAL | 2021/2022 BUDGET | 2022/2023 BUDGET |\n|------------------------------------|------------------|------------------|------------------|\n| Balance Forward | 6,490.00 | 6,490.00 | 2,322.00 |\n| Strata Fees | 202,350.00 | 202,350.00 | 208,601.00 |\n| Miscellaneous Income | 355.00 | 0.00 | 0.00 |\n| Fines/Penalties | 0.00 | 0.00 | 0.00 |\n| Move In/Out Fee | 600.00 | 0.00 | 0.00 |\n| Rebate - Fortis BC New Boilers | 0.00 | 0.00 | 7,500.00 |\n| Interest Income | 52.00 | 0.00 | 0.00 |\n| TOTAL INCOME | 209,847.00 | 208,840.00 | 218,423.00 |\n\n### DISBURSEMENTS:\n\n#### General & Administration\n\n| Description | 2021/2022 ACTUAL | VARIANCE | 2021/2022 BUDGET | 2022/2023 BUDGET |\n|------------------------------------|------------------|----------|------------------|------------------|\n| Accounting & Legal | 622.00 | 22.00 | 600.00 | 600.00 |\n| Management Fee | 17,892.00 | 0.00 | 17,892.00 | 18,600.00 |\n| Administration | 3,596.00 | 396.00 | 3,200.00 | 2,000.00 |\n| Insurance/Appraisal | 37,661.00 | 661.00 | 37,000.00 | 42,575.00 |\n| Enterphone Lease | 3,713.00 | -2,995.00| 6,708.00 | 8,195.00 |\n| Interest & Bank Charges | 360.00 | -240.00 | 600.00 | 600.00 |\n| Total General & Administration | 63,844.00 | -2,156.00 | 66,000.00 | 72,570.00 |\n\n#### Utilities Expense\n\n| Description | 2021/2022 ACTUAL | VARIANCE | 2021/2022 BUDGET | 2022/2023 BUDGET |\n|------------------------------------|------------------|----------|------------------|------------------|\n| Electricity | 13,790.00 | 3,390.00 | 10,400.00 | 13,600.00 |\n| Gas | 21,484.00 | 3,083.00 | 18,401.00 | 24,635.00 |\n| Water | 17,500.00 | -820.00 | 18,320.00 | 19,250.00 |\n| Total Utility Expense | 52,774.00 | | 47,121.00 | 57,485.00 |\n\n#### Building Maintenance\n\n| Description | 2021/2022 ACTUAL | VARIANCE | 2021/2022 
BUDGET | 2022/2023 BUDGET |\n|------------------------------------|------------------|----------|------------------|------------------|\n| Elevator | 6,056.00 | 1,256.00 | 4,800.00 | 6,000.00 |\n| Fire Equipment Maintenance | 2,500.00 | 0.00 | 2,500.00 | 2,500.00 |\n| Janitorial | 7,135.00 | -365.00 | 7,500.00 | 7,245.00 |\n| Gardening & Supplies | 7,989.00 | -2,511.00| 10,500.00 | 11,000.00 |\n| Garbage/Recycling | 9,523.00 | 753.00 | 8,770.00 | 9,024.00 |\n| Water Treatment Lease Payment | 3,802.00 | 0.00 | 3,802.00 | 3,802.00 |\n\n', 'images': [{'name': 'page-4.jpg', 'height': 0, 'width': 0, 'x': 0, 'y': 0}], 'items': [{'type': 'heading', 'lvl': 1, 'value': 'STRATA PLAN LMS 2744', 'md': '# STRATA PLAN LMS 2744'}, {'type': 'heading', 'lvl': 2, 'value': '2021/2022 ACTUAL 2022/2023 PROPOSED BUDGET', 'md': '## 2021/2022 ACTUAL 2022/2023 PROPOSED BUDGET'}, {'type': 'heading', 'lvl': 3, 'value': 'INCOME:', 'md': '### INCOME:'}, {'type': 'table', 'rows': [['Description', '2021/2022 ACTUAL', '2021/2022 BUDGET', '2022/2023 BUDGET'], ['Balance Forward', '6,490.00', '6,490.00', '2,322.00'], ['Strata Fees', '202,350.00', '202,350.00', '208,601.00'], ['Miscellaneous Income', '355.00', '0.00', '0.00'], ['Fines/Penalties', '0.00', '0.00', '0.00'], ['Move In/Out Fee', '600.00', '0.00', '0.00'], ['Rebate - Fortis BC New Boilers', '0.00', '0.00', '7,500.00'], ['Interest Income', '52.00', '0.00', '0.00'], ['TOTAL INCOME', '209,847.00', '208,840.00', '218,423.00']], 'md': '| Description | 2021/2022 ACTUAL | 2021/2022 BUDGET | 2022/2023 BUDGET |\n|------------------------------------|------------------|------------------|------------------|\n| Balance Forward | 6,490.00 | 6,490.00 | 2,322.00 |\n| Strata Fees | 202,350.00 | 202,350.00 | 208,601.00 |\n| Miscellaneous Income | 355.00 | 0.00 | 0.00 |\n| Fines/Penalties | 0.00 | 0.00 | 0.00 |\n| Move In/Out Fee | 600.00 | 0.00 | 0.00 |\n| Rebate - Fortis BC New Boilers | 0.00 | 0.00 | 7,500.00 |\n| Interest Income | 52.00 | 0.00 | 0.00 
|\n| TOTAL INCOME | 209,847.00 | 208,840.00 | 218,423.00 |', 'isPerfectTable': True, 'csv': '"Description","2021/2022 ACTUAL","2021/2022 BUDGET","2022/2023 BUDGET"\n"Balance Forward","6,490.00","6,490.00","2,322.00"\n"Strata Fees","202,350.00","202,350.00","208,601.00"\n"Miscellaneous Income","355.00","0.00","0.00"\n"Fines/Penalties","0.00","0.00","0.00"\n"Move In/Out Fee","600.00","0.00","0.00"\n"Rebate - Fortis BC New Boilers","0.00","0.00","7,500.00"\n"Interest Income","52.00","0.00","0.00"\n"TOTAL INCOME","209,847.00","208,840.00","218,423.00"'}, {'type': 'heading', 'lvl': 3, 'value': 'DISBURSEMENTS:', 'md': '### DISBURSEMENTS:'}, {'type': 'heading', 'lvl': 4, 'value': 'General & Administration', 'md': '#### General & Administration'}, {'type': 'table', 'rows': [['Description', '2021/2022 ACTUAL', 'VARIANCE', '2021/2022 BUDGET', '2022/2023 BUDGET'], ['Accounting & Legal', '622.00', '22.00', '600.00', '600.00'], ['Management Fee', '17,892.00', '0.00', '17,892.00', '18,600.00'], ['Administration', '3,596.00', '396.00', '3,200.00', '2,000.00'], ['Insurance/Appraisal', '37,661.00', '661.00', '37,000.00', '42,575.00'], ['Enterphone Lease', '3,713.00', '-2,995.00', '6,708.00', '8,195.00'], ['Interest & Bank Charges', '360.00', '-240.00', '600.00', '600.00'], ['Total General & Administration', '63,844.00', '-2,156.00', '66,000.00', '72,570.00']], 'md': '| Description | 2021/2022 ACTUAL | VARIANCE | 2021/2022 BUDGET | 2022/2023 BUDGET |\n|------------------------------------|------------------|----------|------------------|------------------|\n| Accounting & Legal | 622.00 | 22.00 | 600.00 | 600.00 |\n| Management Fee | 17,892.00 | 0.00 | 17,892.00 | 18,600.00 |\n| Administration | 3,596.00 | 396.00 | 3,200.00 | 2,000.00 |\n| Insurance/Appraisal | 37,661.00 | 661.00 | 37,000.00 | 42,575.00 |\n| Enterphone Lease | 3,713.00 | -2,995.00| 6,708.00 | 8,195.00 |\n| Interest & Bank Charges | 360.00 | -240.00 | 600.00 | 600.00 |\n| Total General & Administration | 63,844.00 | 
-2,156.00 | 66,000.00 | 72,570.00 |', 'isPerfectTable': True, 'csv': '"Description","2021/2022 ACTUAL","VARIANCE","2021/2022 BUDGET","2022/2023 BUDGET"\n"Accounting & Legal","622.00","22.00","600.00","600.00"\n"Management Fee","17,892.00","0.00","17,892.00","18,600.00"\n"Administration","3,596.00","396.00","3,200.00","2,000.00"\n"Insurance/Appraisal","37,661.00","661.00","37,000.00","42,575.00"\n"Enterphone Lease","3,713.00","-2,995.00","6,708.00","8,195.00"\n"Interest & Bank Charges","360.00","-240.00","600.00","600.00"\n"Total General & Administration","63,844.00","-2,156.00","66,000.00","72,570.00"'}, {'type': 'heading', 'lvl': 4, 'value': 'Utilities Expense', 'md': '#### Utilities Expense'}, {'type': 'table', 'rows': [['Description', '2021/2022 ACTUAL', 'VARIANCE', '2021/2022 BUDGET', '2022/2023 BUDGET'], ['Electricity', '13,790.00', '3,390.00', '10,400.00', '13,600.00'], ['Gas', '21,484.00', '3,083.00', '18,401.00', '24,635.00'], ['Water', '17,500.00', '-820.00', '18,320.00', '19,250.00'], ['Total Utility Expense', '52,774.00', '', '47,121.00', '57,485.00']], 'md': '| Description | 2021/2022 ACTUAL | VARIANCE | 2021/2022 BUDGET | 2022/2023 BUDGET |\n|------------------------------------|------------------|----------|------------------|------------------|\n| Electricity | 13,790.00 | 3,390.00 | 10,400.00 | 13,600.00 |\n| Gas | 21,484.00 | 3,083.00 | 18,401.00 | 24,635.00 |\n| Water | 17,500.00 | -820.00 | 18,320.00 | 19,250.00 |\n| Total Utility Expense | 52,774.00 | | 47,121.00 | 57,485.00 |', 'isPerfectTable': True, 'csv': '"Description","2021/2022 ACTUAL","VARIANCE","2021/2022 BUDGET","2022/2023 BUDGET"\n"Electricity","13,790.00","3,390.00","10,400.00","13,600.00"\n"Gas","21,484.00","3,083.00","18,401.00","24,635.00"\n"Water","17,500.00","-820.00","18,320.00","19,250.00"\n"Total Utility Expense","52,774.00","","47,121.00","57,485.00"'}, {'type': 'heading', 'lvl': 4, 'value': 'Building Maintenance', 'md': '#### Building Maintenance'}, {'type': 'table', 
'rows': [['Description', '2021/2022 ACTUAL', 'VARIANCE', '2021/2022 BUDGET', '2022/2023 BUDGET'], ['Elevator', '6,056.00', '1,256.00', '4,800.00', '6,000.00'], ['Fire Equipment Maintenance', '2,500.00', '0.00', '2,500.00', '2,500.00'], ['Janitorial', '7,135.00', '-365.00', '7,500.00', '7,245.00'], ['Gardening & Supplies', '7,989.00', '-2,511.00', '10,500.00', '11,000.00'], ['Garbage/Recycling', '9,523.00', '753.00', '8,770.00', '9,024.00'], ['Water Treatment Lease Payment', '3,802.00', '0.00', '3,802.00', '3,802.00']], 'md': '| Description | 2021/2022 ACTUAL | VARIANCE | 2021/2022 BUDGET | 2022/2023 BUDGET |\n|------------------------------------|------------------|----------|------------------|------------------|\n| Elevator | 6,056.00 | 1,256.00 | 4,800.00 | 6,000.00 |\n| Fire Equipment Maintenance | 2,500.00 | 0.00 | 2,500.00 | 2,500.00 |\n| Janitorial | 7,135.00 | -365.00 | 7,500.00 | 7,245.00 |\n| Gardening & Supplies | 7,989.00 | -2,511.00| 10,500.00 | 11,000.00 |\n| Garbage/Recycling | 9,523.00 | 753.00 | 8,770.00 | 9,024.00 |\n| Water Treatment Lease Payment | 3,802.00 | 0.00 | 3,802.00 | 3,802.00 |', 'isPerfectTable': True, 'csv': '"Description","2021/2022 ACTUAL","VARIANCE","2021/2022 BUDGET","2022/2023 BUDGET"\n"Elevator","6,056.00","1,256.00","4,800.00","6,000.00"\n"Fire Equipment Maintenance","2,500.00","0.00","2,500.00","2,500.00"\n"Janitorial","7,135.00","-365.00","7,500.00","7,245.00"\n"Gardening & Supplies","7,989.00","-2,511.00","10,500.00","11,000.00"\n"Garbage/Recycling","9,523.00","753.00","8,770.00","9,024.00"\n"Water Treatment Lease Payment","3,802.00","0.00","3,802.00","3,802.00"'}]}