Issues with Table Extraction Accuracy in Sample

Problem Description

When running the sample provided by the llmware project, I've encountered problems with table extraction accuracy: not all tables are extracted correctly. One example is Annual_Report_2003.pdf, a file used in the sample.

Steps to Reproduce

Run the sample extraction process as shown in the examples, with the query parameter set to an empty string. Then review the output and compare it to the tables in the documents. Only one table was extracted correctly, and even that is only part of the table spanning pages 44-46.

Expected Outcome

All tables within the sample documents should be identified and extracted accurately. The tables in this file that should be extracted are shown in the screenshot below.

Actual Outcome

Only one CSV file is produced; please refer to table_0.csv. All other table contents, though present in the JSON file, are labeled as text.
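To put numbers on the mismatch, a rough sketch like the one below could tally how the parsed blocks are labeled and count the CSV files that were written. It uses only the Python standard library, not the llmware API. It assumes the JSON export is a list of objects and that each object carries its label under a "content_type" key; both the layout and the key name are assumptions and may need adjusting to the actual file. The function names are hypothetical helpers written for this report.

```python
import json
from collections import Counter
from pathlib import Path


def tally_labels(blocks, label_key="content_type"):
    """Count parsed blocks per label; blocks missing the key count as 'unknown'."""
    # label_key is an assumed field name, not confirmed against the real export.
    return Counter(block.get(label_key, "unknown") for block in blocks)


def summarize_output(json_path, csv_dir, label_key="content_type"):
    """Return (label counts from the JSON export, number of CSV files in csv_dir)."""
    # Assumes the JSON file holds a list of block objects.
    blocks = json.loads(Path(json_path).read_text(encoding="utf-8"))
    csv_count = len(list(Path(csv_dir).glob("*.csv")))
    return tally_labels(blocks, label_key), csv_count


if __name__ == "__main__":
    # Made-up blocks for illustration only, not real output from the sample.
    demo = [
        {"content_type": "text", "text": "Revenue 2003 2002"},
        {"content_type": "table", "text": "Segment results"},
        {"content_type": "text", "text": "Total assets"},
    ]
    print(tally_labels(demo))
```

If most table-like content shows up under the text label while only one CSV file exists, that would match the behavior described above.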

Potential Impact

This issue may lead to incomplete or inaccurate data capture, which can affect the integrity of data analysis and further processing steps.

Request for Assistance

I would appreciate any guidance on how to resolve this issue or any suggested workarounds. Additionally, if there are any plans to improve the table extraction feature in the near future, information on that would also be helpful.
