Hey there, @hrhrng! I'm a bot here to help you with bugs, questions, and contributions. How can I assist you today?
The discrepancy between unstructured.documents.html.HTMLTable and unstructured.elements.Table could indeed be due to the version of your unstructured library. The notebook docs/docs/examples/query_engine/sec_tables/tesla_10q_table.ipynb does not specify the exact versions of the unstructured library being used.
To ensure tables are parsed correctly, verify that you are using a compatible version of the unstructured library. You can check which version you have installed with the following command:
import unstructured
print(unstructured.__version__)
If there is a version mismatch, you may need to upgrade or downgrade your unstructured library to the version used in the notebook, or to a version that supports both unstructured.documents.html.HTMLTable and unstructured.elements.Table [1].
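Because the notebook does not pin versions, a bug report like this one is easier to act on when it lists both the unstructured and the LlamaIndex versions. A minimal stdlib-only sketch, assuming the PyPI distribution names are unstructured and llama-index-core (the package that provides llama_index/core in the 0.10 line); the helper name installed_versions is hypothetical:

```python
# Sketch: report installed versions of the packages involved in this issue.
# Assumption: the distributions are named "unstructured" and "llama-index-core";
# adjust the names if your environment differs.
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(["unstructured", "llama-index-core"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Unlike unstructured.__version__, this works without importing the packages, so it still reports something when an import itself fails.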
Additionally, the LlamaIndex documentation suggests an alternative method for parsing tables if the unstructured library fails to recognize them correctly: use the html_to_df function to convert HTML table elements into DataFrames, then process those DataFrames accordingly [2][3].
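The core of that alternative is turning an HTML table into rows of cell text. The sketch below is not LlamaIndex's html_to_df; it is a hypothetical stdlib-only stand-in (TableRows and html_table_to_rows are made-up names) that collects the text of each td/th cell per tr row. The resulting list of rows could then be passed to pandas.DataFrame if pandas is available.

```python
# Hypothetical stand-in for an HTML-table-to-rows conversion (NOT LlamaIndex's
# html_to_df). Uses only the standard library's html.parser.
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the whitespace-normalized text of each <td>/<th>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently open, or None
        self._cell = None  # text fragments of the cell currently open, or None

    def _close_cell(self):
        if self._cell is not None:
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr":
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_rows(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    parser._close_row()  # flush a trailing row left open by malformed HTML
    return parser.rows
```

For example, html_table_to_rows("<table><tr><th>Item</th><th>Q1</th></tr><tr><td>Revenue</td><td> 1,000 </td></tr></table>") returns [["Item", "Q1"], ["Revenue", "1,000"]]. Nested tables and colspan/rowspan are not handled.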
Probably due to unstructured changing how their types work over time. Would need a PR to fix.
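To illustrate how a type change can make the table check fail, here is a self-contained sketch. The classes are hypothetical stand-ins for unstructured's element types (not the real ones), it assumes, as in older unstructured releases, that HTMLTable subclasses Table, and it does not reproduce the actual condition in unstructured_element.py. A check on the HTML-specific subclass skips a plain Table; checking the base class or the category string ("Table" for table elements in unstructured) accepts both.

```python
# Hypothetical stand-ins for unstructured's element classes; these are NOT the
# real unstructured types, only enough structure to show the isinstance issue.
class Element:
    category = "Uncategorized"

    def __init__(self, text):
        self.text = text


class Table(Element):
    category = "Table"


class HTMLTable(Table):
    """Stand-in for the HTML-specific subclass returned by older releases."""


def is_table_strict(element):
    # Only matches the HTML-specific subclass, so a plain Table is skipped.
    return isinstance(element, HTMLTable)


def is_table_tolerant(element):
    # Matches any Table subclass, or anything whose category string is "Table".
    return isinstance(element, Table) or getattr(element, "category", None) == "Table"


elements = [Element("Quarterly report"), Table("a | b"), HTMLTable("c | d")]
print([is_table_strict(e) for e in elements])    # [False, False, True]
print([is_table_tolerant(e) for e in elements])  # [False, True, True]
```

A real fix would also need to read the table markup from the newer element type, for example from metadata.text_as_html if your unstructured version populates it; that part is not shown here.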
@hrhrng Could you please check if the linked PR fixes this?
Bug Description
I’m trying to run docs/docs/examples/query_engine/sec_tables/tesla_10q_table.ipynb, but no tables are being parsed. After debugging the source code, I found that the condition in llama_index/core/node_parser/relational/unstructured_element.py:110 is not being satisfied. It appears that the unstructured library returns an unstructured.elements.Table object instead of unstructured.documents.html.HTMLTable when it recognizes a table element in an HTML file. I suspect this discrepancy is why the tables are not recognized and parsed correctly. Could this be due to the version of my unstructured library, or something else?
Version
0.10.47
Steps to Reproduce
Run pip install unstructured, then run docs/docs/examples/query_engine/sec_tables/tesla_10q_table.ipynb.
Relevant Logs/Tracebacks
No response