Polars reports performance warnings when extracting tables
from img2table.document import Image
image = Image("test.png", detect_rotation=False)
result = image.extract_tables()
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/img2table/tables/processing/bordered_tables/cells/identification.py:17: PerformanceWarning: Determining the column names of a LazyFrame requires resolving its schema, which is a potentially expensive operation. Use `LazyFrame.collect_schema().names()` to get the column names without this warning.
.rename({col: f"{col}_" for col in df_h_lines.columns})
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/img2table/tables/processing/bordered_tables/cells/deduplication.py:21: PerformanceWarning: Determining the column names of a LazyFrame requires resolving its schema, which is a potentially expensive operation. Use `LazyFrame.collect_schema().names()` to get the column names without this warning.
.rename({col: f"{col}_" for col in df_cells.columns})
result=[ExtractedTable(title=None, bbox=(36, 21, 770, 327),shape=(6, 3)), ExtractedTable(title=None, bbox=(962, 21, 1154, 123),shape=(2, 2))]
I confirmed that collecting the column names beforehand fixes the issue:
+ # Collect the schema and get column names
+ column_names = df_cells.collect_schema().names()
+
# Create copy of df_cells
df_cells_cp = (df_cells.clone()
- .rename({col: f"{col}_" for col in df_cells.columns})
+ .rename({col: f"{col}_" for col in column_names})
)
The warnings clutter the output and make other stdout/stderr messages hard to read. I doubt they point to a real performance problem, but it still seems worth following the Polars suggestion and using `LazyFrame.collect_schema().names()` instead of `.columns`.
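Until this is fixed in the library, a possible user-side workaround is to filter out just this warning with the standard `warnings` module. This is only a sketch: the helper names below are hypothetical, and the filter matches on the start of the message text from the log above rather than on the Polars warning class.

```python
import warnings

# Start of the message shown in the log; filterwarnings treats it as a
# regex matched against the beginning of the warning text.
SCHEMA_WARNING_PREFIX = "Determining the column names of a LazyFrame"


def silence_schema_warning() -> None:
    """Install a filter that hides only the LazyFrame column-name warning."""
    warnings.filterwarnings("ignore", message=SCHEMA_WARNING_PREFIX)


def visible_warnings(messages: list[str]) -> list[str]:
    """Demo helper: emit the given warnings and return those still shown."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # start from a known filter state
        silence_schema_warning()         # inserted in front, so it wins
        for text in messages:
            warnings.warn(text, UserWarning)
    return [str(w.message) for w in caught]
```

In a script, calling `silence_schema_warning()` once before `image.extract_tables()` should be enough. Other warnings still get through.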
I'm new to the OSS world and would appreciate your guidance.