microsoft / table-transformer

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
MIT License

Why did the row/column dilation get removed? #88

Open wandering-walrus opened 1 year ago

wandering-walrus commented 1 year ago

The paper describes dilating the row and column bounding boxes to align the rows and columns and remove the gaps between them. In postprocessing.py, however, the dilation calls are commented out and the undilated rows and columns are used instead:

    # Dilate rows and columns before final extraction
    #dilated_columns = fill_column_gaps(columns, table_bbox)
    dilated_columns = columns
    #dilated_rows = fill_row_gaps(rows, table_bbox)
    dilated_rows = rows

Is there a reason for this? Or is the bounding box dilation happening elsewhere in the code that I've missed?
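
For context, here is a minimal sketch of what I understood the row dilation to do. It is only my reading, not the repository's `fill_row_gaps`: the function name `fill_row_gaps_sketch`, the assumption that each row is a dict with a `bbox` of `[x_min, y_min, x_max, y_max]`, and the choice to meet neighbouring rows at the midpoint of the gap are all my own guesses.

```python
def fill_row_gaps_sketch(rows, table_bbox):
    """Hypothetical illustration of row dilation; not the repository's code.

    Assumes each row is a dict with a 'bbox' of [x_min, y_min, x_max, y_max].
    """
    if not rows:
        return []
    x_min, y_min, x_max, y_max = table_bbox

    # Copy the rows so the input is left untouched, then sort top to bottom.
    ordered = sorted(
        ({**row, "bbox": list(row["bbox"])} for row in rows),
        key=lambda row: row["bbox"][1],
    )

    # Align every row to the table's left and right edges.
    for row in ordered:
        row["bbox"][0] = x_min
        row["bbox"][2] = x_max

    # Snap the outermost rows to the table's top and bottom edges.
    ordered[0]["bbox"][1] = y_min
    ordered[-1]["bbox"][3] = y_max

    # Close each gap (or overlap) between neighbours at its midpoint.
    for upper, lower in zip(ordered, ordered[1:]):
        boundary = (upper["bbox"][3] + lower["bbox"][1]) / 2
        upper["bbox"][3] = boundary
        lower["bbox"][1] = boundary

    return ordered
```

With something like this, the rows would tile the table with no gaps, and the column version would be the same idea applied along x. With the current pass-through, the rows and columns keep whatever gaps the detector produced.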