poloclub / unitable

UniTable: Towards a Unified Table Foundation Model
https://arxiv.org/abs/2403.04822
MIT License

Incorrect table structure detection and text extraction #10

Closed vikas-singh16 closed 3 weeks ago

vikas-singh16 commented 1 month ago

Hi, I have an input image of size (338, 1388, 3). This is the input image: Airmaster_20892_Specification_Sheet_1tb_0.

This is the table structure model output. Screenshot from 2024-05-31 10-50-06.

This is the cell detection model output. Screenshot from 2024-05-31 10-51-10

This is the final output in a dataframe. Screenshot from 2024-05-31 10-52-01.

I have a few questions regarding this.

  1. Why is the table structure misaligned?
  2. Is there a way I can input an image larger than (448 x 448)? I believe this output is due to the image being resized to (448 x 448).
  3. Is there any way these models can run on CPU?

ShengYun-Peng commented 1 month ago

Hi @vikas-singh16, thanks for your interest!

  1. The table in the screenshot seems out of distribution (OOD) relative to the training data, as it has very bold borders. I suggest finetuning on your own dataset for better performance.
  2. It's possible, but that will also require finetuning.
  3. You can run the full pipeline notebook directly on CPU by toggling the device variable.
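
For point 3, a minimal sketch of what toggling the device could look like, assuming a PyTorch setup where a single device string is passed to the models. The names select_device and force_cpu are hypothetical and do not come from the notebook.

```python
def select_device(force_cpu: bool = False) -> str:
    """Return "cuda" when a GPU is usable and not overridden, else "cpu"."""
    try:
        import torch  # assumption: the pipeline is PyTorch-based
    except ImportError:
        # Without PyTorch there is nothing to place on a GPU.
        return "cpu"
    if not force_cpu and torch.cuda.is_available():
        return "cuda"
    return "cpu"


# Forcing CPU regardless of the available hardware.
device = select_device(force_cpu=True)
print(device)
```

The resulting string would then be used wherever the notebook moves models and tensors onto a device, for example via model.to(device).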

vikas-singh16 commented 3 weeks ago

Hi, thanks for the response. I tried a small hack and it worked: just resize the image while maintaining its aspect ratio. The image will look something like this.

Input Image padded_2_Airmaster_20892_Specification_Sheet_1tb_0

Output Screenshot from 2024-06-10 10-51-07

Please let me know your thoughts on this.
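
The geometry behind this kind of hack can be sketched as follows. This is only an illustration under assumptions: the thread does not show the actual code, the "padded" file name suggests the resized image is padded out to a square, and 448 is taken from the model input size mentioned earlier. The function name letterbox_geometry is hypothetical.

```python
def letterbox_geometry(height: int, width: int, target: int = 448):
    """Compute an aspect-preserving resize plus padding to a square canvas.

    Returns (new_height, new_width, pad_top, pad_bottom, pad_left, pad_right).
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    # Scale so that the longer side exactly fills the target size.
    scale = target / max(height, width)
    new_h = min(target, max(1, round(height * scale)))
    new_w = min(target, max(1, round(width * scale)))
    # Split the leftover space evenly; any odd pixel goes to bottom/right.
    pad_top = (target - new_h) // 2
    pad_left = (target - new_w) // 2
    pad_bottom = target - new_h - pad_top
    pad_right = target - new_w - pad_left
    return new_h, new_w, pad_top, pad_bottom, pad_left, pad_right


# The (338, 1388) image from this issue: the width fills the canvas and
# the height is padded above and below.
print(letterbox_geometry(338, 1388))  # (109, 448, 169, 170, 0, 0)
```

These numbers can be applied with any image library by resizing to (new_width, new_height) and pasting the result onto a blank 448 x 448 canvas at (pad_left, pad_top). Predicted cell boxes then live in the padded coordinate system and would need the offsets and scale undone to map back to the original image.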

vikas-singh16 commented 3 weeks ago

Hi, now with this approach I have encountered another issue: the table headers are not placed in the right spot. While exploring this issue, I came across a function named "build_table_from_html_and_cell". Could you briefly explain how the HTML output and the cell output are connected and combined into one?
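
One plausible way such a merge can work is sketched below. This is not the repository's implementation: it assumes the structure model emits a list of HTML tokens in which every cell is marked by a "[]" placeholder (the token forms "<td>[]</td>" and ">[]</td>" are assumptions), and that the cell texts arrive in reading order. The name merge_structure_and_cells is hypothetical.

```python
from typing import List

# Assumed token forms that mark a cell whose content is still missing.
CELL_TOKENS = ("<td>[]</td>", ">[]</td>")


def merge_structure_and_cells(structure: List[str], cells: List[str]) -> List[str]:
    """Fill each cell placeholder with the next cell text, in order."""
    remaining = list(cells)  # copy so the caller's list is not mutated
    merged = []
    for token in structure:
        if token in CELL_TOKENS and remaining:
            merged.append(token.replace("[]", remaining.pop(0)))
        else:
            # Non-cell tokens, and placeholders left without text, pass through.
            merged.append(token)
    return merged


structure = ["<thead>", "<tr>", "<td>[]</td>", "<td", ' colspan="2"',
             ">[]</td>", "</tr>", "</thead>"]
cells = ["Header 1", "Header 2"]
html = "".join(merge_structure_and_cells(structure, cells))
print(html)
# <thead><tr><td>Header 1</td><td colspan="2">Header 2</td></tr></thead>
```

Because the pairing in this sketch is purely positional, one missing or extra detected cell shifts every later text by one slot, which may be one way headers end up in the wrong column.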

yumikim381 commented 3 weeks ago

Does preserving the aspect ratio work better for your use cases? I'm guessing it works better for non-square tables.

vikas-singh16 commented 2 weeks ago

Yes, it does work better in my case.