microsoft / table-transformer

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
MIT License

about the annotation tool #62

Closed: ZhenhuaTian closed this issue 1 year ago

ZhenhuaTian commented 2 years ago

Could you please release the annotation tool for PubTables-1M too?

LxYuan-Handshakes commented 2 years ago

+1

Hi @bsmock,
Would it be possible to share the annotation tool so that we can annotate our own custom dataset?

Thanks.

bsmock commented 2 years ago

We are actively working on releasing much of the data processing code. However, the first version will likely assume that you already have the table's structure as HTML and a bounding box for the text in every cell (as datasets such as ICDAR-2013, PubTables-1M, and FinTabNet provide). The code would canonicalize this data, create bounding boxes for rows and columns, and convert it into the format this repository expects for training. Is that what you're looking for?
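For readers wondering what "make bounding boxes for rows and columns" from per-cell text boxes might involve, here is a minimal sketch. It is not the repository's code. The `Cell` record and `derive_row_and_column_boxes` function are hypothetical, and the sketch assumes every cell occupies exactly one row and one column and has a text bounding box `(x_min, y_min, x_max, y_max)`. Spanning cells, empty cells, and closing the gaps between adjacent rows and columns, which a real canonicalization step would need to handle, are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    # Hypothetical cell record: grid position plus the bounding box of the
    # cell's text as (x_min, y_min, x_max, y_max) in page coordinates.
    # Assumes the cell spans exactly one row and one column.
    row: int
    col: int
    bbox: tuple[float, float, float, float]


def derive_row_and_column_boxes(cells):
    """Return (row_boxes, col_boxes), each a dict keyed by row/column index.

    A row box spans the full table width and the vertical extent of the
    cells in that row; a column box spans the full table height and the
    horizontal extent of the cells in that column.
    """
    # Table extent = union of all cell text boxes.
    table_left = min(c.bbox[0] for c in cells)
    table_top = min(c.bbox[1] for c in cells)
    table_right = max(c.bbox[2] for c in cells)
    table_bottom = max(c.bbox[3] for c in cells)

    row_extents = {}  # row index -> (top, bottom)
    col_extents = {}  # column index -> (left, right)
    for c in cells:
        x0, y0, x1, y1 = c.bbox
        top, bottom = row_extents.get(c.row, (y0, y1))
        row_extents[c.row] = (min(top, y0), max(bottom, y1))
        left, right = col_extents.get(c.col, (x0, x1))
        col_extents[c.col] = (min(left, x0), max(right, x1))

    row_boxes = {
        r: (table_left, top, table_right, bottom)
        for r, (top, bottom) in row_extents.items()
    }
    col_boxes = {
        k: (left, table_top, right, table_bottom)
        for k, (left, right) in col_extents.items()
    }
    return row_boxes, col_boxes


# Usage: a 2x2 table with slightly misaligned cell text boxes.
cells = [
    Cell(0, 0, (10, 10, 50, 20)),
    Cell(0, 1, (60, 12, 100, 22)),
    Cell(1, 0, (10, 30, 45, 40)),
    Cell(1, 1, (62, 31, 98, 41)),
]
rows, cols = derive_row_and_column_boxes(cells)
print(rows)  # {0: (10, 10, 100, 22), 1: (10, 30, 100, 41)}
print(cols)  # {0: (10, 10, 50, 41), 1: (60, 10, 100, 41)}
```

Using the table's full width for rows and its full height for columns mirrors the idea that rows and columns are grid lines across the whole table rather than just tight boxes around text. The resulting boxes would still need to be written out in whatever annotation format the training code expects.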

abhayhk2001 commented 1 year ago

Hi, if it's possible, could you share the annotation tool with us? We are trying to fine-tune the model on a custom dataset.

Tagging @bsmock for visibility