microsoft / table-transformer

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.

Inference on an individual image for table detection #17

Open matchalambada opened 2 years ago

matchalambada commented 2 years ago

Hi authors, I would like to visualize the table detection result for a specific image. Which output in the code should I take and modify in order to get the coordinates of the predicted bounding boxes, so I can visualize them on the inferred image?

mzhadigerov commented 2 years ago

Is there any update on that?

Architectshwet commented 2 years ago

How can we extract data in row/column format from a table image using the trained model?

bsmock commented 2 years ago

In the current version of the code, you can find the function that takes the model output and processes it into a table representation here: https://github.com/microsoft/table-transformer/blob/3e1dd0c3cad7956c790765b491ec86817e94ce43/src/grits.py#L727
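For orientation, here is a rough sketch of how that function is invoked; the argument names and return values below are assumptions based on how the surrounding code uses it, so check the linked source for the exact signature at that commit:

```python
# Rough sketch (not verbatim from the repo). The class metadata values follow
# the repo's conventions but should be double-checked against your checkout.
from grits import objects_to_cells  # location as of the linked commit

structure_class_names = [
    "table", "table column", "table row", "table column header",
    "table projected row header", "table spanning cell", "no object",
]
structure_class_map = {name: idx for idx, name in enumerate(structure_class_names)}
structure_class_thresholds = {name: 0.5 for name in structure_class_names}  # example thresholds

# Placeholders you must produce yourself:
#   bboxes, labels, scores: detections for one table image, after thresholding
#   page_tokens: list of word dicts for the same image (format described below)
table_structures, cells, confidence_score = objects_to_cells(
    bboxes, labels, scores, page_tokens,
    structure_class_names, structure_class_thresholds, structure_class_map,
)

# Each entry of `cells` describes one cell: its bounding box, the rows and
# columns it spans, and the text assembled from page_tokens.
```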

Jiangwentai commented 2 years ago

@bsmock

Hello, I want to use the function objects_to_cells. How can I get the page_tokens if I use a new input image?

bsmock commented 2 years ago

How can I get the page_tokens if I use a new input image?

Right now the code is written to be used with the PubTables-1M dataset or any dataset in the same format. For each table image in PubTables-1M, there is also a JSON file with a list of words in the image, which is read in as page_tokens. So the input image and the list of words (page_tokens) are what you need for inference.

You can have a look at the dataset to see examples of the format for page_tokens. Basically page_tokens needs to be a list of dicts, where each dict corresponds to a word or token and looks like this: {"text": "Table", "bbox": [xmin, ymin, xmax, ymax], "flags": 0, "block_num": 0, "line_num": 0, "span_num": 0}

At a minimum you'll need to fill in the "text", "bbox", and "span_num" fields, where "span_num" is an integer that puts the words in some order. When the code returns the text for each cell as a string, the words in the text string will be sorted by "block_num", then "line_num", then "span_num". So you can leave "flags", "block_num", and "line_num" as 0 as long as you put a unique integer for each word in "span_num".
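As a concrete illustration, here is a minimal sketch of building page_tokens for a new image; word_list is a hypothetical input (for example, the output of your own OCR step), with boxes given as [xmin, ymin, xmax, ymax] in pixel coordinates:

```python
# Minimal sketch: build page_tokens for a new image from (text, bbox) pairs.
# word_list is a hypothetical input, e.g. produced by your own OCR step.
word_list = [
    ("Table", [10, 12, 58, 30]),
    ("1", [62, 12, 70, 30]),
]

page_tokens = []
for span_num, (text, bbox) in enumerate(word_list):
    page_tokens.append({
        "text": text,
        "bbox": bbox,              # [xmin, ymin, xmax, ymax] in pixels
        "flags": 0,
        "block_num": 0,            # can stay 0, per the sorting rules above
        "line_num": 0,             # can stay 0
        "span_num": span_num,      # unique integer giving reading order
    })
```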

jshtok commented 2 years ago

@bsmock, can you please add at least one example image with all the required data structures to make a working inference example? It would help to understand the format without downloading 110 GB of data. Thank you!

suonbo commented 2 years ago

@bsmock, can you please add at least one example image with all the required data structures to make a working inference example? It would help to understand the format without downloading 110 GB of data. Thank you!

You can find some samples from here: https://drive.google.com/drive/folders/0B5h08T2mGP3ffnZLbTZ0WVNRT3Zjdjl2eC11aW0tOFVCaU5Mb2c2Q0dmc21lNWo1Y3BuT3c?resourcekey=0-bphHgPyZKg0yT5V8F7BWjw&usp=sharing

jshtok commented 2 years ago

@bsmock, can you please add at least one example image with all the required data structures to make a working inference example? It would help to understand the format without downloading 110 GB of data. Thank you!

You can find some samples from here: https://drive.google.com/drive/folders/0B5h08T2mGP3ffnZLbTZ0WVNRT3Zjdjl2eC11aW0tOFVCaU5Mb2c2Q0dmc21lNWo1Y3BuT3c?resourcekey=0-bphHgPyZKg0yT5V8F7BWjw&usp=sharing

Thank you, @suonbo, but at this location I can only see the .jpg images (and they are cropped tables, not whole pages). I am looking for an example with the data required by the inference command:

python main.py --mode eval --data_type structure --config_file structure_config.json --data_root_dir /path/to/pascal_voc_structure_data --model_load_path /path/to/structure_model --table_words_dir /path/to/json_table_words_data

specifically, I need the config file (not in the repo!), the pascal_voc_structure data, the table_words_dir (what goes in there?), the json_table_words_data ...

Danferno commented 1 year ago

To anyone interested, I uploaded an example of the table structure recognition files here. It contains the annotation (Pascal VOC), the words (JSON), and the table image (.jpg).

mineshmathew commented 1 year ago

Has anyone figured out how to run table detection alone?

Danferno commented 1 year ago

Has anyone figured out how to run table detection alone?

NielsRogge made a notebook with examples
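For anyone who just wants detection without opening the notebook, the gist with the Hugging Face port looks roughly like this (the file name and the 0.9 threshold are only examples):

```python
# Sketch: run table detection alone via the Hugging Face port of TATR.
from PIL import Image
import torch
from transformers import AutoImageProcessor, TableTransformerForObjectDetection

image = Image.open("page.png").convert("RGB")  # hypothetical input file

processor = AutoImageProcessor.from_pretrained("microsoft/table-transformer-detection")
model = TableTransformerForObjectDetection.from_pretrained("microsoft/table-transformer-detection")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits/boxes to absolute pixel coordinates; 0.9 is an example threshold.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(
    outputs, threshold=0.9, target_sizes=target_sizes
)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], score.item(), box.tolist())
```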

muneeb2001 commented 1 year ago

NielsRogge made a notebook with examples

Can you share a tutorial where the table is converted to CSV or HTML?
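Not a full tutorial, but once you have cells with row/column indices and text (whichever structure-recognition path produced them), CSV export is mostly bookkeeping. A sketch, with a hypothetical cell format:

```python
# Sketch: turn recognized cells into a CSV file. The cell dict layout here
# ("row_nums", "column_nums", "text") is hypothetical; adapt the keys to
# whatever your structure-recognition output actually uses.
import csv

cells = [
    {"row_nums": [0], "column_nums": [0], "text": "Name"},
    {"row_nums": [0], "column_nums": [1], "text": "Value"},
    {"row_nums": [1], "column_nums": [0], "text": "alpha"},
    {"row_nums": [1], "column_nums": [1], "text": "42"},
]

n_rows = max(r for c in cells for r in c["row_nums"]) + 1
n_cols = max(col for c in cells for col in c["column_nums"]) + 1
grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
for cell in cells:
    for r in cell["row_nums"]:          # spanning cells repeat their text
        for col in cell["column_nums"]:
            grid[r][col] = cell["text"]

with open("table.csv", "w", newline="") as f:
    csv.writer(f).writerows(grid)
```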

nuocheng commented 10 months ago

Has anyone figured out how to run table detection alone?

NielsRogge made a notebook with examples

Hello, thank you for providing a simple example. I ran into an issue while running the Jupyter notebook: the microsoft/table-transformer-detection configuration depends on resnet18, but downloading it through the third-party Python library timm failed. Is there a way to make table-transformer-detection load a local resnet18 configuration?
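One possible workaround, assuming the failure happens when timm tries to fetch the ImageNet resnet18 weights during model construction: the fine-tuned checkpoint already contains backbone weights, so that extra download can be skipped with a config override:

```python
# Possible workaround (assumption: the error comes from timm downloading
# ImageNet resnet18 weights while the model is being constructed). The
# fine-tuned checkpoint already includes backbone weights, so the extra
# download should be unnecessary:
from transformers import TableTransformerForObjectDetection

model = TableTransformerForObjectDetection.from_pretrained(
    "microsoft/table-transformer-detection",
    use_pretrained_backbone=False,  # don't fetch ImageNet weights through timm
)
```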

NielsRogge commented 10 months ago

Hi,

See #158 with updated notebooks and demos