microsoft / table-transformer

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
MIT License

Visualize model predictions #46

Closed mzhadigerov closed 2 years ago

mzhadigerov commented 2 years ago

I ran the pre-trained model in eval mode and got this output:

python --mode eval --data_type structure --config_file structure_config.json --data_root_dir data/ --model_load_path data/model/structure.pth --debug --device cpu
{'lr': 5e-05, 'lr_backbone': 1e-05, 'batch_size': 2, 'weight_decay': 0.0001, 'epochs': 20, 'lr_drop': 1, 'lr_gamma': 0.9, 'clip_max_norm': 0.1, 'backbone': 'resnet18', 'num_classes': 6, 'dilation': False, 'position_embedding': 'sine', 'emphasized_weights': {}, 'enc_layers': 6, 'dec_layers': 6, 'dim_feedforward': 2048, 'hidden_dim': 256, 'dropout': 0.1, 'nheads': 8, 'num_queries': 125, 'pre_norm': True, 'masks': False, 'aux_loss': False, 'mask_loss_coef': 1, 'dice_loss_coef': 1, 'ce_loss_coef': 1, 'bbox_loss_coef': 5, 'giou_loss_coef': 2, 'eos_coef': 0.4, 'set_cost_class': 1, 'set_cost_bbox': 5, 'set_cost_giou': 2, 'device': 'cpu', 'seed': 42, 'start_epoch': 0, 'num_workers': 2, 'data_root_dir': 'data/', 'config_file': 'structure_config.json', 'data_type': 'structure', 'model_load_path': 'data/model/structure.pth', 'metrics_save_filepath': '', 'table_words_dir': None, 'mode': 'eval', 'debug': True, 'checkpoint_freq': 1, '__module__': '__main__', '__dict__': <attribute '__dict__' of 'Args' objects>, '__weakref__': <attribute '__weakref__' of 'Args' objects>, '__doc__': None}
loading model
loading model from checkpoint
loading data
creating index...
index created!
Test:  [0/1]  eta: 0:00:00  class_error: 0.00  loss: 0.3392 (0.3392)  loss_ce: 0.0231 (0.0231)  loss_bbox: 0.0250 (0.0250)  loss_giou: 0.2912 (0.2912)  loss_ce_unscaled: 0.0231 (0.0231)  class_error_unscaled: 0.0000 (0.0000)  loss_bbox_unscaled: 0.0050 (0.0050)  loss_giou_unscaled: 0.1456 (0.1456)  cardinality_error_unscaled: 0.0000 (0.0000)  time: 0.3716  data: 0.0614  max mem: 0
Test: Total time: 0:00:00 (0.3762 s / it)
Averaged stats: class_error: 0.00  loss: 0.3392 (0.3392)  loss_ce: 0.0231 (0.0231)  loss_bbox: 0.0250 (0.0250)  loss_giou: 0.2912 (0.2912)  loss_ce_unscaled: 0.0231 (0.0231)  class_error_unscaled: 0.0000 (0.0000)  loss_bbox_unscaled: 0.0050 (0.0050)  loss_giou_unscaled: 0.1456 (0.1456)  cardinality_error_unscaled: 0.0000 (0.0000)
Accumulating evaluation results...
DONE (t=0.01s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.619
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.750
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.629
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.619
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.281
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.506
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.638
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.638
pubmed: AP50: 0.750, AP75: 0.629, AP: 0.619, AR: 0.638

How can I visualize the model predictions on input images, like this?

bsmock commented 2 years ago

We just pushed an update to the code and documentation related to this. Please take a look at the updated documentation and how to use the --debug flag during evaluation to visualize results.

It should produce a visualization of the (overlapping) bounding boxes directly output by the model: PMC4871520_table_1_bboxes

...and of the final table cells after post-processing: PMC4871520_table_1_cells

Cheers, Brandon
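
The difference between the two visualizations can be illustrated with a simplified sketch: once row and column boxes are known, a grid of cells falls out of intersecting each row with each column. This is only an illustration in plain Python under that assumption, not the repository's post-processing code, which does more than this. The function names and sample boxes are hypothetical.

```python
def intersect(a, b):
    """Return the overlap of two (x1, y1, x2, y2) boxes, or None if they do not overlap."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2, y2)


def cells_from_rows_and_columns(rows, columns):
    """Build a row-major grid of cell boxes from row and column boxes."""
    rows = sorted(rows, key=lambda box: box[1])        # top to bottom
    columns = sorted(columns, key=lambda box: box[0])  # left to right
    return [[intersect(row, col) for col in columns] for row in rows]


# Hypothetical boxes for a 2x2 table, deliberately given out of order
rows = [(0, 50, 200, 100), (0, 0, 200, 50)]
columns = [(100, 0, 200, 100), (0, 0, 100, 100)]
grid = cells_from_rows_and_columns(rows, columns)
```

Here `grid[r][c]` holds the box of the cell in row `r` and column `c`, which is the kind of structure the second visualization draws.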

mzhadigerov commented 2 years ago

Thanks @bsmock! But how do I run inference if all I have is an image (without the JSON file containing the text and coordinates required for evaluation)? Basically, all I want to do is test the pre-trained table structure recognition model on my JPG image containing a table and visualize the recognized table components (rows, headers, columns, etc.).
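
For readers with the same question: DETR-style detectors such as TATR emit, for each query, a vector of class logits and a box in normalized (cx, cy, w, h) form, so a raw prediction has to be rescaled before it can be drawn on the image. The sketch below shows that conversion in plain Python. The function names, the 0.5 threshold, and the sample numbers are assumptions for illustration, not code from the repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cxcywh_to_xyxy(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to pixel (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    )


def decode_predictions(logits_per_query, boxes_per_query, img_w, img_h, threshold=0.5):
    """Keep queries whose best real class scores at least `threshold`."""
    kept = []
    for logits, box in zip(logits_per_query, boxes_per_query):
        probs = softmax(logits)
        # By DETR convention the last index is the "no object" class, so skip it.
        label = max(range(len(probs) - 1), key=lambda i: probs[i])
        score = probs[label]
        if score >= threshold:
            kept.append({
                "label": label,
                "score": score,
                "box": cxcywh_to_xyxy(box, img_w, img_h),
            })
    return kept


# Hypothetical raw outputs: two queries, three real classes plus "no object"
logits = [[4.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 5.0]]
boxes = [(0.5, 0.5, 0.4, 0.2), (0.1, 0.1, 0.1, 0.1)]
detections = decode_predictions(logits, boxes, img_w=1000, img_h=500)
```

Each kept entry carries a pixel-space box that can be passed to any drawing routine. The second query is dropped because its probability mass sits on the "no object" class.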

mzhadigerov commented 2 years ago

@hannody Could you attach the original image, please? I'm getting ValueError: not supported on this line:

outputs = self.model(img_tensor)

Here is my file:

hannody commented 2 years ago

Hi @mzhadigerov, please try this one: table. Then compare your results with mine. I have attached the structure before and after post-processing, though the post-processing is not complete: OUT_after_pp OUT_OLD_before_pp

achillesliu commented 2 years ago

@mzhadigerov In the predict function of the TableStructure class, the tensor shape manipulation has some logic errors. The manipulation itself runs without complaint, but it leads to your ValueError: not supported error.

mzhadigerov commented 2 years ago

@hannody Thanks, I was able to solve the bug. Regarding your images (before/after post-processing): so the second image (with red rectangles) is the post-processed one? The first one looks more legit to me.

hannody commented 2 years ago

> @hannody Thanks, I was able to solve the bug. Regarding your images (before/after post-processing): so the second image (with red rectangles) is the post-processed one? The first one looks more legit to me.

My bad, sorry for the confusion. The one with only red rectangles is the raw one, which uses the output data directly. I'm not sure how you viewed the notebook, but this should be obvious inside it. Let me know if you improve upon it, and good luck.

mzhadigerov commented 2 years ago

@hannody Yes, that's what I thought it was, according to the code in the notebook. Thanks for the clarification!

sontakke12297 commented 2 years ago

> @hannody Could you attach the original image, please? I'm getting ValueError: not supported on this line:
>
> outputs = self.model(img_tensor)
>
> Here is my file:

@mzhadigerov How did you solve this issue?