microsoft / table-transformer

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
MIT License
2.02k stars, 233 forks

Stuck trying to run main.py. All help gratefully accepted... #125

Open isarker opened 1 year ago

isarker commented 1 year ago

Hello all:

I have been trying to use the Microsoft Table Transformer (as it exists on GitHub) to detect and extract tables and cells from TIFF files, along with the text inside the cells. For that, we have used a machine with the following configuration:

  1. HP Laptop running WSL2
  2. Docker

Using Docker (via a Dockerfile, etc.), we have created a container in which we have set up the Table Transformer. When we run the inference.py program, cells, spanning cells, and tables are detected, but many of them are erroneous. I am sharing an input file and the detected cells here.

Aluminium armoured cable - AMNS SMP

Then we explored training the model on the types of tables we expect to encounter. For that, we ran main.py with the following command:

python main.py --data_type detection --config_file detection_config.json --data_root_dir "/app/Programmer_Content/Input Image Files" --device cpu

We are encountering the following error.


{'lr': 5e-05, 'lr_backbone': 1e-05, 'batch_size': 2, 'weight_decay': 0.0001, 'epochs': 20, 'lr_drop': 1, 'lr_gamma': 0.9, 'clip_max_norm': 0.1, 'backbone': 'resnet18', 'num_classes': 2, 'dilation': False, 'position_embedding': 'sine', 'emphasized_weights': {}, 'enc_layers': 6, 'dec_layers': 6, 'dim_feedforward': 2048, 'hidden_dim': 256, 'dropout': 0.1, 'nheads': 8, 'num_queries': 15, 'pre_norm': True, 'masks': False, 'aux_loss': False, 'mask_loss_coef': 1, 'dice_loss_coef': 1, 'ce_loss_coef': 1, 'bbox_loss_coef': 5, 'giou_loss_coef': 2, 'eos_coef': 0.4, 'set_cost_class': 1, 'set_cost_bbox': 5, 'set_cost_giou': 2, 'device': 'cpu', 'seed': 42, 'start_epoch': 0, 'num_workers': 1, 'data_root_dir': '/app/Programmer_Content/Input Image Files', 'config_file': 'detection_config.json', 'data_type': 'detection', 'model_load_path': None, 'load_weights_only': False, 'model_save_dir': None, 'metrics_save_filepath': '', 'debug_save_dir': 'debug', 'table_words_dir': None, 'mode': 'train', 'debug': False, 'checkpoint_freq': 1, 'train_max_size': None, 'val_max_size': None, 'test_max_size': None, 'eval_pool_size': 1, 'eval_step': 1, '__module__': '__main__', '__dict__': <attribute '__dict__' of 'Args' objects>, '__weakref__': <attribute '__weakref__' of 'Args' objects>, '__doc__': None}
----------------------------------------------------------------------------------------------------
loading model
/root/miniconda3/envs/tables-detr/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/root/miniconda3/envs/tables-detr/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet18_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet18_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
loading data
loading data
creating index...
index created!
Traceback (most recent call last):
  File "/app/Programmer_Content/table-transformer/src/main.py", line 375, in <module>
    main()
  File "/app/Programmer_Content/table-transformer/src/main.py", line 368, in main
    train(args, model, criterion, postprocessors, device)
  File "/app/Programmer_Content/table-transformer/src/main.py", line 214, in train
    data_loader_train, data_loader_val, dataset_val, train_len = get_data(args)
  File "/app/Programmer_Content/table-transformer/src/main.py", line 130, in get_data
    sampler_train = torch.utils.data.RandomSampler(dataset_train)
  File "/root/miniconda3/envs/tables-detr/lib/python3.10/site-packages/torch/utils/data/sampler.py", line 107, in __init__
    raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0

We have run out of ideas for how to get the main.py program to work. The following is our directory structure:

/app/Programmer_Content/table-transformer/src (This is where we run the main.py program)
/app/Programmer_Content/Input Image Files/images (Has 10 JPG files)
/app/Programmer_Content/Input Image Files/train (Directory is empty)
/app/Programmer_Content/Input Image Files/val (Directory is empty)
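As a quick sanity check, the number of annotation files main.py will find in each split can be counted with a few lines of Python (the helper name is just illustrative; the path is the data root used above):

```python
import glob
import os

def count_annotations(data_root):
    """Count the XML annotation files in each detection split folder."""
    return {split: len(glob.glob(os.path.join(data_root, split, "*.xml")))
            for split in ("train", "val")}

# With empty train/ and val/ folders this returns {'train': 0, 'val': 0},
# which is exactly the condition that leads to num_samples=0 above.
print(count_annotations("/app/Programmer_Content/Input Image Files"))
```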

If someone could guide us on how either to get inference.py to recognize tables more accurately or to get main.py to train using the directory structure mentioned above, we would be grateful.

Also, as a suggestion: if the documentation were simpler and richer, it would be helpful to a vastly larger number of people. The current documentation (for example, steps to resolve the errors one encounters) is possibly inadequate for people like me (four weeks ago, I was a complete beginner at Python programming, AI, GitHub, etc. :) )

bsmock commented 11 months ago

Hi,

First, let me say I understand the request for better documentation for a broader audience. This repository is intended mostly for other ML researchers, to allow them to reproduce our research. We rely on our research papers as a primary source of documentation and assume our users will have read them.

The error you're seeing in main.py occurs because the train folder is empty. That folder needs to contain annotation files in XML format.
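For illustration, a minimal annotation file could be generated as below. This is a sketch assuming the PASCAL VOC-style XML layout used by the PubTables-1M detection data; the file names and box coordinates are made up:

```python
import xml.etree.ElementTree as ET

def write_voc_annotation(path, image_name, width, height, boxes):
    """Write a minimal PASCAL VOC-style annotation file.

    `boxes` is a list of (label, xmin, ymin, xmax, ymax) tuples,
    e.g. [("table", 210, 400, 2340, 1580)].
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = image_name
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    for label, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"),
                              (xmin, ymin, xmax, ymax)):
            ET.SubElement(bndbox, tag).text = str(value)
    ET.ElementTree(root).write(path)

# One hypothetical "table" box on a 2550 x 3300 page image:
write_voc_annotation("page_0001.xml", "page_0001.jpg", 2550, 3300,
                     [("table", 210, 400, 2340, 1580)])
```

Each image in images/ would then get a matching .xml file placed in train/ or val/.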

The performance you're seeing at inference is due to two things.

  1. The structure recognition model is intended to operate on cropped table images. If you include the surrounding page context, it can confuse the model, because it wasn't trained on images like that. This is clear from our papers, but I can definitely understand the confusion if you're new to this problem.
  2. The model trained only on PubTables-1M works well on scientific tables. The top half of your example looks like a standard table, with a header the model might recognize. But the model hasn't seen tables with the top half and bottom half together, as in your example. It can work on tables like yours, but it would probably need a good deal of training on additional data. It helps to be familiar with the PubTables-1M dataset to understand the kinds of tables the current model will be able to handle.
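The cropping step mentioned in point 1 can be sketched as follows. This assumes the detector's boxes come as (xmin, ymin, xmax, ymax) pixel coordinates and represents the page as a NumPy array; the padding value is arbitrary:

```python
import numpy as np

def crop_table(page, bbox, padding=10):
    """Crop a detected table region out of a full page image.

    `page` is an H x W x C pixel array; `bbox` is (xmin, ymin, xmax, ymax)
    in pixels. A small margin is kept so cell borders aren't clipped.
    """
    h, w = page.shape[:2]
    xmin, ymin, xmax, ymax = bbox
    left, top = max(0, xmin - padding), max(0, ymin - padding)
    right, bottom = min(w, xmax + padding), min(h, ymax + padding)
    return page[top:bottom, left:right]
```

The cropped region (not the whole page) is what should be passed to the structure recognition model.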

Hope this is helpful.

Best, Brandon