microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI

[LayoutReader] Training loss is low but inference performs terribly #826

Open · Mountchicken opened this issue 2 years ago

Mountchicken commented 2 years ago

Describe Model I am using (UniLM, MiniLM, LayoutLM ...): LayoutReader

Hi @zlwang-cs, I am using LayoutReader to predict layout-only data like the attached page (image: 12_MTH_1), where the reading order is right-to-left, top-to-bottom.

However, I encountered some problems and hope you can share some insights.

zlwang-cs commented 2 years ago

Hi, thanks for your interest in our paper. I'd be glad to help you fix the issues.

For the first question, I am not sure what the problem is without more detailed information. I would recommend using breakpoints or other debugging tools to see what the tensor shapes look like at this step.
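
As a concrete version of that suggestion, here is a minimal sketch (the tensor names and shapes are illustrative, not taken from run_seq2seq.py) of checking shapes right before a failing step:

    import torch

    # Toy tensors standing in for a real batch; the point is the technique:
    # print the shapes (or call breakpoint()) just before the step that fails.
    input_ids = torch.zeros(2, 513, dtype=torch.long)      # (batch, seq_len)
    attention_mask = torch.ones(2, 513, dtype=torch.long)  # same shape

    print("input_ids     :", tuple(input_ids.shape))
    print("attention_mask:", tuple(attention_mask.shape))
    # breakpoint()  # drops into pdb for interactive inspection (Python 3.7+)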

For the second question, I guess the problem comes from the reading order of your dataset. The data you are using is quite different from the original setting in our paper: the pre-trained weights assume a left-to-right, top-to-bottom reading order, so that pre-training setting may be an obstacle in your experiment. Considering this, I don't find the poor performance surprising. If you would like to keep this reading-order setting, you may need to collect enough data to pre-train the model again, or resort to other approaches. Another possible way is to rotate the image 90 degrees counterclockwise so that it resembles the common reading-order setting (see the sketch below).
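
For reference, here is a minimal sketch of that rotation, assuming Pillow and pixel-coordinate boxes in (x0, y0, x1, y1) form; the function is illustrative, not from the repo. On a page of width W, a 90-degree counterclockwise turn maps a point (x, y) to (y, W - x), so right-to-left columns become the usual left-to-right, top-to-bottom rows:

    from PIL import Image

    def rotate_page_ccw(image, boxes):
        # PIL's ROTATE_90 rotates counterclockwise; a W x H page becomes H x W.
        w, _ = image.size
        rotated = image.transpose(Image.Transpose.ROTATE_90)
        # (x, y) -> (y, W - x), so (x0, y0, x1, y1) -> (y0, W - x1, y1, W - x0).
        rotated_boxes = [(y0, w - x1, y1, w - x0) for (x0, y0, x1, y1) in boxes]
        return rotated, rotated_boxes

If the boxes are already normalized to LayoutLM's 0-1000 grid, the same remapping applies with W = 1000.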

Mountchicken commented 2 years ago

Hi @zlwang-cs, thanks for the prompt reply. I'll try to debug this and see if I can locate the problem. Rotating the image 90 degrees counterclockwise seems quite reasonable, and I'll try that too.

One thing still puzzles me: you mentioned that LayoutReader loads pre-trained weights during training. Are these pre-trained weights word-level or textline-level? It seems I need textline-level weights here.

zlwang-cs commented 2 years ago

Hi @Mountchicken, the pre-trained model is word-level. Unfortunately, I cannot help you with textline-level pre-training.

Mountchicken commented 2 years ago

Hi @zlwang-cs, thanks for the reply. BTW, how do I load the pre-trained weights and fine-tune them on my own dataset? I downloaded layoutreader-base-readingbank.zip from this link and, after unpacking it, got config.json and pytorch_model.bin.

Which argument below should I point at these weights?

python -m torch.distributed.launch --nproc_per_node=4 run_seq2seq.py \
    --model_type layoutlm \
    --model_name_or_path layoutlm-base-uncased \
    --train_folder /path/to/ReadingBank/train \
    --output_dir /path/to/output/LayoutReader/layoutlm \
    --do_lower_case \
    --fp16 \
    --fp16_opt_level O2 \
    --max_source_seq_length 513 \
    --max_target_seq_length 511 \
    --per_gpu_train_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --learning_rate 7e-5 \
    --num_warmup_steps 500 \
    --num_training_steps 75000 \
    --cache_dir /path/to/output/LayoutReader/cache \
    --label_smoothing 0.1 \
    --save_steps 5000 \
    --cached_train_features_file /path/to/ReadingBank/features_train.pt
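
As a hedged aside (this describes the usual Hugging Face convention, not a confirmed behavior of run_seq2seq.py): --model_name_or_path is ordinarily the argument that accepts either a hub model name or a local directory containing config.json and pytorch_model.bin, so an unzipped checkpoint would normally be passed there. A minimal sketch of that convention, with a hypothetical path:

    from transformers import AutoConfig

    # Assumption: a local directory holding config.json (and pytorch_model.bin)
    # can be passed wherever a model name is expected; the path is hypothetical.
    config = AutoConfig.from_pretrained("/path/to/layoutreader-base-readingbank")
    print(config.model_type)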

zlwang-cs commented 2 years ago

Hi @Mountchicken, I assume weight loading is quite common in practice; please refer to the related documentation, and I am sure you can find the right answer. Also, I see you are using run_seq2seq.py, which is for training, but the weights you downloaded are actually for decoding. I guess that is why you are confused.
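
One quick way to check what a downloaded checkpoint actually contains (and whether its parameter names match the model being fine-tuned) is to load the state dict directly; the path below is illustrative:

    import torch

    # Load the unzipped checkpoint on CPU and list a few parameter names;
    # comparing them with the training model's expected keys shows whether
    # this checkpoint can be loaded for fine-tuning.
    state_dict = torch.load("pytorch_model.bin", map_location="cpu")
    for name, tensor in list(state_dict.items())[:10]:
        print(name, tuple(tensor.shape))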