Mountchicken opened 2 years ago
Hi, thanks for your interest in our paper. I'd love to help you fix the issues.
For the first question, I am not sure what the problem is without more detailed information. I would recommend setting breakpoints or using other debugging tools to see what the shapes of the tensors look like at this step.
For the second question, I guess the problem comes from the reading order of your dataset. The data you use is quite different from the original setting in our paper: the pre-trained weights assume a left-to-right, top-to-bottom reading order, so the pre-training setting may be an obstacle in your experiment. Considering this, I don't think the poor performance is surprising. If you want to keep working on this reading-order setting, you may need to collect enough data to pre-train the model again, or resort to other approaches. Another possible workaround is to rotate the image 90 degrees counterclockwise so that it resembles the common reading-order setting.
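To make the rotation concrete: under a 90-degree counterclockwise turn, a point (x, y) on a page of width W maps to (y, W - x), so the word boxes have to be remapped along with the pixels. A minimal sketch of the box remapping (the helper name `rotate_box_ccw` is mine, not from the repo; the image itself can be rotated with e.g. Pillow's `image.rotate(90, expand=True)`):

```python
def rotate_box_ccw(box, width):
    """Remap an (x0, y0, x1, y1) box after rotating the page 90 degrees
    counterclockwise.

    Each point (x, y) maps to (y, width - x); the corners are then
    re-sorted so the result stays (left, top, right, bottom).
    """
    x0, y0, x1, y1 = box
    # (x0, y0) -> (y0, width - x0) and (x1, y1) -> (y1, width - x1);
    # the new top edge comes from the old right edge.
    return (y0, width - x1, y1, width - x0)
```

After this remapping, a box near the top-right of the original page (read first in right-to-left order) lands near the top-left of the rotated page, matching the left-to-right, top-to-bottom order the pre-trained weights expect.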
Hi @zlwang-cs, thanks for the prompt reply. I'll try to debug this and see if I can locate the problem. Rotating the image 90 degrees counterclockwise seems quite reasonable, and I'll try that too.
There is one thing that still puzzles me. You mentioned that LayoutReader loads pre-trained weights during training. Are these pre-trained weights word-level or textline-level? It seems that I need textline-level pre-trained weights here.
Hi @Mountchicken, the pre-trained model is word-level. Unfortunately, I cannot help you with textline-level pre-training.
Hi @zlwang-cs
Thanks for the reply. BTW, how do I load the pre-trained weights and fine-tune them on my own dataset? I downloaded `layoutreader-base-readingbank.zip` from this link and got `config.json` and `pytorch_model.bin` after unpacking it. Which argument below should I set to these weights?
```shell
python -m torch.distributed.launch --nproc_per_node=4 run_seq2seq.py \
  --model_type layoutlm \
  --model_name_or_path layoutlm-base-uncased \
  --train_folder /path/to/ReadingBank/train \
  --output_dir /path/to/output/LayoutReader/layoutlm \
  --do_lower_case \
  --fp16 \
  --fp16_opt_level O2 \
  --max_source_seq_length 513 \
  --max_target_seq_length 511 \
  --per_gpu_train_batch_size 2 \
  --gradient_accumulation_steps 1 \
  --learning_rate 7e-5 \
  --num_warmup_steps 500 \
  --num_training_steps 75000 \
  --cache_dir /path/to/output/LayoutReader/cache \
  --label_smoothing 0.1 \
  --save_steps 5000 \
  --cached_train_features_file /path/to/ReadingBank/features_train.pt
```
Hi @Mountchicken, weight loading is quite common in practice; please refer to the related documentation and I am sure you can find the right answer. Also, I see you are using `run_seq2seq.py`, which is for training, but the weights you downloaded are actually for decoding. I guess that is why you are confused.
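In Hugging Face-style scripts, `--model_name_or_path` usually also accepts a local directory containing `config.json` and `pytorch_model.bin`. Because the released checkpoint is for decoding, its keys may not match the training model one-to-one; a common pattern is non-strict loading, i.e. keeping only the overlapping keys. A rough sketch of that filtering step, with plain dicts standing in for torch tensors (in practice you would use `torch.load` plus `model.load_state_dict(..., strict=False)`; the helper name is illustrative, not from the repo):

```python
def filter_state_dict(checkpoint, model_keys):
    """Split a checkpoint into the entries the model can load, the model
    keys the checkpoint is missing, and the checkpoint keys the model
    does not expect (the same sets `load_state_dict(strict=False)` reports).
    """
    loaded = {k: v for k, v in checkpoint.items() if k in model_keys}
    missing = sorted(set(model_keys) - set(loaded))
    unexpected = sorted(set(checkpoint) - set(loaded))
    return loaded, missing, unexpected
```

Inspecting the `missing` and `unexpected` lists is a quick way to tell whether a checkpoint is compatible with the model the training script builds.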
Describe Model I am using (UniLM, MiniLM, LayoutLM ...): LayoutReader
Hi @zlwang-cs, I am using LayoutReader to predict layout-only data like this, whose reading order is right-to-left, top-to-bottom. However, I encountered some problems and hope you can share some insights.
An error at line 671 says that the two summed tensor dimensions are not aligned; when I remove the `self.bias`, I can train normally. https://github.com/microsoft/unilm/blob/cd2eb8ade8b6e475aefa9b769ced2eefc4245a3e/layoutreader/s2s_ft/modeling.py#L669-L673