rakutentech / Document-understanding

Apache License 2.0

Multi-scale Cell-based Layout Representation for Document Understanding

Model

We use pretrained models from the Hugging Face Transformers library: https://huggingface.co/models
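The LayoutLM-family checkpoints used below expect each word's bounding box to be normalized to a 0-1000 coordinate grid, regardless of the page's pixel size. A minimal sketch of that preprocessing step (the helper name is ours, not part of this repo):

```python
def normalize_bbox(bbox, width, height):
    # LayoutLM-family models take word boxes scaled to a 0-1000 grid,
    # independent of the original page resolution.
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]

# A box at (100, 50)-(200, 100) on a 1000x500 page
# maps to [100, 100, 200, 200] on the 0-1000 grid.
print(normalize_bbox([100, 50, 200, 100], 1000, 500))
```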


Dataset


Environment

Container

Will be released soon.

Anaconda env


Results

| Backbone | Model Size | Dataset | F1 |
|----------|------------|---------|-------|
| LayoutLMv3 | BASE | FUNSD | 93.76 |
| LayoutLMv3 | LARGE | FUNSD | 93.52 |
| LayoutLMv3 | BASE | CORD | 97.23 |
| LayoutLMv3 | LARGE | CORD | 94.49 |
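The F1 scores above are entity-level: a predicted entity counts as correct only if both its span and its type exactly match a gold annotation (the seqeval-style convention commonly reported for FUNSD and CORD). A minimal sketch of that metric, with entities represented as hypothetical (type, start, end) tuples of our own choosing:

```python
def entity_f1(pred, gold):
    # Entity-level F1: exact match on span and type, seqeval-style.
    pred, gold = set(pred), set(gold)
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

pred = {("QUESTION", 0, 2), ("ANSWER", 3, 5)}
gold = {("QUESTION", 0, 2), ("ANSWER", 3, 6)}
# One of two predictions matches exactly: precision = recall = 0.5.
print(entity_f1(pred, gold))
```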

Commands

Named-entity Recognition (NER)

LayoutLMv1

```shell
python -m torch.distributed.launch --nproc_per_node=1 \
    --master_port 44398 \
    examples/run_funsd.py \
    --model_name_or_path microsoft/layoutlm-base-uncased \
    --output_dir output/ \
    --do_train \
    --do_predict \
    --max_steps 5000 \
    --warmup_ratio 0.1 \
    --fp16 \
    --per_device_train_batch_size 4
```
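With --max_steps 5000 and --warmup_ratio 0.1, the learning rate ramps up linearly over the first 500 steps and then decays linearly to zero, matching the default "linear" schedule in transformers. A sketch of that schedule (the function name is ours):

```python
def lr_at_step(step, max_steps, warmup_ratio, base_lr):
    # Linear warmup to base_lr over warmup_ratio * max_steps steps,
    # then linear decay to 0 at max_steps.
    warmup_steps = int(max_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (max_steps - step) / (max_steps - warmup_steps)

# Halfway through warmup (step 250 of 500) the rate is half of base_lr.
print(lr_at_step(250, 5000, 0.1, 1e-5))
```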

LayoutLMv2

```shell
python -m torch.distributed.launch --nproc_per_node=1 \
    --master_port 24398 \
    examples/run_funsd.py \
    --model_name_or_path microsoft/layoutlmv2-large-uncased \
    --output_dir output/ \
    --do_train \
    --do_predict \
    --max_steps 2000 \
    --warmup_ratio 0.1 \
    --fp16 \
    --overwrite_output_dir \
    --per_device_train_batch_size 4
```

LayoutLMv3

```shell
python -m torch.distributed.launch \
    --nproc_per_node=1 \
    --master_port 4398 \
    examples/run_funsd_cord.py \
    --dataset_name [funsd or cord] \
    --do_train \
    --do_eval \
    --model_name_or_path microsoft/layoutlmv3-base \
    --output_dir output/ \
    --segment_level_layout 1 \
    --visual_embed 1 \
    --input_size 224 \
    --max_steps 1000 \
    --save_steps -1 \
    --evaluation_strategy steps \
    --eval_steps 1000 \
    --learning_rate 1e-5 \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 1 \
    --dataloader_num_workers 1 \
    --overwrite_output_dir
```
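These scripts train a token classifier over BIO tags; for FUNSD the label set in the upstream unilm examples is O plus B-/I- variants of HEADER, QUESTION, and ANSWER. A small sketch of that tagging scheme (the helper name is ours):

```python
# BIO label set used for FUNSD NER in the upstream unilm examples.
LABELS = ["O", "B-HEADER", "I-HEADER",
          "B-QUESTION", "I-QUESTION", "B-ANSWER", "I-ANSWER"]

def bio_tags(words, entity_type):
    # First word of an entity gets B-, the rest I-;
    # words of type "other" are tagged O.
    if entity_type == "other":
        return ["O"] * len(words)
    t = entity_type.upper()
    return ["B-" + t] + ["I-" + t] * (len(words) - 1)

# A two-word question field yields one B- tag and one I- tag.
print(bio_tags(["Date", ":"], "question"))
```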

Document Classification

LayoutLMv1

```shell
python run_classification.py \
    --data_dir [datasetPath] \
    --model_type layoutlm \
    --model_name_or_path ~/dev/Models/LayoutLM/layoutlm-base-uncased \
    --output_dir output/ \
    --do_lower_case \
    --max_seq_length 512 \
    --do_train \
    --do_eval \
    --num_train_epochs 40.0 \
    --logging_steps 5000 \
    --save_steps 5000 \
    --per_gpu_train_batch_size 16 \
    --per_gpu_eval_batch_size 16 \
    --evaluate_during_training \
    --fp16 \
    --data_level 1
```

LayoutLMv3

```shell
python run_classification.py \
    --data_dir [datasetPath] \
    --model_type v3 \
    --model_name_or_path microsoft/layoutlmv3-base \
    --do_lower_case \
    --max_seq_length 512 \
    --do_train \
    --num_train_epochs 40.0 \
    --logging_steps 5000 \
    --save_steps 5000 \
    --per_gpu_train_batch_size 2 \
    --per_gpu_eval_batch_size 2 \
    --evaluate_during_training
```

Reference

We reuse some code from https://github.com/microsoft/unilm.