johnning2333 / M2Doc


[AAAI2024] M2Doc: A Multi-Modal Fusion Approach for Document Layout Analysis

The paper is available at this link.

🚧 TODO List

Installation

Dataset Preparation

  1. Download the datasets you need. Dataset download links:
  2. Convert the datasets' OCR annotations.
    Use ocr_anno_convert.py to format and sort the OCR annotations of each dataset.
    Three test samples can be found in Annos.

Training and Inference Steps

  1. Install the repository (we recommend using Anaconda for installation).

    conda create -n m2doc python=3.8 -y
    conda activate m2doc
    conda install pytorch==1.8.1 torchvision==0.9.1 torchaudio==0.8.1 cudatoolkit=10.2 -c pytorch
    git clone https://github.com/johnning2333/M2Doc.git
    cd M2Doc/mmdetection
    pip install -v -e .
    pip install transformers
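    pip install mmengine
    pip install -U openmim  # provides the mim command used on the next line
    mim install mmcv
    cd ..  # return to the M2Doc root; the train and test commands below are run from there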
  2. Train

    # multi-GPU training; arguments: config file, number of GPUs
    bash mmdetection/tools/dist_train.sh mmdetection/m2doc_config/dino-4scale_w_m2doc_doclaynet.py 8
  3. Inference

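    # multi-GPU inference; arguments: config file, checkpoint, number of GPUs
    # (point the checkpoint path at the one produced by your own training run)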
    bash mmdetection/tools/dist_test.sh mmdetection/m2doc_config/dino-4scale_w_m2doc_doclaynet.py work_dirs/dino-4scale_w_m2doc_r50_8xb2-12e_doclaynet/epoch_12.pth 8
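Before training, it can help to confirm that the installation steps produced a working environment. The sketch below is an illustrative bash helper, not part of M2Doc: `check_modules` and the `PYTHON` variable are hypothetical names. It only checks that each package imports, prints its `__version__`, and reports whether PyTorch sees CUDA GPUs; it does not verify that the installed versions are compatible with each other.

```shell
# Minimal environment check (bash). Hypothetical helper, not part of M2Doc.
# PYTHON selects the interpreter; defaults to the active environment's python.
PYTHON="${PYTHON:-python}"

# For each module name, try to import it and print its __version__ (or "unknown").
check_modules() {
  local mod ver
  for mod in "$@"; do
    if ver=$("$PYTHON" -c "import $mod; print(getattr($mod, '__version__', 'unknown'))" 2>/dev/null); then
      printf '%-8s%s %s\n' OK "$mod" "$ver"
    else
      printf '%-8s%s\n' MISSING "$mod"
    fi
  done
}

check_modules torch torchvision mmengine mmcv mmdet transformers

# Report whether PyTorch can see GPUs (the train and test commands above use 8).
"$PYTHON" -c "import torch; print('CUDA available:', torch.cuda.is_available(), '| GPUs:', torch.cuda.device_count())" 2>/dev/null \
  || echo "CUDA check skipped: torch is not importable"
```

Run it inside the activated `m2doc` environment; any `MISSING` line points at an installation step to revisit.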
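The training command assumes 8 GPUs. Under mmdetection's naming convention, the `8xb2` in the work-dir name means 8 GPUs with 2 images per GPU, so training on a different number of GPUs changes the total batch size. The sketch below is a hypothetical bash wrapper (`launch_train`, `count_gpus`, `GPUS`, `DRY_RUN` and `REF_GPUS` are illustrative names). It assumes the bundled `dist_train.sh` behaves like upstream mmdetection 3.x, forwarding arguments after the GPU count to `tools/train.py`, and that the M2Doc config defines `auto_scale_lr`, which the `--auto-scale-lr` flag needs in order to rescale the learning rate.

```shell
# Hypothetical launcher (bash) around mmdetection/tools/dist_train.sh; not part of M2Doc.
# Run from the M2Doc repository root. Assumes upstream mmdetection 3.x behaviour:
# arguments after the GPU count are forwarded to tools/train.py.
CONFIG="${CONFIG:-mmdetection/m2doc_config/dino-4scale_w_m2doc_doclaynet.py}"
REF_GPUS=8  # "8xb2" = 8 GPUs x 2 images per GPU in mmdetection's naming convention

# Count visible NVIDIA GPUs; print 1 if nvidia-smi is unavailable or lists none.
count_gpus() {
  local n
  n=$(nvidia-smi -L 2>/dev/null | grep -c '^GPU' || true)
  if [ "${n:-0}" -gt 0 ]; then echo "$n"; else echo 1; fi
}

# Launch training on $GPUS GPUs (default: all visible). With DRY_RUN=1, only print the command.
launch_train() {
  local gpus="${GPUS:-$(count_gpus)}"
  if ! [[ "$gpus" =~ ^[1-9][0-9]*$ ]]; then
    echo "GPUS must be a positive integer, got '$gpus'" >&2
    return 1
  fi
  local cmd=(bash mmdetection/tools/dist_train.sh "$CONFIG" "$gpus")
  if [ "$gpus" -ne "$REF_GPUS" ]; then
    # Total batch size differs from the 8-GPU setup: ask train.py to rescale the LR.
    # This requires auto_scale_lr settings in the config.
    cmd+=(--auto-scale-lr)
  fi
  cmd+=("$@")  # e.g. --work-dir DIR, forwarded to tools/train.py
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `GPUS=4 DRY_RUN=1 launch_train` prints the 4-GPU command without running it; drop `DRY_RUN=1` to launch. Linear learning-rate scaling is a common heuristic and may not exactly reproduce the 8-GPU results.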
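The inference command needs the path of a trained checkpoint. Assuming training runs through MMEngine (as mmdetection 3.x does), the work directory normally holds `epoch_N.pth` files plus a `last_checkpoint` file containing the path of the most recent one. The hypothetical bash helper below (`latest_ckpt` is an illustrative name, not part of M2Doc) prints that checkpoint and falls back to the highest-numbered `epoch_N.pth` when `last_checkpoint` is absent.

```shell
# Hypothetical helper (bash), not part of M2Doc: print the checkpoint to evaluate
# from an MMEngine work directory.
latest_ckpt() {
  local work_dir="$1" f n best="" best_n=-1
  # MMEngine's CheckpointHook records the most recent checkpoint path in this file.
  if [ -s "$work_dir/last_checkpoint" ]; then
    printf '%s\n' "$(<"$work_dir/last_checkpoint")"
    return 0
  fi
  # Fallback: the epoch_N.pth with the largest N (compared numerically, so 10 beats 9).
  for f in "$work_dir"/epoch_*.pth; do
    [ -e "$f" ] || continue          # the glob matched nothing
    n="${f##*/epoch_}"
    n="${n%.pth}"
    [[ "$n" =~ ^[0-9]+$ ]] || continue
    if [ "$n" -gt "$best_n" ]; then
      best_n="$n"
      best="$f"
    fi
  done
  if [ -z "$best" ]; then
    echo "no checkpoint found in $work_dir" >&2
    return 1
  fi
  echo "$best"
}
```

It can then feed the test script, for example: `bash mmdetection/tools/dist_test.sh mmdetection/m2doc_config/dino-4scale_w_m2doc_doclaynet.py "$(latest_ckpt work_dirs/dino-4scale_w_m2doc_r50_8xb2-12e_doclaynet)" 8`, with the work directory replaced by the one your training run used.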