[AAAI2024] M2Doc: A Multi-Modal Fusion Approach for Document Layout Analysis

The paper is available at this link.

🚧 TODO List

[x] Add training script and inference script for DINO_M2Doc.
[x] Add training script and inference script for other detectors.
[x] Add the data format samples for M2Doc.
[x] Add the dataset converting scripts.
[x] Release the Model-Zoo of M2Doc on DocLayNet.

Installation

Python 3.8.0
CUDA 10.2
transformers
MMDetection

Dataset Preparation

Download the datasets you need. Dataset download links:
Convert the datasets' OCR annotations.
Use ocr_anno_convert.py to format and sort the dataset OCR annotations.
Three test samples can be found in Annos.
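
To illustrate what "format and sort" can mean for OCR annotations, here is a minimal sketch that orders OCR words top-to-bottom and then left-to-right within each line. It is not the logic of ocr_anno_convert.py: the record layout (a dict with "text" and "box" as [x0, y0, x1, y1]), the function name sort_reading_order, and the line_tol parameter are all assumptions made for this example.

```python
from typing import Dict, List


def sort_reading_order(words: List[Dict], line_tol: float = 0.5) -> List[Dict]:
    """Order OCR words top-to-bottom, then left-to-right within a line.

    Hypothetical input format: each word is {"text": str, "box": [x0, y0, x1, y1]}.
    Two words share a line when their top edges differ by at most
    line_tol times the height of the first word placed on that line.
    """
    # Rough pass: sort by top edge, then by left edge.
    ordered = sorted(words, key=lambda w: (w["box"][1], w["box"][0]))

    lines: List[List[Dict]] = []
    for word in ordered:
        top = word["box"][1]
        if lines:
            ref_box = lines[-1][0]["box"]
            ref_height = ref_box[3] - ref_box[1]
            if abs(top - ref_box[1]) <= line_tol * ref_height:
                lines[-1].append(word)
                continue
        lines.append([word])

    # Within each line, restore left-to-right order.
    return [w for line in lines for w in sorted(line, key=lambda w: w["box"][0])]


if __name__ == "__main__":
    sample = [
        {"text": "world", "box": [60, 9, 100, 19]},
        {"text": "hello", "box": [10, 10, 50, 20]},
        {"text": "next", "box": [10, 40, 50, 50]},
    ]
    print([w["text"] for w in sort_reading_order(sample)])
```

The tolerance is what lets slightly misaligned boxes on the same text line stay together. Check the samples in Annos for the real field names before adapting anything like this.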

Training and Inference Steps

Install the repository (we recommend using Anaconda for installation).

conda create -n m2doc python=3.8 -y
conda activate m2doc
conda install pytorch==1.8.1 torchvision==0.9.1 torchaudio==0.8.1 cudatoolkit=10.2 -c pytorch
git clone https://github.com/johnning2333/M2Doc.git
cd M2Doc/mmdetection
pip install -v -e .
pip install transformers
pip install mmengine
pip install -U openmim  # provides the mim command used on the next line
mim install mmcv
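
After the installation commands, it can help to confirm which package versions ended up in the environment. The following is a small sketch using only the standard library. The helper name installed_versions and the package list are choices made for this example, not part of the repository.

```python
from importlib import metadata

# Distribution names as published on PyPI; mmdet is the editable install
# produced by running `pip install -v -e .` inside mmdetection.
PACKAGES = ("torch", "torchvision", "transformers", "mmengine", "mmcv", "mmdet")


def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:<14}{version or 'NOT INSTALLED'}")
```

Run it inside the activated m2doc environment. Any line reporting NOT INSTALLED points at a step that needs to be repeated.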

Train

# Multi-GPU training, run from the repository root; the final argument (8) is the number of GPUs
bash mmdetection/tools/dist_train.sh mmdetection/m2doc_config/dino-4scale_w_m2doc_doclaynet.py 8

Inference

# Multi-GPU inference, run from the repository root; arguments are config, checkpoint, and number of GPUs
bash mmdetection/tools/dist_test.sh mmdetection/m2doc_config/dino-4scale_w_m2doc_doclaynet.py work_dirs/dino-4scale_w_m2doc_r50_8xb2-12e_doclaynet/epoch_12.pth 8
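
The inference command above points at epoch_12.pth. If training stopped at a different epoch, a helper like the following sketch can locate the newest checkpoint to pass instead. It assumes checkpoints follow the epoch_N.pth naming seen above; the function name latest_checkpoint is hypothetical and not part of the repository.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(work_dir: str) -> Optional[Path]:
    """Return the epoch_N.pth file with the largest N, or None if there is none."""
    pattern = re.compile(r"epoch_(\d+)\.pth$")
    best: Optional[Path] = None
    best_epoch = -1
    for path in Path(work_dir).glob("epoch_*.pth"):
        match = pattern.match(path.name)
        # Compare epochs numerically so that epoch_12 ranks above epoch_2.
        if match and int(match.group(1)) > best_epoch:
            best_epoch = int(match.group(1))
            best = path
    return best


if __name__ == "__main__":
    # Work directory taken from the inference command above.
    print(latest_checkpoint("work_dirs/dino-4scale_w_m2doc_r50_8xb2-12e_doclaynet"))
```

The numeric comparison matters because a plain string sort would place epoch_2.pth after epoch_12.pth.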