microsoft / i-Code

MIT License

In 'Finetuning on RVLCDIP', which dataset is used? #86

Open CheungZeeCn opened 1 year ago

CheungZeeCn commented 1 year ago

Finetuning on RVLCDIP

Download RVLCDIP first and change the path. For OCR, you might need to customize your code.

bash scripts/finetune_rvlcdip.sh   # Finetuning on RVLCDIP

Q1. Which dataset?

        ocr_dir = os.path.join(data_args.data_dir, data_args.mpdfs_dir, 'cdip-images-full-clean-ocr021121')
        image_dir = os.path.join(data_args.data_dir, data_args.mpdfs_dir, 'cdip-images')
        label_dir = os.path.join(data_args.data_dir, data_args.rvlcdip_dir, 'labels')

These paths are built in run_rvlcdip.py, but the directory 'cdip-images-full-clean-ocr021121' is not found in any of the datasets listed here:

https://paperswithcode.com/dataset/rvl-cdip
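A quick way to see which of the three directories from the run_rvlcdip.py snippet is missing locally is to rebuild the paths the same way and test each one. This is a minimal sketch: `DataArgs` is a hypothetical stand-in for the script's `data_args` object (only the three fields used in the snippet are modelled), and the directory values in the demo ("mpdfs", "rvl-cdip") are made up for illustration.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class DataArgs:
    # Hypothetical stand-in for data_args in run_rvlcdip.py;
    # only the fields referenced by the path snippet are included.
    data_dir: str
    mpdfs_dir: str
    rvlcdip_dir: str


def expected_dirs(data_args: DataArgs) -> dict:
    """Rebuild the three paths the same way the run_rvlcdip.py snippet does."""
    return {
        "ocr_dir": os.path.join(data_args.data_dir, data_args.mpdfs_dir,
                                "cdip-images-full-clean-ocr021121"),
        "image_dir": os.path.join(data_args.data_dir, data_args.mpdfs_dir, "cdip-images"),
        "label_dir": os.path.join(data_args.data_dir, data_args.rvlcdip_dir, "labels"),
    }


def missing_dirs(data_args: DataArgs) -> list:
    """Return the names of expected directories that do not exist on disk."""
    return [name for name, path in expected_dirs(data_args).items()
            if not os.path.isdir(path)]


if __name__ == "__main__":
    # Demo on a throwaway layout that has images and labels but no OCR output.
    root = tempfile.mkdtemp()
    args = DataArgs(data_dir=root, mpdfs_dir="mpdfs", rvlcdip_dir="rvl-cdip")
    os.makedirs(expected_dirs(args)["image_dir"])
    os.makedirs(expected_dirs(args)["label_dir"])
    print(missing_dirs(args))  # ['ocr_dir']
```

With only the raw RVL-CDIP download in place, one would expect `ocr_dir` to be the entry reported missing, which is exactly the gap described in Q1 and Q2.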

Q2. Which OCR engine? I have downloaded the raw rvl_cdip dataset. To generate cdip-images-full-clean-ocr021121 and reach performance matching what the paper reports, which OCR should I use? Is it https://learn.microsoft.com/en-us/rest/api/computervision/3.1/get-read-result/get-read-result?tabs=HTTP ?
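If the Computer Vision Read 3.1 API linked in Q2 were the engine, its Get Read Result response nests words under `analyzeResult.readResults[].lines[].words[]`, each word carrying an 8-value `boundingBox` polygon. The sketch below flattens such a response into per-page (word, box) pairs. Whether the authors used this API, and whether run_rvlcdip.py expects this shape in the OCR directory, are assumptions; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max), in the units the API reports.
Box = Tuple[float, float, float, float]


def polygon_to_box(polygon: List[float]) -> Box:
    """Collapse an 8-value quadrilateral [x1, y1, ..., x4, y4] into an axis-aligned box."""
    xs = polygon[0::2]
    ys = polygon[1::2]
    return (min(xs), min(ys), max(xs), max(ys))


def words_from_read_result(result: Dict) -> List[List[Tuple[str, Box]]]:
    """Return, per page, the (word text, box) pairs in the order the service lists them."""
    if result.get("status") != "succeeded":
        # The Read API is asynchronous; results are only usable once it has succeeded.
        raise ValueError(f"OCR not finished: status={result.get('status')!r}")
    pages = []
    for page in result["analyzeResult"]["readResults"]:
        words = []
        for line in page.get("lines", []):
            for word in line.get("words", []):
                words.append((word["text"], polygon_to_box(word["boundingBox"])))
        pages.append(words)
    return pages
```

Keeping raw coordinates leaves any normalization (for example, dividing by the page `width` and `height`) to whatever the finetuning code expects, which this thread does not specify.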

Q3. Is it OK to use rvl_cdip for both pretraining and finetuning?
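RVL-CDIP is a subset of the IIT-CDIP collection, so the practical concern behind Q3 is whether documents from the RVL-CDIP evaluation split also appear in the pretraining data. A minimal overlap check is sketched below. It assumes each label file holds one `<relative image path> <class id>` pair per line, and it matches documents by file stem. Both points are assumptions to verify against your copy, and the stem should be swapped for the full relative path if stems collide. The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, Set


def read_split(label_file: str) -> Dict[str, int]:
    """Parse a label file with one '<relative image path> <class id>' per line."""
    split = {}
    with open(label_file, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            path, label = line.rsplit(maxsplit=1)
            split[path] = int(label)
    return split


def doc_key(path: str) -> str:
    """Key a document by file stem so differently rooted copies still match (heuristic)."""
    return Path(path).stem


def leaked(pretrain_files: Iterable[str], eval_split: Dict[str, int]) -> Set[str]:
    """Return document keys present both in the pretraining files and in an eval split."""
    eval_keys = {doc_key(p) for p in eval_split}
    return {doc_key(p) for p in pretrain_files} & eval_keys
```

An empty result for the test split would suggest the pretraining set does not contain the evaluation documents, at least under the stem-matching assumption.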

Thank you!

adda1221 commented 1 year ago

I ran into this issue too. Some datasets are not provided.