johnning2333 / M2Doc

How can I determine if content_ann is loaded correctly? #7

Open RicoJYang opened 3 weeks ago

RicoJYang commented 3 weeks ago

In the doc_multi_modal.py file:

    def _load_content(self, data_list):
        # Insert content annotations into each data slice
        tmp_data_list = list()
        for data in data_list:
            tmp_img_info = copy.deepcopy(data)
            filename = data['img_path'].split('/')[-1].replace('jpg', 'json')
            content_file_path = osp.join(self.ann_prefix, filename)
            if os.path.isfile(content_file_path):
                ann = json.load(open(content_file_path))
                content = ann.get("content_ann", None)
                tmp_img_info["text_bboxes"] = content['bboxes']
                tmp_img_info["texts"] = content['texts']
                tmp_img_info['text_labels'] = content['labels']
                print(tmp_img_info['texts'])

During training, the print statement outputs the corresponding text. Does this mean that training is progressing normally? I ran OCR on the M6Doc dataset with PaddleOCR and converted the results using ocr_anno_convert.py. Why did I only get 68.1 mAP when training with dino-4scale_w_m2doc_r101_m6doc_36epoch.py?

johnning2333 commented 1 week ago

Given the differences between the OCR results you obtained and the ones we used, a performance drop of this size is reasonable. We encourage you to experiment on the DocLayNet dataset instead, which avoids inconsistencies in how the OCR results are obtained.
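
On the original question of whether content_ann is loaded correctly: the print in the quoted `_load_content` only confirms that a JSON file was found and that `content['texts']` could be read. A slightly stronger check is sketched below. It is not part of M2Doc; `check_content_ann` and `summarize` are hypothetical helper names, and the sketch assumes that `bboxes`, `texts`, and `labels` are parallel lists with one entry per text box. It reuses the filename rule from the quoted loader, so it counts the same files the loader would see.

```python
import json
import os.path as osp


def check_content_ann(ann):
    """Return a list of problems found in one parsed annotation dict.

    Assumption: 'bboxes', 'texts' and 'labels' are parallel lists.
    """
    content = ann.get("content_ann")
    if not isinstance(content, dict):
        return ["missing 'content_ann'"]

    keys = ("bboxes", "texts", "labels")
    problems = [f"missing key '{k}'" for k in keys if k not in content]
    if problems:
        return problems

    n_boxes, n_texts, n_labels = (len(content[k]) for k in keys)
    if not (n_boxes == n_texts == n_labels):
        problems.append(
            f"length mismatch: {n_boxes} bboxes, {n_texts} texts, {n_labels} labels"
        )
    if n_texts == 0:
        problems.append("no text entries")
    elif all(not str(t).strip() for t in content["texts"]):
        problems.append("all texts are empty")
    return problems


def summarize(img_paths, ann_prefix):
    """Count how many images have a usable content file under ann_prefix."""
    stats = {"ok": 0, "missing_file": 0, "bad_content": 0}
    for img_path in img_paths:
        # Same filename rule as the quoted loader. Note that it only
        # rewrites names containing 'jpg', so other extensions are not mapped.
        filename = img_path.split('/')[-1].replace('jpg', 'json')
        path = osp.join(ann_prefix, filename)
        if not osp.isfile(path):
            stats["missing_file"] += 1
            continue
        with open(path, encoding="utf-8") as f:
            ann = json.load(f)
        if check_content_ann(ann):
            stats["bad_content"] += 1
        else:
            stats["ok"] += 1
    return stats
```

Calling `summarize([d['img_path'] for d in data_list], ann_prefix)` before training gives a quick count of images whose text would be silently skipped. A clean report only shows that the annotations are structurally loadable; it says nothing about how closely the PaddleOCR output matches the OCR results used by the authors.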