open-mmlab / mmocr

OpenMMLab Text Detection, Recognition and Understanding Toolbox
https://mmocr.readthedocs.io/en/dev-1.x/
Apache License 2.0

Error of training FCENet #863

Closed S130111 closed 2 years ago

S130111 commented 2 years ago

Hi, I encounter this error when I train FCENet following the instructions. Could you please help me solve this issue?

Original Traceback (most recent call last):
  File "/home/software/anaconda2019/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/software/anaconda2019/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/software/anaconda2019/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/software/anaconda2019/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/dataset.py", line 207, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "/home/software/anaconda2019/envs/open-mmlab/lib/python3.7/site-packages/mmdet/datasets/custom.py", line 218, in __getitem__
    data = self.prepare_train_img(idx)
  File "/home/software/anaconda2019/envs/open-mmlab/lib/python3.7/site-packages/mmdet/datasets/custom.py", line 241, in prepare_train_img
    return self.pipeline(results)
  File "/home/software/anaconda2019/envs/open-mmlab/lib/python3.7/site-packages/mmdet/datasets/pipelines/compose.py", line 41, in __call__
    data = t(data)
  File "/home/root1/SHI/mmocr/mmocr/datasets/pipelines/textdet_targets/base_textdet_targets.py", line 167, in __call__
    results = self.generate_targets(results)
  File "/home/root1/SHI/mmocr/mmocr/datasets/pipelines/textdet_targets/fcenet_targets.py", line 351, in generate_targets
    polygon_masks_ignore)
  File "/home/root1/SHI/mmocr/mmocr/datasets/pipelines/textdet_targets/fcenet_targets.py", line 325, in generate_level_targets
    level_img_size, lv_text_polys[ind])
  File "/home/root1/SHI/mmocr/mmocr/datasets/pipelines/textdet_targets/fcenet_targets.py", line 251, in generate_fourier_maps
    fourier_coeff = self.cal_fourier_signature(polygon[0], k)
  File "/home/root1/SHI/mmocr/mmocr/datasets/pipelines/textdet_targets/fcenet_targets.py", line 212, in cal_fourier_signature
    resampled_polygon = self.normalize_polygon(resampled_polygon)
  File "/home/root1/SHI/mmocr/mmocr/datasets/pipelines/textdet_targets/fcenet_targets.py", line 159, in normalize_polygon
    x = np.abs(temp_polygon[:, 0])
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

gaotongxiao commented 2 years ago

Did you modify the config and use your own dataset for training? It seems some of your annotations (the polygons) are invalid. If that's not the case, please provide more details following the issue template to help us locate the problem.
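One way to pre-screen annotations for the kind of invalid polygon suspected here is to check whether each polygon still has enough distinct vertices. This is a hypothetical standalone check (`is_degenerate` is not part of mmocr), assuming polygons are stored as flat `[x1, y1, x2, y2, ...]` lists:

```python
import numpy as np

def is_degenerate(poly, eps=1e-6):
    """Return True if a flat [x1, y1, x2, y2, ...] polygon collapses
    to fewer than 3 distinct points (up to tolerance eps), which is
    not a valid polygon for target generation."""
    pts = np.asarray(poly, dtype=np.float64).reshape(-1, 2)
    # Quantize coordinates so nearly-identical points count as one.
    unique = np.unique(np.round(pts / eps).astype(np.int64), axis=0)
    return len(unique) < 3

print(is_degenerate([59.50518, 24.676094] * 10))      # ten copies of one point -> True
print(is_degenerate([0.0, 0.0, 10.0, 0.0, 10.0, 10.0]))  # proper triangle -> False
```

Running this over a dataset's annotations before training can confirm or rule out the annotation hypothesis.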

gaotongxiao commented 2 years ago

I'm closing this issue as there have been no updates for months.

justcodew commented 2 years ago

I also came across this error in FCENet. It occurs when a polygon whose points are all identical is passed to cal_fourier_signature: https://github.com/open-mmlab/mmocr/blob/b8f7ead74cb0200ad5c422e82724ca6b2eb1c543/mmocr/datasets/pipelines/textdet_targets/fcenet_targets.py#L243 in the generate_fourier_maps function.

            poly = np.array([[59.50518,24.676094],
                            [59.50518,24.676094],
                            [59.50518,24.676094],
                            [59.50518,24.676094],
                            [59.50518,24.676094],
                            [59.50518,24.676094],
                            [59.50518,24.676094],
                            [59.50518,24.676094],
                            [59.50518,24.676094],
                            [59.50518,24.676094]])

            text_instance = [[poly[0][i], poly[0][i + 1]]
                             for i in range(0, len(poly[0]), 2)]
            mask = np.zeros((h, w), dtype=np.uint8)
            polygon = np.array(text_instance).reshape((1, -1, 2))
            cv2.fillPoly(mask, polygon.astype(np.int32), 1)
            fourier_coeff = self.cal_fourier_signature(polygon[0], k)

resampled_polygon = self.resample_polygon(polygon) will be empty if every point of the input polygon is identical, and normalize_polygon then raises 'IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed'.
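The failure mode described above can be reproduced outside mmocr. This sketch assumes only that the resampler places points along the polygon's arc length: with zero perimeter there is nothing to resample, so the result is an empty 1-D array (mocked here as `resampled`), and 2-D indexing on it raises exactly the IndexError from the traceback:

```python
import numpy as np

# Every vertex is the same point, as in the degenerate polygon above.
poly = np.full((10, 2), [59.50518, 24.676094])

# The perimeter is the sum of edge lengths between consecutive vertices.
edge_lengths = np.linalg.norm(np.diff(poly, axis=0), axis=1)
print(edge_lengths.sum())  # 0.0 -> zero perimeter, nothing to resample

# Stand-in for what an arc-length-based resampler returns in this case.
resampled = np.empty((0,))
try:
    np.abs(resampled[:, 0])  # normalize_polygon does this indexing
except IndexError as err:
    print(err)  # "too many indices for array ..."
```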

gaotongxiao commented 2 years ago

@justcodew Thanks for the insights! I wonder under what circumstance could such an invalid polygon be generated. Is it in fact an annotation problem?

justcodew commented 2 years ago

The annotation is correct; I drew the labels on the image to verify. I first found this problem in the PPOCR code, and then observed the same behavior in mmocr. The polygon with identical points comes from the img_aug step. The img_aug functions involve random factors, so the error is not easy to reproduce. I am checking each img_aug function separately to find which one produces such data.

justcodew commented 2 years ago

What's more, the results seem to have a bug in the post-processing as well: I tested the same image under different rotations and the detections are inconsistent.

[screenshots: gt_5194, gt_5194_1, gt_5194_2, gt_5194_3]

gaotongxiao commented 2 years ago

@justcodew We recently did notice some bugs of ImgAug. You may try if the hotfix works for you:

Replace https://github.com/open-mmlab/mmocr/blob/0c8fa52b223c6b0c295af31c08615fcfd574a750/mmocr/datasets/pipelines/dbnet_transforms.py#L128-L145 with

        imgaug_polys = []
        for poly in polys:
            poly = poly.reshape(-1, 2)
            imgaug_polys.append(imgaug.Polygon(poly))
        imgaug_polys = aug.augment_polygons(
            [imgaug.PolygonsOnImage(imgaug_polys, shape=img_shape)])[0]

        new_polys = []
        for i, poly in enumerate(imgaug_polys.polygons):
            if poly.is_out_of_image(imgaug_polys.shape):
                continue
            new_poly = []
            for point in poly.clip_out_of_image(imgaug_polys.shape)[0]:
                new_poly.append(np.array(point, dtype=np.float32))
            new_poly = np.array(new_poly, dtype=np.float32).flatten()
            # Under some conditions, imgaug can generate "polygon" with only
            # two points, which is not a valid polygon.
            if len(new_poly) <= 4:
                continue
            new_polys.append(new_poly)

        return new_polys
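The `len(new_poly) <= 4` check in the hotfix drops flattened polygons with at most two points (four coordinates). A standalone sketch of that same validity rule, with a hypothetical helper name:

```python
import numpy as np

def is_valid_flat_polygon(flat_poly):
    """A flattened polygon needs more than 4 coordinates (i.e. at
    least 3 points) to be a valid polygon; imgaug clipping can emit
    2-point "polygons" that must be dropped."""
    return len(np.asarray(flat_poly, dtype=np.float32).flatten()) > 4

print(is_valid_flat_polygon([0.0, 0.0, 10.0, 0.0]))            # 2 points -> False
print(is_valid_flat_polygon([0.0, 0.0, 10.0, 0.0, 10.0, 10.0]))  # 3 points -> True
```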

gaotongxiao commented 2 years ago

@justcodew And could you share more details about your experiment steps so that we can reproduce the bug?

justcodew commented 2 years ago

@gaotongxiao I encountered this img_aug error in PPOCR while training on the ArT dataset and traced the bug to a polygon whose points are all identical. I then fed the same degenerate polygon to the generate_fourier_maps function in mmocr to check whether your code is more robust (PPOCR's FCENet references the mmocr code), but it failed with the same error. It should be reproducible when training on ArT data. See issue #1104.

yCobanoglu commented 1 year ago

Adding dict(type="FixInvalidPolygon", min_poly_points=4) seems to fix it. Here is an example pipeline:

train_pipeline_enhanced = [
    dict(
        type="LoadImageFromFile",
        file_client_args=dict(backend="disk"),
        color_type="color_ignore_orientation",
    ),
    dict(type="LoadOCRAnnotations", with_polygon=True, with_bbox=True, with_label=True),
    dict(type="FixInvalidPolygon", min_poly_points=4),
    dict(
        type="RandomResize", scale=(800, 800), ratio_range=(0.75, 2.5), keep_ratio=True
    ),
    dict(type="TextDetRandomCropFlip", crop_ratio=0.5, iter_num=1, min_area_ratio=0.2),
    dict(
        type="RandomApply",
        transforms=[dict(type="RandomCrop", min_side_ratio=0.3)],
        prob=0.8,
    ),
    dict(
        type="RandomApply",
        transforms=[
            dict(
                type="RandomRotate",
                max_angle=35,
                pad_with_fixed_color=True,
                use_canvas=True,
            )
        ],
        prob=0.6,
    ),
    dict(
        type="RandomChoice",
        transforms=[
            [
                {"type": "Resize", "scale": 800, "keep_ratio": True},
                {"type": "Pad", "size": (800, 800)},
            ],
            {"type": "Resize", "scale": 800, "keep_ratio": False},
        ],
        prob=[0.6, 0.4],
    ),
    dict(type="RandomFlip", prob=0.5, direction="horizontal"),
    dict(type="RandomFlip", prob=0.5, direction="vertical"),
    dict(
        type="RandomApply",
        transforms=[
            dict(type="TorchVisionWrapper", op="ElasticTransform", alpha=75.0),
        ],
        prob=1/3,
    ),
    dict(
        type="RandomApply",
        transforms=[
            dict(
                type="RandomChoice",
                transforms=[
                    dict(
                        type="TorchVisionWrapper",
                        op="RandomAdjustSharpness",
                        sharpness_factor=0,
                    ),
                    dict(
                        type="TorchVisionWrapper",
                        op="RandomAdjustSharpness",
                        sharpness_factor=60,
                    ),
                    dict(
                        type="TorchVisionWrapper",
                        op="RandomAdjustSharpness",
                        sharpness_factor=90,
                    ),
                ],
                prob=[1/3] * 3,
            ),
        ],
        prob=0.75,
    ),
    dict(
        type="TorchVisionWrapper",
        op="ColorJitter",
        brightness=0.15,
        saturation=0.5,
        contrast=0.3,
    ),
    dict(
        type="RandomApply",
        transforms=[
            dict(
                type="RandomChoice",
                transforms=[
                    dict(type="TorchVisionWrapper", op="RandomEqualize"),
                    dict(type="TorchVisionWrapper", op="RandomAutocontrast"),
                ],
                prob=[1 / 2, 1 / 2],
            ),
        ],
        prob=0.8,
    ),
    dict(type="FixInvalidPolygon", min_poly_points=4),
    dict(
        type="PackTextDetInputs",
        meta_keys=("img_path", "ori_shape", "img_shape", "scale_factor"),
    ),
]

TongkunGuan commented 11 months ago

> Adding dict(type="FixInvalidPolygon", min_poly_points=4) seems to fix it. [example pipeline quoted above omitted]

I tried the pipeline, but the issue still exists.
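If FixInvalidPolygon alone does not catch the degenerate case, a stricter defensive filter could drop any polygon whose vertices collapse to fewer than three distinct points before target generation. This is a hypothetical helper (`filter_degenerate_polygons` is not part of mmocr), shown only as a sketch of the idea:

```python
import numpy as np

def filter_degenerate_polygons(polygons, min_unique_points=3):
    """Keep only polygons with at least `min_unique_points` distinct
    vertices, so target generation never sees a zero-perimeter
    polygon like the one in this thread."""
    kept = []
    for poly in polygons:
        pts = np.asarray(poly, dtype=np.float32).reshape(-1, 2)
        if len(np.unique(pts, axis=0)) >= min_unique_points:
            kept.append(poly)
    return kept

polys = [
    [59.50518, 24.676094] * 10,           # degenerate: one repeated point
    [0.0, 0.0, 100.0, 0.0, 100.0, 50.0],  # valid triangle
]
print(len(filter_degenerate_polygons(polys)))  # 1 -- only the triangle survives
```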