open-mmlab / mmocr

OpenMMLab Text Detection, Recognition and Understanding Toolbox
https://mmocr.readthedocs.io/en/dev-1.x/
Apache License 2.0

Reproduce Total-Text experiment with FCENet #768

Closed Xiangrui-Li closed 2 years ago

Xiangrui-Li commented 2 years ago

Abstract: Following the MMOCR documentation, I reproduced the CTW1500 and ICDAR 2015 experiments, then created a Total-Text config based on the CTW1500 one. I downloaded the Total-Text images and annotations (.txt) from https://github.com/cs-chan/Total-Text-Dataset/tree/master/Dataset and used tools/totaltext_converter.py to convert them to the ICDAR dataset format. I did not change anything in fcenet_targets.py or textsnake_targets.py. But when I run train.py, something goes wrong, whether or not I use the converted annotations.

Env: ubuntu:20.04 python:3.7.11 pytorch:1.8.0 cuda:11.5 mmcv-full:1.3.16 mmdet:2.18.0

Error:

```
2022-02-05 18:53:38,576 - mmocr - INFO - workflow: [('train', 1)], max: 1500 epochs
2022-02-05 18:53:38,576 - mmocr - INFO - Checkpoints will be saved to /home/bill/Project/mmocr/fce_2626*2020 by HardDiskBackend.
2022-02-05 18:53:44,399 - mmocr - INFO - Epoch [1][5/315]   lr: 1.000e-03, eta: 6 days, 6:53:38, time: 1.150, data_time: 0.543, memory: 3931, loss_text: 2.5313, loss_center: 2.2066,  loss_reg_x: 7.5278, loss_reg_y: 4.5634, loss: 34.7722
2022-02-05 18:53:46,342 - mmocr - INFO - Epoch [1][10/315]  lr: 1.000e-03, eta: 4 days, 4:57:35, time: 0.389, data_time: 0.029, memory: 3931, loss_text: 1.7856, loss_center: 1.8217, loss_reg_x: 5.2337, loss_reg_y: 3.8988, loss: 20.8382
2022-02-05 18:53:48,251 - mmocr - INFO - Epoch [1][15/315]  lr: 1.000e-03, eta: 3 days, 12:00:22, time: 0.382, data_time: 0.025, memory: 3931, loss_text: 1.9730, loss_center: 2.1837,loss_reg_x: 7.9185, loss_reg_y: 3.2578, loss: 21.1362
Traceback (most recent call last):
  File "/home/bill/Project/mmocr/tools/train.py", line 221, in <module>
    main()
  File "/home/bill/Project/mmocr/tools/train.py", line 217, in main
    meta=meta)
  File "/home/bill/Project/mmocr/mmocr/apis/train.py", line 163, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/datasets/custom.py", line 195, in __getitem__
    data = self.prepare_train_img(idx)
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/datasets/custom.py", line 218, in prepare_train_img
    return self.pipeline(results)
  File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/datasets/pipelines/compose.py", line 41, in __call__
    data = t(data)
  File "/home/bill/Project/mmocr/mmocr/datasets/pipelines/textdet_targets/base_textdet_targets.py", line 167, in __call__
    results = self.generate_targets(results)
  File "/home/bill/Project/mmocr/mmocr/datasets/pipelines/textdet_targets/fcenet_targets.py", line 351, in generate_targets
    polygon_masks_ignore)
  File "/home/bill/Project/mmocr/mmocr/datasets/pipelines/textdet_targets/fcenet_targets.py", line 316, in generate_level_targets
    level_img_size, lv_text_polys[ind])[None]
  File "/home/bill/Project/mmocr/mmocr/datasets/pipelines/textdet_targets/fcenet_targets.py", line 69, in generate_center_region_mask
    _, _, top_line, bot_line = self.reorder_poly_edge(polygon_points)
  File "/home/bill/Project/mmocr/mmocr/datasets/pipelines/textdet_targets/textsnake_targets.py", line 179, in reorder_poly_edge
    assert points.shape[0] >= 4
AssertionError
```

Config:

```python
dataset_type = 'IcdarDataset'
data_root = 'tests/data/total-text-txt'

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

train_pipeline = [
    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
    dict(
        type='LoadTextAnnotations',
        with_bbox=True,
        with_mask=True,
        poly2mask=False),
    dict(
        type='ColorJitter',
        brightness=32.0 / 255,
        saturation=0.5,
        contrast=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='RandomScaling', size=800, scale=(3. / 4, 5. / 2)),
    dict(
        type='RandomCropFlip', crop_ratio=0.5, iter_num=1, min_area_ratio=0.2),
    dict(
        type='RandomCropPolyInstances',
        instance_key='gt_masks',
        crop_ratio=0.8,
        min_side_ratio=0.3),
    dict(
        type='RandomRotatePolyInstances',
        rotate_ratio=0.5,
        max_angle=30,
        pad_with_fixed_color=False),
    dict(type='SquareResizePad', target_size=800, pad_ratio=0.6),
    dict(type='RandomFlip', flip_ratio=0.5, direction='horizontal'),
    dict(type='Pad', size_divisor=32),
    dict(
        type='FCENetTargets',
        fourier_degree=fourier_degree,
        level_proportion_range=((0, 0.25), (0.2, 0.65), (0.55, 1.0))),
        # level_proportion_range=((0, 0.4), (0.3, 0.7), (0.6, 1.0))),
    dict(
        type='CustomFormatBundle',
        keys=['p3_maps', 'p4_maps', 'p5_maps'],
        visualize=dict(flag=False, boundary_key=None)),
    dict(type='Collect', keys=['img', 'p3_maps', 'p4_maps', 'p5_maps'])
]
test_pipeline = [
    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1080, 736),
        flip=False,
        transforms=[
            dict(type='Resize', img_scale=(2626, 2020), keep_ratio=True),  # also tried 1280x800 and 1920x1080; 2022x2022 and 2560x1600 were worse
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=4,
    workers_per_gpu=2,
    val_dataloader=dict(samples_per_gpu=1),
    test_dataloader=dict(samples_per_gpu=1),
    train=dict(
        type=dataset_type,
        ann_file=data_root + '/instances_training.json',
        img_prefix=data_root + '/imgs',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + '/instances_test.json',
        img_prefix=data_root + '/imgs',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + '/instances_test.json',
        img_prefix=data_root + '/imgs',
        pipeline=test_pipeline))
evaluation = dict(interval=1, metric='hmean-iou', save_best='auto')

# optimizer
optimizer = dict(type='SGD', lr=1e-3, momentum=0.90, weight_decay=5e-4)
optimizer_config = dict(grad_clip=None)
lr_config = dict(policy='poly', power=0.9, min_lr=1e-7, by_epoch=True)
total_epochs = 1500

checkpoint_config = dict(interval=150)
# yapf:disable
log_config = dict(
    interval=5,
    hooks=[
        dict(type='TextLoggerHook')

    ])
# yapf:enable
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
```

Annotations (example: poly_gt_img12.txt), original format:

```text
x: [[112 149 200 210 166 134]], y: [[411 358 336 358 381 422]], ornt: [u'c'], transcriptions: [u'WOODFORD']
x: [[212 262 316 307 257 217]], y: [[333 325 337 359 350 355]], ornt: [u'c'], transcriptions: [u'RESERVE']
x: [[326 385 401 377 356 315]], y: [[346 391 440 442 396 365]], ornt: [u'c'], transcriptions: [u'DISTILLERY']
x: [[199 222 245 246 230 208]], y: [[374 364 362 385 384 392]], ornt: [u'c'], transcriptions: [u'DSP']
x: [[257 286 283 253]], y: [[363 366 388 383]], ornt: [u'm'], transcriptions: [u'KY']
x: [[297 324 316 290]], y: [[370 384 401 391]], ornt: [u'm'], transcriptions: [u'52']
x: [[168 251 248 167]], y: [[473 478 497 490]], ornt: [u'm'], transcriptions: [u'BOURBON']
x: [[258 333 334 259]], y: [[479 483 503 495]], ornt: [u'm'], transcriptions: [u'WHISKEY']
```

Converted annotations:

```text
112,411,149,358,200,336,210,358,166,381,134,422,WOODFORD
212,333,262,325,316,337,307,359,257,350,217,355,RESERVE
326,346,385,391,401,440,377,442,356,396,315,365,DISTILLERY
199,374,222,364,245,362,246,385,230,384,208,392,DSP
257,363,286,366,283,388,253,383,KY
297,370,324,384,316,401,290,391,52
168,473,251,478,248,497,167,490,BOURBON
258,479,333,483,334,503,259,495,WHISKEY
```

Additional notes: I successfully reproduced the CTW1500 and ICDAR 2015 experiments. I remember reading that Total-Text is similar to CTW1500, so I copied its config as a base and altered parts of the network to improve precision, recall and hmean, but then found it cannot run. I searched the MMOCR GitHub issues for Total-Text but unfortunately found nothing, so I hope somebody can help me fix it. Thank you sincerely for your help. @gaotongxiao @cuhk-hbsun

gaotongxiao commented 2 years ago

As reported in #759, Total-Text is known to have some 3-point annotations that cannot pass the 4-point assertion. You can relax it to `assert points.shape[0] >= 3` and check if everything still works.
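
If you want to see how many such instances your converted JSON contains before patching anything, a quick scan works. This is a minimal sketch, assuming the converter's COCO-style output and the `data_root` from your config:

```python
import json

# path taken from the config posted above
with open('tests/data/total-text-txt/instances_training.json') as f:
    coco = json.load(f)

# each segmentation is a flat [x0, y0, x1, y1, ...] list, so fewer than
# 8 values means fewer than 4 polygon points
short = sum(
    1 for ann in coco['annotations'] if len(ann['segmentation'][0]) < 8)
print(f'{short} of {len(coco["annotations"])} instances have < 4 points')
```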

Xiangrui-Li commented 2 years ago

> As reported in #759, Total-Text is known to have some 3-point annotations that cannot pass the 4-point assertion. You can relax it to `assert points.shape[0] >= 3` and check if everything still works.

Dear gaotongxiao: Thank you sincerely for your help. I tried relaxing it to `assert points.shape[0] >= 3` in textsnake_targets.py, but that doesn't fix the problem: the traceback moves to mmocr/datasets/pipelines/textdet_targets/textsnake_targets.py L133-141 with `IndexError: index 3 is out of bounds for axis 0 with size 3`. I think the primary cause is in the functions that depend on the point relationships of the annotations; I searched the Total-Text GitHub repo and found the formats don't match, and I have no idea how to solve this.

gaotongxiao commented 2 years ago

Apparently, TextSnakeTargets is only built for polygons annotated with at least 4 points, and therefore if you want to run it on triangles, you'll need to modify the algorithm for such a case.

Xiangrui-Li commented 2 years ago

> Apparently, TextSnakeTargets is only built for polygons annotated with at least 4 points, and therefore if you want to run it on triangles, you'll need to modify the algorithm for such a case.

Yeah, so I want to modify it, but I have no idea how to change this logic. Do you have any sample or direction for it?

gaotongxiao commented 2 years ago

TextSnakeTargets converts gold annotations to intermediate representations for TextSnake, and the entry function is `generate_targets`. `find_head_tails` is the util that identifies the head edge and tail edge of a polygon. Each util has a docstring, so read it before taking action and make sure your input/output matches the description.
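
For orientation, here is a toy illustration of the contract these utils work with (NOT MMOCR's implementation, just a sketch under the common Total-Text point layout), using the 6-point WOODFORD polygon from the annotation example above:

```python
import numpy as np

def reorder_poly_edge_toy(points):
    """Toy sketch of the expected outputs, assuming the first half of
    the points traces the top sideline left-to-right and the second
    half traces the bottom sideline right-to-left. MMOCR's real code
    detects this ordering instead of assuming it."""
    assert points.shape[0] >= 4
    n = points.shape[0] // 2
    top_line = points[:n]
    bot_line = points[n:][::-1]      # flip to run left-to-right
    head_edge = points[[-1, 0]]      # short edge at the left end
    tail_edge = points[[n - 1, n]]   # short edge at the right end
    return head_edge, tail_edge, top_line, bot_line

# the 6-point 'WOODFORD' polygon from the annotation example above
poly = np.array([[112, 411], [149, 358], [200, 336],
                 [210, 358], [166, 381], [134, 422]])
head, tail, top, bot = reorder_poly_edge_toy(poly)
print(top)  # [[112 411] [149 358] [200 336]] -- the upper sideline
```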

HolyCrap96 commented 2 years ago

> As reported in #759, Total-Text is known to have some 3-point annotations that cannot pass the 4-point assertion. You can relax it to `assert points.shape[0] >= 3` and check if everything still works.

> Dear gaotongxiao: Thank you sincerely for your help. I tried relaxing it to `assert points.shape[0] >= 3` in textsnake_targets.py, but that doesn't fix the problem: the traceback moves to mmocr/datasets/pipelines/textdet_targets/textsnake_targets.py L133-141 with `IndexError: index 3 is out of bounds for axis 0 with size 3`. I think the primary cause is in the functions that depend on the point relationships of the annotations; I searched the Total-Text GitHub repo and found the formats don't match, and I have no idea how to solve this.

You can set `iscrowd = 1` for text instances consisting of 3 points in totaltext_converter. If so, those instances will be ignored during training.
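
Something like this inside the converter's annotation loop (a sketch; `contour` here is the flattened `[x0, y0, x1, y1, ...]` array used in the converter posted later in this thread):

```python
import numpy as np

contour = np.array([112, 411, 149, 358, 200, 336])  # a 3-point instance
coordinates = np.array(contour).reshape(-1, 2)
# mark 3-point (or degenerate) polygons as crowd regions so the
# dataloader skips them during training, and keep everything else
iscrowd = 1 if coordinates.shape[0] <= 3 else 0
```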

Xiangrui-Li commented 2 years ago

> As reported in #759, Total-Text is known to have some 3-point annotations that cannot pass the 4-point assertion. You can relax it to `assert points.shape[0] >= 3` and check if everything still works.

> Dear gaotongxiao: Thank you sincerely for your help. I tried relaxing it to `assert points.shape[0] >= 3` in textsnake_targets.py, but that doesn't fix the problem: the traceback moves to mmocr/datasets/pipelines/textdet_targets/textsnake_targets.py L133-141 with `IndexError: index 3 is out of bounds for axis 0 with size 3`. I think the primary cause is in the functions that depend on the point relationships of the annotations; I searched the Total-Text GitHub repo and found the formats don't match, and I have no idea how to solve this.

> You can set `iscrowd = 1` for text instances consisting of 3 points in totaltext_converter. If so, those instances will be ignored during training.

Thank you for your help. A job is running now; I'll try it when it finishes.

Xiangrui-Li commented 2 years ago

> TextSnakeTargets converts gold annotations to intermediate representations for TextSnake, and the entry function is `generate_targets`. `find_head_tails` is the util that identifies the head edge and tail edge of a polygon. Each util has a docstring, so read it before taking action and make sure your input/output matches the description.

I'm so sorry, I forgot to reply to your message. Thank you for your help, I'll try it later.

Xiangrui-Li commented 2 years ago

> As reported in #759, Total-Text is known to have some 3-point annotations that cannot pass the 4-point assertion. You can relax it to `assert points.shape[0] >= 3` and check if everything still works.

> Dear gaotongxiao: Thank you sincerely for your help. I tried relaxing it to `assert points.shape[0] >= 3` in textsnake_targets.py, but that doesn't fix the problem: the traceback moves to mmocr/datasets/pipelines/textdet_targets/textsnake_targets.py L133-141 with `IndexError: index 3 is out of bounds for axis 0 with size 3`. I think the primary cause is in the functions that depend on the point relationships of the annotations; I searched the Total-Text GitHub repo and found the formats don't match, and I have no idea how to solve this.

> You can set `iscrowd = 1` for text instances consisting of 3 points in totaltext_converter. If so, those instances will be ignored during training.

Dear HolyCrap96, I tried your method and edited iscrowd=1 in totaltext_converter.py. It runs now, but I find there are still some problems: 1) I lost loss_center, loss_reg_x and loss_reg_y. 2) The program crashes with a traceback at the evaluation stage. I think the error comes from the dataset type in the configs or from mmocr/core/evaluation/hmean_iou.py; I set dataset_type='IcdarDataset' and the other parameters like the CTW1500 configs. Do you have a Total-Text config? Could you share it here?

gaotongxiao commented 2 years ago

Could you post the modified converter here? Losing these loss terms is unusual.

Xiangrui-Li commented 2 years ago

> Could you post the modified converter here? Losing these loss terms is unusual.


```python
# Copyright (c) OpenMMLab. All rights reserved.
import argparse
import glob
import os
import os.path as osp
import re

import cv2
import mmcv
import numpy as np
import scipy.io as scio
import yaml
from shapely.geometry import Polygon

from mmocr.utils import convert_annotations


def collect_files(img_dir, gt_dir, split):
    """Collect all images and their corresponding groundtruth files.

    Args:
        img_dir(str): The image directory
        gt_dir(str): The groundtruth directory
        split(str): The split of dataset. Namely: training or test
    Returns:
        files(list): The list of tuples (img_file, groundtruth_file)
    """
    assert isinstance(img_dir, str)
    assert img_dir
    assert isinstance(gt_dir, str)
    assert gt_dir

    # note that we handle png and jpg only. Pls convert others such as gif to
    # jpg or png offline
    suffixes = ['.png', '.PNG', '.jpg', '.JPG', '.jpeg', '.JPEG']
    # suffixes = ['.png']

    imgs_list = []
    for suffix in suffixes:
        imgs_list.extend(glob.glob(osp.join(img_dir, '*' + suffix)))

    imgs_list = sorted(imgs_list)
    ann_list = sorted(
        [osp.join(gt_dir, gt_file) for gt_file in os.listdir(gt_dir)])

    files = list(zip(imgs_list, ann_list))
    assert len(files), f'No images found in {img_dir}'
    print(f'Loaded {len(files)} images from {img_dir}')

    return files


def collect_annotations(files, nproc=1):
    """Collect the annotation information.

    Args:
        files(list): The list of tuples (image_file, groundtruth_file)
        nproc(int): The number of process to collect annotations
    Returns:
        images(list): The list of image information dicts
    """
    assert isinstance(files, list)
    assert isinstance(nproc, int)

    if nproc > 1:
        images = mmcv.track_parallel_progress(
            load_img_info, files, nproc=nproc)
    else:
        images = mmcv.track_progress(load_img_info, files)

    return images


def get_contours_mat(gt_path):
    """Get the contours and words for each ground_truth mat file.

    Args:
        gt_path(str): The relative path of the ground_truth mat file
    Returns:
        contours(list[lists]): A list of lists of contours
        for the text instances
        words(list[list]): A list of lists of words (string)
        for the text instances
    """
    assert isinstance(gt_path, str)

    contours = []
    words = []
    data = scio.loadmat(gt_path)
    data_polygt = data['polygt']

    for i, lines in enumerate(data_polygt):
        X = np.array(lines[1])
        Y = np.array(lines[3])

        point_num = len(X[0])
        word = lines[4]
        if len(word) == 0:
            word = '???'
        else:
            word = word[0]

        if word == '#':
            word = '###'
            continue

        words.append(word)

        arr = np.concatenate([X, Y]).T
        contour = []
        for i in range(point_num):
            contour.append(arr[i][0])
            contour.append(arr[i][1])
        contours.append(np.asarray(contour))

    return contours, words


def load_mat_info(img_info, gt_file):
    """Load the information of one ground truth in .mat format.

    Args:
        img_info(dict): The dict of only the image information
        gt_file(str): The relative path of the ground_truth mat
        file for one image
    Returns:
        img_info(dict): The dict of the img and annotation information
    """
    assert isinstance(img_info, dict)
    assert isinstance(gt_file, str)

    contours, words = get_contours_mat(gt_file)
    anno_info = []
    for contour in contours:
        if contour.shape[0] == 2:
            continue
        category_id = 1
        coordinates = np.array(contour).reshape(-1, 2)
        polygon = Polygon(coordinates)
        iscrowd = 1  # 0

        area = polygon.area
        # convert to COCO style XYWH format
        min_x, min_y, max_x, max_y = polygon.bounds
        bbox = [min_x, min_y, max_x - min_x, max_y - min_y]

        anno = dict(
            iscrowd=iscrowd,
            category_id=category_id,
            bbox=bbox,
            area=area,
            segmentation=[contour])
        anno_info.append(anno)

    img_info.update(anno_info=anno_info)

    return img_info


def process_line(line, contours, words):
    """Get the contours and words by processing each line in the gt file.

    Args:
        line(str): The line in gt file containing annotation info
        contours(list[lists]): A list of lists of contours
        for the text instances
        words(list[list]): A list of lists of words (string)
        for the text instances
    Returns:
        contours(list[lists]): A list of lists of contours
        for the text instances
        words(list[list]): A list of lists of words (string)
        for the text instances
    """
    line = '{' + line.replace('[[', '[').replace(']]', ']') + '}'
    ann_dict = re.sub('([0-9]) +([0-9])', r'\1,\2', line)
    ann_dict = re.sub('([0-9]) +([ 0-9])', r'\1,\2', ann_dict)
    ann_dict = re.sub('([0-9]) -([0-9])', r'\1,-\2', ann_dict)
    ann_dict = ann_dict.replace("[u',']", "[u'#']")
    ann_dict = yaml.load(ann_dict, Loader=yaml.CLoader)

    X = np.array([ann_dict['x']])
    Y = np.array([ann_dict['y']])

    if len(ann_dict['transcriptions']) == 0:
        word = '???'
    else:
        word = ann_dict['transcriptions'][0]
        if len(ann_dict['transcriptions']) > 1:
            for ann_word in ann_dict['transcriptions'][1:]:
                word += ',' + ann_word
        word = str(eval(word))
    words.append(word)

    point_num = len(X[0])

    arr = np.concatenate([X, Y]).T
    contour = []
    for i in range(point_num):
        contour.append(arr[i][0])
        contour.append(arr[i][1])
    contours.append(np.asarray(contour))

    return contours, words


def get_contours_txt(gt_path):
    """Get the contours and words for each ground_truth txt file.

    Args:
        gt_path(str): The relative path of the ground_truth mat file
    Returns:
        contours(list[lists]): A list of lists of contours
        for the text instances
        words(list[list]): A list of lists of words (string)
        for the text instances
    """
    assert isinstance(gt_path, str)

    contours = []
    words = []

    with open(gt_path, 'r') as f:
        tmp_line = ''
        for idx, line in enumerate(f):
            line = line.strip()
            if idx == 0:
                tmp_line = line
                continue
            if not line.startswith('x:'):
                tmp_line += ' ' + line
                continue
            else:
                complete_line = tmp_line
                tmp_line = line
            contours, words = process_line(complete_line, contours, words)

        if tmp_line != '':
            contours, words = process_line(tmp_line, contours, words)

        for word in words:
            if word == '#':
                word = '###'
                continue

    return contours, words


def load_txt_info(gt_file, img_info):
    """Load the information of one ground truth in .txt format.

    Args:
        img_info(dict): The dict of only the image information
        gt_file(str): The relative path of the ground_truth mat
        file for one image
    Returns:
        img_info(dict): The dict of the img and annotation information
    """
    contours, words = get_contours_txt(gt_file)
    anno_info = []
    for contour in contours:
        if contour.shape[0] == 2:
            continue
        category_id = 1
        coordinates = np.array(contour).reshape(-1, 2)
        polygon = Polygon(coordinates)
        iscrowd = 1  # 0

        area = polygon.area
        # convert to COCO style XYWH format
        min_x, min_y, max_x, max_y = polygon.bounds
        bbox = [min_x, min_y, max_x - min_x, max_y - min_y]

        anno = dict(
            iscrowd=iscrowd,
            category_id=category_id,
            bbox=bbox,
            area=area,
            segmentation=[contour])
        anno_info.append(anno)

    img_info.update(anno_info=anno_info)

    return img_info


def load_png_info(gt_file, img_info):
    """Load the information of one ground truth in .png format.

    Args:
        gt_file(str): The relative path of the ground_truth file for one image
        img_info(dict): The dict of only the image information
    Returns:
        img_info(dict): The dict of the img and annotation information
    """
    assert isinstance(gt_file, str)
    assert isinstance(img_info, dict)
    gt_img = cv2.imread(gt_file, 0)
    contours, _ = cv2.findContours(gt_img, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)

    anno_info = []
    for contour in contours:
        if contour.shape[0] == 2:
            continue
        category_id = 1
        xy = np.array(contour).flatten().tolist()

        coordinates = np.array(contour).reshape(-1, 2)
        polygon = Polygon(coordinates)
        iscrowd = 1  # 0

        area = polygon.area
        # convert to COCO style XYWH format
        min_x, min_y, max_x, max_y = polygon.bounds
        bbox = [min_x, min_y, max_x - min_x, max_y - min_y]

        anno = dict(
            iscrowd=iscrowd,
            category_id=category_id,
            bbox=bbox,
            area=area,
            segmentation=[xy])
        anno_info.append(anno)

    img_info.update(anno_info=anno_info)

    return img_info


def load_img_info(files):
    """Load the information of one image.

    Args:
        files(tuple): The tuple of (img_file, groundtruth_file)
    Returns:
        img_info(dict): The dict of the img and annotation information
    """
    assert isinstance(files, tuple)

    img_file, gt_file = files
    # read imgs with ignoring orientations
    img = mmcv.imread(img_file, 'unchanged')
    # read imgs with orientations as dataloader does when training and testing
    img_color = mmcv.imread(img_file, 'color')
    # make sure imgs have no orientation info, or annotation gt is wrong.
    # assert img.shape[0:2] == img_color.shape[0:2]

    split_name = osp.basename(osp.dirname(img_file))
    img_info = dict(
        # remove img_prefix for filename
        file_name=osp.join(split_name, osp.basename(img_file)),
        height=img.shape[0],
        width=img.shape[1],
        # anno_info=anno_info,
        segm_file=osp.join(split_name, osp.basename(gt_file)))

    if osp.splitext(gt_file)[1] == '.mat':
        img_info = load_mat_info(img_info, gt_file)
    elif osp.splitext(gt_file)[1] == '.txt':
        img_info = load_txt_info(gt_file, img_info)
    else:
        raise NotImplementedError

    return img_info


def parse_args():
    parser = argparse.ArgumentParser(
        description='Convert totaltext annotations to COCO format')
    parser.add_argument('root_path', help='totaltext root path')
    parser.add_argument('-o', '--out-dir', help='output path')
    parser.add_argument(
        '--split-list',
        nargs='+',
        help='a list of splits. e.g., "--split_list train test"')
    parser.add_argument(
        '--nproc', default=1, type=int, help='number of process')
    args = parser.parse_args()
    return args


def main():
    args = parse_args()
    root_path = args.root_path
    out_dir = args.out_dir if args.out_dir else root_path
    mmcv.mkdir_or_exist(out_dir)

    img_dir = osp.join(root_path, 'imgs/')
    gt_dir = osp.join(root_path, 'annotations/')

    set_name = {}
    for split in args.split_list:
        set_name.update({split: 'instances_' + split + '.json'})
        print(osp.join(img_dir, split))
        print(osp.exists(osp.join(img_dir, split)))
        assert osp.exists(osp.join(img_dir, split))

    for split, json_name in set_name.items():
        print(f'Converting {split} into {json_name}')
        with mmcv.Timer(
                print_tmpl='It takes {}s to convert totaltext annotation'):
            files = collect_files(
                osp.join(img_dir, split), osp.join(gt_dir, split), split)
            image_infos = collect_annotations(files, nproc=args.nproc)
            convert_annotations(image_infos, osp.join(out_dir, json_name))


if __name__ == '__main__':
    main()
```



I followed [HolyCrap96](https://github.com/HolyCrap96)'s method and altered `iscrowd=1`, then ran this file to get two JSON files, namely training and test. Training on Total-Text still went wrong: I copied the configs from CTW1500 and altered the paths for Total-Text, but some losses are lost. Details are here: https://github.com/open-mmlab/mmocr/issues/768#issuecomment-1059884739

gaotongxiao commented 2 years ago

You've set `iscrowd=1` for ALL annotations, and hence MMOCR skipped all of them during training. You should only set `iscrowd=1` for instances consisting of 3 points and `iscrowd=0` for all other instances.

Xiangrui-Li commented 2 years ago

> You've set `iscrowd=1` for ALL annotations, and hence MMOCR skipped all of them during training. You should only set `iscrowd=1` for instances consisting of 3 points and `iscrowd=0` for all other instances.

Oh, I misunderstood what that meant. I'll alter it when a GPU is free. Thanks a lot.

Xiangrui-Li commented 2 years ago

> You've set `iscrowd=1` for ALL annotations, and hence MMOCR skipped all of them during training. You should only set `iscrowd=1` for instances consisting of 3 points and `iscrowd=0` for all other instances.

Hey bro, I fixed this problem in my project. I divided it into three steps. First, alter the code in totaltext_converter.py: in the functions load_txt_info, load_mat_info and load_png_info you will find `iscrowd = 0`; you can follow HolyCrap96's answer earlier in this issue, but that alone doesn't deal with the problem, so I wrote this:

```python
iscrowd = 0
# a flattened contour stores two values (x, y) per point, so fewer
# than 8 values, or an odd count, means fewer than 4 valid points
if contour.shape[0] < 8 or contour.shape[0] % 2 != 0:
    iscrowd = 1
```

Second step, convert the Total-Text dataset and add it to the configs. Final step: if you evaluate with FCENet, you will find something wrong, with a traceback like an out-of-range list index (I forget the exact error), so I altered textsnake_targets.py. The error comes from `current_line_len >= length_cumsum[current_edge_ind + 1]` in the resample_line function indexing past the end of the list, so I added a check that keeps the index in range, along these lines:

```python
# guard the index: length_cumsum has one entry per point, so the
# last valid edge index is len(length_cumsum) - 2
while (current_edge_ind < len(length_cumsum) - 2
       and current_line_len >= length_cumsum[current_edge_ind + 1]):
    current_edge_ind += 1
```

in place of the original `while current_line_len >= length_cumsum[current_edge_ind + 1]:`, and now, congratulations, this dataset can be trained and evaluated.
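
A standalone toy (not MMOCR's code, just the guarded loop in isolation) shows why the check matters when the accumulated length overshoots the total, e.g. by a rounding error:

```python
import numpy as np

# toy cumulative edge lengths for a 4-point polyline (3 edges, total 6.0)
length_cumsum = np.cumsum([0.0, 2.0, 2.0, 2.0])  # -> [0., 2., 4., 6.]
current_edge_ind = 0
current_line_len = 6.0000001  # overshoots the total length

# the unguarded loop would reach length_cumsum[4] and raise IndexError;
# the guarded version stops at the last valid edge (index 2) instead
while (current_edge_ind < len(length_cumsum) - 2
       and current_line_len >= length_cumsum[current_edge_ind + 1]):
    current_edge_ind += 1

print(current_edge_ind)  # 2
```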

gaotongxiao commented 2 years ago

Thanks for sharing! BTW, I did a quick check on textsnake_targets.py but couldn't find the relevant code snippet you mentioned. Is it part of your local change?

Xiangrui-Li commented 2 years ago

> Thanks for sharing! BTW, I did a quick check on textsnake_targets.py but couldn't find the relevant code snippet you mentioned. Is it part of your local change?

I'm so sorry about that. I searched my project and found from the training log files that my mmocr version is 0.3.0+0a521ba. I remember downloading it after September last year, so it may have been updated since then. I'm training now, so I have no plan to publish my code or test the new version in the short term, but you can debug following this idea if you want to test it; I'm not sure my version matches the current project. If you need, I can share my textsnake_targets.py and totaltext_converter.py on my GitHub.

gaotongxiao commented 2 years ago

No problem, glad to hear that you finally made it. Let us know if you need any help later on.

Xiangrui-Li commented 2 years ago

> No problem, glad to hear that you finally made it. Let us know if you need any help later on.

Thank you, sir. I believe you can handle this problem in the next version. Good luck.