tyxsspa / AnyText

Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>
Apache License 2.0

Too many wrong positions are detected #112

Open wendashi opened 2 weeks ago

wendashi commented 2 weeks ago

When I try to use 'text-editing' mode, as follows:

    # 2. text editing
    mode = 'text-editing'
    input_data = {
        "prompt": 'The text is centered on a purple background and is slightly curved upwards, giving it a dynamic and eye-catching appearance. The color purple is often associated with royalty, luxury, and power, which aligns well with the theme of the "ong" logo.',
        "seed": 8943410,
        "draw_pos": 'example_images/test/edit.png',  # 'example_images/edit1.png'
        "ori_image": 'example_images/test/ref.png'   # 'example_images/ref1.jpg'
    }
    results, rtn_code, rtn_warning, debug_info = pipe(input_data, mode=mode, **params)
    ...

edit.png: [image]

ref.png: [image]

The result turns out really weird, because of the warning: 'Warning: found 137 positions that > needed 1 from prompt.'

So I think maybe some settings in self.separate_pos_imgs (in /path/.cache/modelscope/modelscope_modules/cv_anytext_text_generation_editing/ms_wrapper.py) are not very suitable?


Too many wrong positions are detected; do you have any suggestions to solve this problem? Thanks a lot šŸ™

tyxsspa commented 2 weeks ago

Pixels with a gray value of 0 are used as the mask, so you need to input a draw_pos image where the background pixels have gray values between 1 and 255. In the demo we incorporated this processing logic (see ms_wrapper.py), but it was not added to inference.py, as it was not feasible there.
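A minimal preprocessing sketch along these lines (not the repo's code; the threshold and file paths are just placeholders) would be:

    import cv2
    import numpy as np

    # Minimal sketch, not the official preprocessing: keep only clearly dark
    # pixels at gray value 0 (the mask) and push everything else to 255, so
    # stray near-black background pixels are not picked up as extra positions.
    pos = cv2.imread('example_images/test/edit.png', cv2.IMREAD_GRAYSCALE)
    cleaned = np.where(pos < 10, 0, 255).astype(np.uint8)
    cv2.imwrite('example_images/test/edit_clean.png', cleaned)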

wendashi commented 2 weeks ago

You are right; I think some pixels of the background have a gray value of 0, so I added a rule (two lines of code) to filter out small connected components.

    def separate_pos_imgs(self, img, sort_priority, gap=102):
        # Split the position image into one binary mask per connected component
        num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(img)
        components = []
        min_area = 15  # filter out small connected components (noise pixels)
        for label in range(1, num_labels):
            if stats[label, cv2.CC_STAT_AREA] >= min_area:  # keep only components large enough to be real positions
                component = np.zeros_like(img)
                component[labels == label] = 255
                components.append((component, centroids[label]))
        if sort_priority == 'ā†•':
            fir, sec = 1, 0  # top-down first
        elif sort_priority == 'ā†”':
            fir, sec = 0, 1  # left-right first
        components.sort(key=lambda c: (c[1][fir] // gap, c[1][sec] // gap))
        sorted_components = [c[0] for c in components]
        return sorted_components
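For a quick sanity check before running the pipeline, something like the following (my own sketch, not part of the repo) counts how many positions a draw_pos image will yield, assuming, per the comment above, that gray value 0 marks a position and using the same min_area:

    import cv2
    import numpy as np

    # Rough sanity check: count how many connected components survive the
    # min_area filter in a draw_pos image, treating gray value 0 as a position.
    img = cv2.imread('example_images/test/edit.png', cv2.IMREAD_GRAYSCALE)
    mask = np.where(img == 0, 255, 0).astype(np.uint8)
    num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
    min_area = 15
    kept = sum(int(stats[i, cv2.CC_STAT_AREA]) >= min_area for i in range(1, num_labels))
    print(f'{kept} position(s) kept out of {num_labels - 1} raw components')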

And it works, no more warnings.

Running DDIM Sampling with 20 timesteps
DDIM Sampler: 100%|ā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆ| 20/20 [00:01<00:00, 13.16it/s]
Prompt: The text is centered on a purple background and is slightly curved upwards, giving it a dynamic and eye-catching appearance. The color purple is often associated with royalty, luxury, and power, which aligns well with the theme of the  "ong"  logo.
Done, result images are saved in: SaveImages

But the result still doesn't look very good; maybe it's because of the training data distribution, right?

[result image]

tyxsspa commented 2 weeks ago

probably yes