Open wendashi opened 2 weeks ago
Pixels with a gray value of 0 are used as the mask, so you need to input a draw_pos image whose background pixels have gray values between 1 and 255. In the demo we incorporated this processing logic (see ms_wrapper.py), but it was not added to inference.py, as it was not feasible there.
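That rule can be sketched in isolation (an assumption of what the preprocessing would look like, not the repo's actual code; the function name `clamp_background` and the explicit `mask` argument are hypothetical):

```python
import numpy as np

def clamp_background(pos_img, mask):
    # Hypothetical preprocessing following the rule above: pixels inside
    # `mask` keep gray value 0, while background pixels are clamped into
    # the 1..255 range so none of them is mistaken for mask pixels.
    out = np.clip(pos_img, 1, 255).astype(pos_img.dtype)
    out[mask] = 0
    return out
```
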
You are right. I think some pixels of the background have a gray value of 0, so I added a rule (two lines of code) to filter out small connected components.
```python
def separate_pos_imgs(self, img, sort_priority, gap=102):
    num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(img)
    components = []
    min_area = 15  # filter out small connected components
    for label in range(1, num_labels):
        if stats[label, cv2.CC_STAT_AREA] >= min_area:  # filter out small connected components
            component = np.zeros_like(img)
            component[labels == label] = 255
            components.append((component, centroids[label]))
    if sort_priority == '↕':
        fir, sec = 1, 0  # top-down first
    elif sort_priority == '↔':
        fir, sec = 0, 1  # left-right first
    components.sort(key=lambda c: (c[1][fir] // gap, c[1][sec] // gap))
    sorted_components = [c[0] for c in components]
    return sorted_components
```
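For reference, the bucketed sort key used above can be illustrated in isolation (a sketch with made-up centroids; `cv2` returns centroids as `(x, y)`, and `gap=102` matches the function's default):

```python
# Made-up centroids in (x, y) order, as returned by connectedComponentsWithStats.
centroids = [(200.0, 10.0), (20.0, 12.0), (15.0, 150.0)]
gap = 102
fir, sec = 1, 0  # top-down first: bucket by y (row), then order by x within a row
order = sorted(range(len(centroids)),
               key=lambda i: (centroids[i][fir] // gap, centroids[i][sec] // gap))
# order → [1, 0, 2]: the two top-row boxes left-to-right, then the lower box
```

Dividing by `gap` before sorting groups centroids into coarse rows (or columns), so boxes on roughly the same line are ordered by their secondary coordinate instead of by tiny vertical differences.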
And it works; the warning is gone.
```
Running DDIM Sampling with 20 timesteps
DDIM Sampler: 100%|██████████████████████████████| 20/20 [00:01<00:00, 13.16it/s]
Prompt: The text is centered on a purple background and is slightly curved upwards, giving it a dynamic and eye-catching appearance. The color purple is often associated with royalty, luxury, and power, which aligns well with the theme of the "ong" logo.
Done, result images are saved in: SaveImages
```
But the result still doesn't look very good; maybe that's due to the training data distribution, right?
probably yes
When I try to use 'text-editing' mode, as follows:

```python
# 2. text editing
mode = 'text-editing'
input_data = {
    "prompt": 'The text is centered on a purple background and is slightly curved upwards, giving it a dynamic and eye-catching appearance. The color purple is often associated with royalty, luxury, and power, which aligns well with the theme of the "ong" logo.',
    "seed": 8943410,
    "draw_pos": 'example_images/test/edit.png',  # 'example_images/edit1.png'
    "ori_image": 'example_images/test/ref.png'   # 'example_images/ref1.jpg'
}
results, rtn_code, rtn_warning, debug_info = pipe(input_data, mode=mode, **params)
...
```
edit.png:![image](https://github.com/tyxsspa/AnyText/assets/141135497/0a803114-8c0f-4a12-85f3-a586be6de705)
ref.png:![image](https://github.com/tyxsspa/AnyText/assets/141135497/4adc98a0-f95d-4444-a5d4-0e612d1599c3)
It turns out really weird:

![image](https://github.com/tyxsspa/AnyText/assets/141135497/1985e02e-f342-4754-8c9a-5af42ca66e46)

```
Warning: found 137 positions that > needed 1 from prompt.
```

So I think maybe some settings in `self.separate_pos_imgs` (in /path/.cache/modelscope/modelscope_modules/cv_anytext_text_generation_editing/ms_wrapper.py) are not very suitable? Too many wrong positions are detected. Do you have any suggestion to solve this problem? Thanks a lot!

![image](https://github.com/tyxsspa/AnyText/assets/141135497/c3dcd93b-de07-435e-a569-b8656970a6c3)
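One thing that may help (an assumption on my part, not a confirmed fix from the repo): binarize the draw_pos image before it reaches `separate_pos_imgs`, so faint compression noise near the mask edges doesn't become hundreds of tiny connected components. The threshold value of 50 and the function name `binarize_pos` are made up for illustration:

```python
import numpy as np

def binarize_pos(img, thresh=50):
    # Hypothetical cleanup: anything at or below `thresh` is treated as
    # background (0); everything brighter becomes a solid position pixel (255).
    # This removes faint JPEG/antialiasing noise before component labeling.
    out = np.zeros_like(img)
    out[img > thresh] = 255
    return out
```

Raising `min_area` in `separate_pos_imgs` is the complementary knob: binarization removes faint noise, while `min_area` drops components that are bright but too small to be a real text box.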