zyddnys / manga-image-translator

Translate manga/images: one-click translation of the text in all kinds of images https://cotrans.touhou.ai/
GNU General Public License v3.0

Result not very satisfying #76

Closed by binghui1992 2 years ago

binghui1992 commented 2 years ago

Hi, I want to translate an image (a screenshot of a game) from Korean to Chinese. The original image is at https://imgur.com/vBAjVVF and the corresponding result is at https://imgur.com/kBipYEC. The Korean text is not segmented well at all, and some words are not detected and therefore not translated. The command I ran: python translate_demo.py --verbose --translator=baidu --target-lang=CHS --image ./demo/test2.jpg

If the poor result is caused by bad arguments, I'd be very happy to learn the right ones. Thanks!

dmMaze commented 2 years ago

It seems the default detection model failed to detect text bboxes.

Default: [image: bbox_default]

CTD: [image: bboxes]

To switch to the CTD detection model, run with --use-ctd.

binghui1992 commented 2 years ago

@dmMaze Sorry, but running with "--use-ctd" does not work either; it raises an exception with the following traceback:

Traceback (most recent call last):
  File "/home/boy-afei/Github/manga-image-translator/translate_demo.py", line 355, in <module>
    loop.run_until_complete(main(args.mode))
  File "/usr/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "/home/boy-afei/Github/manga-image-translator/translate_demo.py", line 271, in main
    await infer(img, mode, '', alpha_ch = alpha_ch)
  File "/home/boy-afei/Github/manga-image-translator/translate_demo.py", line 103, in infer
    mask, final_mask, textlines = await dispatch_ctd_detection(img, args.use_cuda)
  File "/home/boy-afei/Github/manga-image-translator/textblockdetector/__init__.py", line 135, in dispatch
    return DEFAULT_MODEL(img, refine_mode=REFINEMASK_INPAINT, keep_undetected_mask=False, bgr2rgb=False)
  File "/home/boy-afei/Github/manga-image-translator/venv/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/boy-afei/Github/manga-image-translator/textblockdetector/__init__.py", line 105, in __call__
    mask = cv2.resize(mask, (im_w, im_h), interpolation=cv2.INTER_LINEAR)
cv2.error: OpenCV(4.6.0) /io/opencv/modules/imgproc/src/resize.cpp:4052: error: (-215:Assertion failed) !ssize.empty() in function 'resize'

After debugging, I found that at the line just before the exception (i.e. mask = mask[: mask.shape[0]-dh, : mask.shape[1]-dw]) the shape of the mask is (2, 1024, 1024), so the resulting mask is empty, which is why the exception above occurs. I have no idea how to solve this; BTW, I'm a newbie at computer vision.
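To illustrate why that slice comes out empty, here is a minimal sketch in plain Python. It only computes the shape the slice would produce, using the same stop-index rules that basic NumPy slicing follows. The helper name sliced_shape and the padding values dh and dw are hypothetical (the real values are not given in this thread); the point is just that when the first axis has length 2, any dh of 2 or more leaves nothing on that axis.

```python
def sliced_shape(shape, dh, dw):
    """Shape that mask[: shape[0]-dh, : shape[1]-dw] would have.

    Hypothetical helper: it mirrors Python/NumPy basic-slice semantics
    by slicing range objects instead of a real array.
    """
    rows = len(range(shape[0])[: shape[0] - dh])
    cols = len(range(shape[1])[: shape[1] - dw])
    return (rows, cols) + tuple(shape[2:])


# Assumed padding values, chosen only for illustration.
dh, dw = 256, 0

# Shape reported in this thread: the first axis has length 2, so
# mask[: 2 - 256] selects nothing and the result is empty.
print(sliced_shape((2, 1024, 1024), dh, dw))  # (0, 1024, 1024)

# A 2-D (height, width) mask, shown for comparison only.
print(sliced_shape((1024, 1024), dh, dw))  # (768, 1024)
```

An array with a zero-length axis has no elements, which matches the !ssize.empty() assertion that cv2.resize raises in the traceback above.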

dmMaze commented 2 years ago
cv2.error: OpenCV(4.6.0) /io/opencv/modules/imgproc/src/resize.cpp:4052: error: (-215:Assertion failed) !ssize.empty() in function 'resize'

The newest OpenCV (4.6.0) is not compatible with ctd; you can install an older version:
pip install opencv-python==4.5.*

binghui1992 commented 2 years ago

@dmMaze Thanks, that was a great help. After downgrading opencv-python to 4.5.5.64, it works! After some attempts, it seems to work fine with versions >=4.5.2, but 4.5.1.48 still does not work.
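For anyone who wants to check their installed version against these reports, here is a small sketch. The function names are hypothetical, and the range (at least 4.5.2, below 4.6) reflects only what was observed in this thread, not an official compatibility statement.

```python
def parse_version(version):
    """Turn a string like '4.5.5.64' into (4, 5, 5, 64).

    Non-numeric parts are skipped.
    """
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def reported_working(version):
    """True if the version falls in the range reported to work with --use-ctd
    in this thread: >= 4.5.2 and < 4.6 (an assumption based on these reports only).
    """
    return (4, 5, 2) <= parse_version(version) < (4, 6)


for v in ("4.5.1.48", "4.5.2", "4.5.5.64", "4.6.0"):
    print(v, reported_working(v))
```

With OpenCV installed, the same check can be applied to the running version by passing cv2.__version__ to reported_working.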