kha-white / manga-ocr

Optical character recognition for Japanese text, with the main focus being Japanese manga
Apache License 2.0
1.7k stars 89 forks source link
comics computer-vision deep-learning japanese manga ocr transformers

Manga OCR

Optical character recognition for Japanese text, with the main focus being Japanese manga. It uses a custom end-to-end model built with Transformers' Vision Encoder Decoder framework.

Manga OCR can be used as a general purpose printed Japanese OCR, but its main goal was to provide a high quality text recognition, robust against various scenarios specific to manga:

Unlike many OCR models, Manga OCR supports recognizing multi-line text in a single forward pass, so that text bubbles found in manga can be processed at once, without splitting them into lines.

See also:

Installation

You need Python 3.6 or newer. Please note, that the newest Python release might not be supported due to a PyTorch dependency, which often breaks with new Python releases and needs some time to catch up. Refer to PyTorch website for a list of supported Python versions.

Some users have reported problems with Python installed from Microsoft Store. If you see an error: ImportError: DLL load failed while importing fugashi: The specified module could not be found., try installing Python from the official site.

If you want to run with GPU, install PyTorch as described here, otherwise this step can be skipped.

Troubleshooting

Usage

Python API

from manga_ocr import MangaOcr

mocr = MangaOcr()
text = mocr('/path/to/img')

or

import PIL.Image

from manga_ocr import MangaOcr

mocr = MangaOcr()
img = PIL.Image.open('/path/to/img')
text = mocr(img)

Running in the background

Manga OCR can run in the background and process new images as they appear.

You might use a tool like ShareX or Flameshot to manually capture a region of the screen and let the OCR read it either from the system clipboard, or a specified directory. By default, Manga OCR will write recognized text to clipboard, from which it can be read by a dictionary like Yomichan.

Clipboard mode on Linux requires wl-copy for Wayland sessions or xclip for X11 sessions. You can find out which one your system needs by running echo $XDG_SESSION_TYPE in the terminal.

Your full setup for reading manga in Japanese with a dictionary might look like this:

capture region with ShareX -> write image to clipboard -> Manga OCR -> write text to clipboard -> Yomichan

https://user-images.githubusercontent.com/22717958/150238361-052b95d1-0152-485f-a441-48a957536239.mp4

When running for the first time, downloading the model (~400 MB) might take a few minutes. The OCR is ready to use after OCR ready message appears in the logs.

If manga_ocr doesn't work, you might also try replacing it with python -m manga_ocr.

Usage tips

Examples

Here are some cherry-picked examples showing the capability of the model.

image Manga OCR result
素直にあやまるしか
立川で見た〝穴〟の下の巨大な眼は:
実戦剣術も一流です
第30話重苦しい闇の奥で静かに呼吸づきながら
よかったじゃないわよ!何逃げてるのよ!!早くあいつを退治してよ!
ぎゃっ
ピンポーーン
LINK!私達7人の力でガノンの塔の結界をやぶります
ファイアパンチ
少し黙っている
わかるかな〜?
警察にも先生にも町中の人達に!!

Contact

For any inquiries, please feel free to contact me at kha-white@mail.com

Acknowledgments

This project was done with the usage of: