ragavsachdeva / magi

Generate a transcript for your favourite Manga: Detect manga characters, text blocks and panels. Order panels. Cluster characters. Match texts to their speakers. Perform OCR.

Improve the recognition for English and Japanese text. #4

Closed: 13582351091 closed this issue 3 months ago

13582351091 commented 4 months ago

Text extraction works well for English comics. When I switch to a Chinese or Japanese comic, however, the dialogue boxes (highlighted in red) are still detected correctly, but ocr_results contains only strange symbols and invalid YouTube URLs. How can I solve this issue?

ragavsachdeva commented 3 months ago

The built-in OCR model is trained exclusively on English data, which is why it produces garbage on Chinese and Japanese text. For Japanese, you will need to use a Japanese OCR model instead, e.g. this one.
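
Since the text boxes are still detected correctly, one option is to crop each box out of the page and pass the crops to a separate OCR model. The following is only a minimal sketch of that idea, not this repository's API. It assumes the page is an object with a PIL-style `crop((left, upper, right, lower))` method, that each detected text box is available as `(x1, y1, x2, y2)` pixel coordinates, and that `recognize` is a callable you supply that wraps whichever Japanese OCR model you choose. The name `ocr_text_boxes` is hypothetical.

```python
from typing import Callable, Iterable, List, Sequence


def ocr_text_boxes(
    image,
    boxes: Iterable[Sequence[float]],
    recognize: Callable[[object], str],
) -> List[str]:
    """Crop each text box from the page and run an external OCR callable on it.

    Assumptions (not taken from the repository):
      - `image` has a PIL-style crop((left, upper, right, lower)) method
      - each box is (x1, y1, x2, y2) in pixel coordinates
      - `recognize` maps one cropped image to a string
    """
    texts = []
    for x1, y1, x2, y2 in boxes:
        # Truncate to whole pixels before cropping.
        crop = image.crop((int(x1), int(y1), int(x2), int(y2)))
        texts.append(recognize(crop))
    return texts
```

The returned list is in the same order as `boxes`, so it could stand in for the English-only OCR output as long as the box order is left unchanged.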