matt-m-o / YomiNinja

Open-source OCR and dictionary tool.
GNU General Public License v3.0
277 stars, 6 forks

0.6.4 When creating Yomitan Anki cards, surrounding text lines are included as well #26

Open CyanEbi opened 7 months ago

CyanEbi commented 7 months ago

First off I'm very grateful for the native Yomitan support - it's a complete game changer for me!

There's a small issue where, when I create an Anki card through Yomitan, the Yomitan {sentence} parameter contains not only the target line of text but most of the surrounding lines of text as well (see images).

I'm guessing that YomiNinja passes along all lines of text to Yomitan rather than only the target line of text? And I'm guessing that the reason nothing before カイト's lines is included is that Yomitan itself splits sentences at punctuation (in this case the "?" in "ん......?").
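The guess above can be illustrated with a minimal sketch. It assumes a sentence extractor that only stops at punctuation and treats line breaks as ordinary characters. This is not Yomitan's actual code: `extract_sentence`, the `TERMINATORS` set, and the sample lines after "ん......?" are all made up for illustration.

```python
# Assumption: punctuation marks are the only sentence boundaries.
TERMINATORS = set("。．.!?！？…")

def extract_sentence(text: str, offset: int) -> str:
    """Return the span around `offset` bounded by sentence punctuation."""
    # Scan backward until the previous character is a terminator.
    start = offset
    while start > 0 and text[start - 1] not in TERMINATORS:
        start -= 1
    # Scan forward to the next terminator (or the end of the text).
    end = offset
    while end < len(text) and text[end] not in TERMINATORS:
        end += 1
    # Keep the closing run of terminators with the sentence.
    while end < len(text) and text[end] in TERMINATORS:
        end += 1
    return text[start:end].strip()

# Hypothetical OCR output: one entry per detected line, none ending in punctuation
# except the first.
lines = ["ん......?", "今日はいい天気", "散歩に行こう", "そうだね"]
text = "\n".join(lines)

# Look up a word on the third line.
sentence = extract_sentence(text, text.index("散歩"))
print(sentence)
```

Under these assumptions the extracted sentence starts right after the "?" and runs to the end of the text, so it contains the target line plus its unpunctuated neighbours, which matches the behaviour described in the report.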

[Screenshots: 2024-04-08_20-19, 2024-04-08_20-20]

chrispavlopoulos commented 5 months ago

I'm seeing the same issue. This, along with the ability to screenshot, would completely circumvent the need for native Anki support for now, since we could create the cards directly through Yomitan.

matt-m-o commented 3 months ago

Thank you for reporting this issue! An option to fix it has been added in v0.8 and will be available soon.

@chrispavlopoulos: The Yomitan screenshot feature doesn’t work because the overlay is transparent. To fix this, I’ll add a background image.

CyanEbi commented 3 months ago

Hey, thanks for looking into the issue! I have v0.8 now and assume that it's the "Add end-of-sentence punctuation" option you're referring to.

It made me wonder for a while why you would want to append punctuation to every line instead of feeding Yomitan only one line of extracted text (which was the solution I was expecting). I've come to realize that the problem with the solution I expected is that it would make Yomitan unable to properly parse words that wrap onto the next line. So with Paddle Text Detector (which seemingly creates a separate box for every line of text), my request is a bit absurd, since it would be difficult to know which lines belong to the same piece of dialogue.

However, the new Comic Text Detector seems to be good at detecting which lines belong to the same dialogue, so I was wondering whether it would make sense (when using Comic Text Detector) to feed Yomitan only one box of extracted text instead of appending punctuation to every box of extracted text?
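A rough sketch of the per-box idea, for illustration only. It assumes the detector returns boxes that each hold the lines of one piece of dialogue. The `punctuate_boxes` helper, the nested-list data shape, and the sample text are hypothetical and do not reflect YomiNinja's actual data model or its "Add end-of-sentence punctuation" implementation.

```python
# Assumption: these marks count as end-of-sentence punctuation.
TERMINATORS = set("。．.!?！？…")

def punctuate_boxes(boxes: list[list[str]], mark: str = "。") -> list[str]:
    """Join the lines of each box and close each box with one terminator."""
    result = []
    for lines in boxes:
        # Lines inside a box belong together, so join them without a separator;
        # a word that wraps across lines stays intact.
        text = "".join(line.strip() for line in lines)
        if not text:
            continue  # skip empty boxes
        if text[-1] not in TERMINATORS:
            text += mark
        result.append(text)
    return result

# Hypothetical detector output: the second box has a word (図書館) wrapped over two lines.
boxes = [["ん......?"], ["図書", "館に行こう"], ["そうだね"]]
punctuated = punctuate_boxes(boxes)
print(punctuated)
```

With one terminator per box, a punctuation-based sentence splitter would stop at box boundaries, while wrapped words inside a box remain contiguous for dictionary lookup. Whether this fits the real pipeline depends on how reliably the detector groups lines.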