ilius / pyglossary

A tool for converting dictionary files aka glossaries. Mainly to help use our offline glossaries in any Open Source dictionary we like on any modern operating system / device.
GNU General Public License v3.0

Possibility of special formatting in dictionaries? #573

Open sricochet opened 2 months ago

sricochet commented 2 months ago

I've downloaded dictionaries for my Kindle, and I've noticed that a few have special formatting such as bold and italic text.

Going forward, I'd like to make my own dictionary from a scanned PDF, using OCR software such as FineReader or gImageReader. However, I'm having difficulty reproducing that kind of formatting.

What I'm doing is creating a CSV with italics and converting it to, say, MOBI for Kindle. However, the italics don't show up in the result.
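One likely cause, offered as a guess: a CSV cell is plain text, so italics applied in a spreadsheet or OCR editor are dropped when the file is saved as CSV. A common workaround is to encode the formatting as inline HTML tags (`<i>`, `<b>`) inside the definition column. The sketch below assumes that pyglossary's CSV reader takes the headword from the first column and the definition from the second, and that it passes those tags through to the MOBI output; the sample entries and the names `write_glossary_csv` and `to_csv_string` are hypothetical.

```python
import csv
import io

# Hypothetical sample entries: (headword, definition).
# Formatting is written as inline HTML tags, because a CSV cell cannot
# store italics or bold the way a spreadsheet or word processor does.
ENTRIES = [
    ("abbreviate", "<i>verb</i>: to shorten a word or phrase"),
    ("bold", "<i>adj.</i> brave, daring; <b>also</b> clearly visible"),
]


def write_glossary_csv(entries, stream):
    """Write (headword, html_definition) rows as CSV to an open text stream.

    csv.writer quotes any field containing commas, quotes or newlines,
    so definitions with commas or line breaks survive the round trip.
    """
    writer = csv.writer(stream)
    for word, definition in entries:
        writer.writerow([word, definition])


def to_csv_string(entries):
    """Return the CSV text in memory (handy for checking the output)."""
    buf = io.StringIO(newline="")
    write_glossary_csv(entries, buf)
    return buf.getvalue()
```

For a real file, open it with `open("my_dict.csv", "w", encoding="utf-8", newline="")` (as the `csv` module documentation recommends) and pass the file object to `write_glossary_csv`. The conversion would then be something like `pyglossary my_dict.csv my_dict.mobi`, with the file names as placeholders; check `pyglossary --help` for the exact options, and note that Kindle output may additionally require Amazon's kindlegen.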

Is there a way to take formatted text, or an HTML version of the text, and convert it to MOBI with pyglossary? I read earlier that one could make a Dictfile [https://github.com/ilius/pyglossary/issues/356], but the instructions on that page are vague, and a sample I made did not work. The dictionary I'm trying to make is over a thousand pages, so it would only be feasible if there's a specific process for building the word list from the OCR output.

The formatting only makes a slight difference to the final product, so I can live without it, but so far my experiments with pyglossary have produced dictionaries with no formatting at all.

If anyone has experience with this, could you point me to some documentation that might help?

Thanks in advance for any insight.

sricochet commented 1 month ago

I think the issue mentioned above (#356) addresses the problem I'm having. My plan is to convert the PDF to HTML, build a Dictfile (.df) from the generated HTML, and then convert the .df file to Kindle format with pyglossary. Hopefully this will work; I'll test it before closing this issue.
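For a thousand-page dictionary, the "HTML to entries" step is the part worth automating. Below is a rough sketch using Python's standard `html.parser`, assuming (hypothetically) that the OCR tool exports each entry as one `<p>` element whose first `<b>` element is the headword; real FineReader or gImageReader output will likely need the rules adjusted. `EntrySplitter` and `split_entries` are illustrative names, not part of pyglossary. The resulting (headword, definition) pairs keep inline `<i>`/`<b>` tags and could then be written to whichever input format pyglossary is given (CSV, Dictfile, and so on); the exact Dictfile syntax is not reproduced here.

```python
from html import escape
from html.parser import HTMLParser


class EntrySplitter(HTMLParser):
    """Collect (headword, definition_html) pairs from OCR'd HTML.

    Assumed layout (hypothetical): every entry is one <p> whose first <b>
    is the headword; everything after it is the definition. Paragraphs
    without a bold headword (page numbers, running headers) are skipped.
    """

    KEEP = {"i", "b", "em", "strong", "sup", "sub"}  # inline tags to preserve

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.entries = []
        self._in_p = False
        self._in_head = False
        self._headword = None  # stays None until the first <b> closes
        self._head_parts = []
        self._defi_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # Start a fresh entry
            self._in_p = True
            self._headword = None
            self._head_parts = []
            self._defi_parts = []
        elif self._in_p and tag == "b" and self._headword is None and not self._in_head:
            self._in_head = True  # first bold run = headword
        elif self._in_p and tag in self.KEEP and self._headword is not None:
            self._defi_parts.append(f"<{tag}>")  # keep formatting in definition

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._in_p = False
            if self._headword:
                self.entries.append((self._headword, "".join(self._defi_parts).strip()))
        elif self._in_head and tag == "b":
            self._in_head = False
            self._headword = "".join(self._head_parts).strip()
        elif self._in_p and tag in self.KEEP and self._headword is not None:
            self._defi_parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._in_head:
            self._head_parts.append(data)
        elif self._in_p and self._headword is not None:
            # Re-escape text so characters like "&" stay valid HTML
            self._defi_parts.append(escape(data, quote=False))


def split_entries(html_text):
    """Return a list of (headword, definition_html) tuples."""
    parser = EntrySplitter()
    parser.feed(html_text)
    parser.close()
    return parser.entries
```

Text that appears before the headword inside a paragraph is dropped, which is a deliberate simplification; a real OCR export may need extra rules for entries that span several paragraphs.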