Open sricochet opened 2 months ago
I think the above-mentioned issue (356) addresses the problem I'm having. My plan is to convert the PDF to HTML, build a dict file (.df) from the generated HTML, and then convert the .df file to Kindle format with pyglossary. Hopefully this will work, but I will test it before I close this issue.
I've downloaded dictionaries for my Kindle, and I've noticed that a few have special formatting such as bold text and italics.
Going forward, I'd like to make my own dictionary from a scanned PDF, using OCR software such as FineReader or gImageReader. However, I'm having difficulty reproducing that formatting.
What I'm doing now is creating a CSV that contains italics and trying to convert it to, say, MOBI for Kindle. However, the italics are not displayed.
Is there a way to take formatted text, or an HTML version of the text, and convert it to MOBI with pyglossary? I've read that one could make a dictfile [https://github.com/ilius/pyglossary/issues/356], but the instructions on that page are vague, and the sample I tried to make did not work. The dictionary I'm trying to make is over a thousand pages, so doing it by hand is not realistic; it would only be feasible with a specific, repeatable process for building the word list from the OCR output.
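To make the question concrete, here is a minimal sketch of the kind of intermediate file I have in mind: one line per entry, with the headword and an HTML definition separated by a tab, and bold/italics kept as `<b>`/`<i>` tags. The names `render_html` and `tab_line` are hypothetical helpers, the sample entry is made up, and the escaping scheme (backslash-escaped tabs and newlines) is my assumption about what a tab-separated glossary file expects. Whether the formatting survives conversion to MOBI is exactly what I still need to test.

```python
import html

def render_html(parts):
    """Join (text, tag) pairs into an HTML fragment.

    tag is "b", "i", or None for plain text; text is HTML-escaped.
    """
    out = []
    for text, tag in parts:
        escaped = html.escape(text)
        out.append(f"<{tag}>{escaped}</{tag}>" if tag else escaped)
    return "".join(out)

def tab_line(headword, definition_html):
    """Build one 'headword<TAB>definition' line.

    Assumption: backslashes, tabs, and newlines inside a field are
    written as backslash escapes so each entry stays on a single line.
    """
    def esc(s):
        return s.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")
    return f"{esc(headword)}\t{esc(definition_html)}"

# Made-up sample entry standing in for cleaned-up OCR output.
entries = [
    ("apple", [("n.", "i"), (" a round fruit; see ", None), ("pear", "b")]),
]

with open("sample_dict.txt", "w", encoding="utf-8") as f:
    for headword, parts in entries:
        f.write(tab_line(headword, render_html(parts)) + "\n")
```

The resulting `sample_dict.txt` would then be the input to the pyglossary conversion step, which is run separately and is not shown here.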
The formatting makes only a slight difference in the final product, so I can live without it, but so far all my experiments with pyglossary have yielded dictionaries with no formatting at all.
If anyone has experience with this, perhaps you could point me to some documentation that would help.
Thanks in advance for any insight.