shebinleo / pdf2html

pdf2html is a module that converts PDF files to HTML pages using Apache Tika. It can also generate a thumbnail image of a PDF file using Apache PDFBox.
https://www.npmjs.com/package/pdf2html
Apache License 2.0

Can't convert PDF to HTML in languages other than English #57

Open FarisKP1 opened 1 year ago

FarisKP1 commented 1 year ago

When I tried to convert a PDF containing Malayalam text, the output was not as expected.

Input PDF Content

എം എൽ എ യുടെ അധ്യക്ഷതയിൽ യോഗം

Output

���ി�ിൽ എം എൽ എ �െട ആധ��തയിൽ േയാഗം
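
The "�" marks in the output are probably U+FFFD replacement characters, which usually appear when bytes could not be decoded as text. Counting them is a quick way to check whether the damage is in the extracted text itself or only in how the HTML is displayed. Below is a minimal sketch, assuming the converted HTML is already available as a string. `countReplacementChars` is a hypothetical helper written for this report, not part of pdf2html, and the `damaged` sample is an illustrative fragment rather than real converter output.

```javascript
// Hypothetical helper (not part of pdf2html): counts U+FFFD replacement
// characters, which typically mark text that failed to decode.
function countReplacementChars(text) {
  let count = 0;
  // for...of iterates by code point, so characters outside the BMP
  // are not split into surrogate halves.
  for (const ch of text) {
    if (ch === '\uFFFD') {
      count += 1;
    }
  }
  return count;
}

// The expected text from the input PDF, and an illustrative damaged fragment.
const expectedText = 'എം എൽ എ യുടെ അധ്യക്ഷതയിൽ യോഗം';
const damaged = '\uFFFD\uFFFDതയിൽ';

console.log('expected text:', countReplacementChars(expectedText));
console.log('damaged sample:', countReplacementChars(damaged));
```

A non-zero count for the converted HTML would suggest that the characters were lost during extraction, not when the page was rendered.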