Open jamesvillarrubia opened 4 months ago
I get the same failure on any Unicode character in the text. Would be nice if it could fail with a warning and continue.
I have also noticed that the same PDF with the Unicode characters works when hitting the hosted endpoint: llmsherpa_api_url = "https://readers.llmsherpa.com/api/document/developer/parseDocument?renderFormat=all"
but fails when using the latest Docker image.
There is some sort of encoding error with '½'
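To illustrate what I mean by "fail with a warning and continue", here is a minimal Python sketch. It is not taken from the llmsherpa or nlm-ingestor code, and I have not confirmed where the actual failure happens. The helper name `decode_with_warning` is hypothetical, and the idea that the bytes are being decoded with the wrong codec is only my guess at the cause. It shows that '½' is two bytes in UTF-8 but a single byte in Latin-1, and that decoding the Latin-1 byte as UTF-8 raises unless an error handler is set.

```python
import warnings


def decode_with_warning(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode bytes, warning instead of raising on bad input."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # Warn and continue, substituting U+FFFD for each undecodable byte.
        warnings.warn(f"Could not decode input as {encoding}: {exc}; replacing bad bytes")
        return raw.decode(encoding, errors="replace")


half = "½"
utf8_bytes = half.encode("utf-8")      # two bytes: 0xC2 0xBD
latin1_bytes = half.encode("latin-1")  # one byte: 0xBD

# The UTF-8 bytes round-trip cleanly.
clean = decode_with_warning(utf8_bytes)

# The Latin-1 byte is not valid UTF-8, so a strict decode would raise;
# here it produces a warning and a replacement character instead.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lossy = decode_with_warning(latin1_bytes)
```

With something like this the character would be lost, but the rest of the document would still parse.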
Happy to submit a PR if someone can point me in the right direction for this conversion.