trinker / textreadr

Tools to uniformly read in text data including semi-structured transcripts

Encoding - how to workaround? #14

Closed nujcharee closed 3 years ago

nujcharee commented 6 years ago

Hi there, I used the textreadr package to read multiple PDF files and noticed that bullet points in the PDFs are rendered as <U+F020> in the output. Could you please advise what I can do to avoid this encoding issue?

Thank you so much for your advice :)

elinw commented 6 years ago

Are you on Windows?

nujcharee commented 6 years ago

Yes, I am.

trinker commented 4 years ago

Can you provide an example of the PDF?

trinker commented 3 years ago

Closing as the user never provided the error file. Feel free to re-open.