petermr / pyami

Semantic Reader of the Scientific Literature
Apache License 2.0

Cannot encode/decode ?private? Unicode characters #39

Open petermr opened 1 year ago

petermr commented 1 year ago

IPCC reports have characters (rows of dots) in the Table of Contents that seem to be outside the normal Unicode ranges (maybe private-use).

Example - dots in p4 of test/resources/ipcc/wg2/spm/fulltext.pdf

I have currently trapped this at page level, so pages with these characters are not written but don't break the flow. The exception message is simply IO_ENCODER
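One possible way to keep the affected pages instead of dropping them is to detect the offending characters and substitute a placeholder before writing. The sketch below is only an illustration: the helper names (`find_problem_chars`, `sanitize_page`, `encode_page`) are hypothetical and not part of pyami, and it assumes the culprits are in the Unicode private-use (`Co`) or surrogate (`Cs`) categories, which has not been confirmed for the IPCC PDFs. Replacing with `.` is also an assumption, chosen because the characters appear as dot leaders.

```python
import unicodedata

# Assumption: the problem characters are private-use (Co) or surrogates (Cs).
PROBLEM_CATEGORIES = {"Co", "Cs"}


def find_problem_chars(text):
    """Return the distinct private-use/surrogate characters in text, sorted."""
    return sorted({ch for ch in text if unicodedata.category(ch) in PROBLEM_CATEGORIES})


def sanitize_page(text, replacement="."):
    """Replace every private-use/surrogate character with a placeholder."""
    return "".join(
        replacement if unicodedata.category(ch) in PROBLEM_CATEGORIES else ch
        for ch in text
    )


def encode_page(text, encoding="utf-8"):
    """Encode a page; on failure, sanitize and retry rather than skip the page."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        # Anything still unencodable after sanitizing becomes '?'.
        return sanitize_page(text).encode(encoding, errors="replace")
```

Logging the output of `find_problem_chars` for a failing page (for example as `hex(ord(ch))`) would also show the actual code points, which would confirm or rule out the private-use guess.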