pymupdf / PyMuPDF-Utilities

Demos, examples and utilities using PyMuPDF
GNU Affero General Public License v3.0

fitzcli.py errors on characters in some documents #28

Closed. AlisterH closed this issue 3 years ago.

AlisterH commented 3 years ago

Some documents, such as this one, cause fitzcli.py to fail with the following error:


#  python3 fitzcli.py gettext G*\).pdf
Traceback (most recent call last):
  File "fitzcli.py", line 1206, in <module>
    main()
  File "fitzcli.py", line 1202, in main
    args.func(args)  # execute requested command
  File "fitzcli.py", line 901, in gettext
    flags=flags,
  File "fitzcli.py", line 866, in page_layout
    textout.write((text + "\n").encode("utf8"))
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 39-42: surrogates not allowed
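The message "surrogates not allowed" means the extracted page text contains UTF-16 surrogate code points (U+D800 to U+DFFF). Python's strict UTF-8 encoder refuses to encode these. The following is a minimal sketch of one possible workaround for the `textout.write((text + "\n").encode("utf8"))` call shown in the traceback. It is not the fix that was applied to fitzcli.py. The helper names `rejoin_surrogate_pairs` and `encode_page_text` are hypothetical. The sketch also assumes that some of the offending characters are high/low surrogate halves that were never combined, such as an emoji split into two code points.

```python
# Hypothetical helpers (not part of fitzcli.py) for writing extracted text
# that may contain surrogate code points.


def rejoin_surrogate_pairs(text: str) -> str:
    """Merge adjacent high/low surrogates into single code points.

    Round-tripping through UTF-16 with the "surrogatepass" handler turns a
    valid pair such as U+D83D U+DE00 into U+1F600 and leaves unpaired
    surrogate halves untouched.
    """
    return text.encode("utf-16", "surrogatepass").decode("utf-16", "surrogatepass")


def encode_page_text(text: str) -> bytes:
    """Encode one page of text plus a newline as UTF-8 without raising."""
    text = rejoin_surrogate_pairs(text)
    # Any surrogate still present is unpaired; "replace" writes b"?" for it.
    return (text + "\n").encode("utf8", errors="replace")


if __name__ == "__main__":
    broken = "smile \ud83d\ude00, stray \ud83d here"
    try:
        (broken + "\n").encode("utf8")  # what fitzcli.py effectively does
    except UnicodeEncodeError as exc:
        print(exc.reason)  # surrogates not allowed
    print(encode_page_text(broken))
```

Using `errors="replace"` silently loses the unpaired characters. If you would rather keep them visible in the output, `errors="backslashreplace"` writes them as `\ud83d`-style escapes.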