Running the script normally seems to work and prints the full file to the console.
However, if I redirect the output to a file or pipe it to Tee-Object:
python .\pdf2txt.py file.pdf > file.txt
or
python .\pdf2txt.py file.pdf | Tee-Object file.txt
I get the following error (in both Command Prompt and PowerShell):
Traceback (most recent call last):
  File "C:\Users\user\Downloads\pdfminer-env\Scripts\pdf2txt.py", line 317, in <module>
    sys.exit(main())
             ^^^^^^
  File "C:\Users\user\Downloads\pdfminer-env\Scripts\pdf2txt.py", line 311, in main
    outfp = extract_text(**vars(parsed_args))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\Downloads\pdfminer-env\Scripts\pdf2txt.py", line 62, in extract_text
    pdfminer.high_level.extract_text_to_fp(fp, **locals())
  File "C:\Users\user\Downloads\pdfminer-env\Lib\site-packages\pdfminer\high_level.py", line 132, in extract_text_to_fp
    interpreter.process_page(page)
  File "C:\Users\user\Downloads\pdfminer-env\Lib\site-packages\pdfminer\pdfinterp.py", line 998, in process_page
    self.device.end_page(page)
  File "C:\Users\user\Downloads\pdfminer-env\Lib\site-packages\pdfminer\converter.py", line 81, in end_page
    self.receive_layout(self.cur_item)
  File "C:\Users\user\Downloads\pdfminer-env\Lib\site-packages\pdfminer\converter.py", line 352, in receive_layout
    render(ltpage)
  File "C:\Users\user\Downloads\pdfminer-env\Lib\site-packages\pdfminer\converter.py", line 341, in render
    render(child)
  File "C:\Users\user\Downloads\pdfminer-env\Lib\site-packages\pdfminer\converter.py", line 341, in render
    render(child)
  File "C:\Users\user\Downloads\pdfminer-env\Lib\site-packages\pdfminer\converter.py", line 341, in render
    render(child)
  File "C:\Users\user\Downloads\pdfminer-env\Lib\site-packages\pdfminer\converter.py", line 343, in render
    self.write_text(item.get_text())
  File "C:\Users\user\Downloads\pdfminer-env\Lib\site-packages\pdfminer\converter.py", line 335, in write_text
    cast(TextIO, self.outfp).write(text)
  File "C:\Program Files\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\x83' in position 0: character maps to <undefined>
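The last two frames suggest the failure is in the output stream's encoding rather than in pdfminer itself. Below is a minimal sketch that seems to reproduce the same error without any PDF, assuming the redirected stdout behaves like a text stream encoded as cp1252 (as the traceback indicates). The helper name write_with_encoding is hypothetical and only for illustration; it is not part of pdfminer or pdf2txt.py.

```python
import io


def write_with_encoding(text: str, encoding: str) -> bytes:
    """Write text through a text stream that uses the given encoding.

    Hypothetical stand-in for a redirected stdout: the in-memory buffer
    plays the role of the file that receives the redirected output.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)   # characters are encoded on their way to the buffer
    stream.flush()
    data = raw.getvalue()
    stream.detach()      # keep the wrapper from closing the buffer
    return data


if __name__ == "__main__":
    ch = "\x83"  # the character named in the error message

    # A UTF-8 stream accepts the character.
    print(write_with_encoding(ch, "utf-8"))

    # A cp1252 stream rejects it with the same 'charmap' error.
    try:
        write_with_encoding(ch, "cp1252")
    except UnicodeEncodeError as exc:
        print("cp1252 failed:", exc)
```

If this sketch matches what happens in my case, the question becomes why stdout uses cp1252 only when the output is redirected or piped, and not when it goes to the console.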