pdoc3 / pdoc

:snake: :arrow_right: :scroll: Auto-generate API documentation for Python projects
https://pdoc3.github.io/pdoc/
GNU Affero General Public License v3.0

UnicodeEncodeError when redirecting PDF output #303

Closed (david-sen closed 3 years ago)

david-sen commented 3 years ago

Expected Behavior

When I run pdoc --pdf ./ > pdf.md, I expect a file pdf.md to be created with the documentation.

Actual Behavior

I receive the following error: UnicodeEncodeError: 'charmap' codec can't encode character '\u2011' in position 1771: character maps to <undefined>.

If I run just pdoc --pdf ./ it sends everything to stdout correctly.

I tried to find an argument to send it directly to a file instead of stdout, but it seems that doesn't exist.

Steps to Reproduce

  1. Run pdoc --pdf ./ > pdf.md on Windows 10

Additional info

kernc commented 3 years ago

> I tried to find an argument to send it directly to a file instead of stdout

Stream redirection, as you tried it, is the preferred mechanism. If writing to stdout works and redirection doesn't, it's most likely an upstream (i.e. Windows) issue.

See if setting environment variable PYTHONIOENCODING=<desired_encoding>, as mentioned in this StackOverflow question, helps: https://stackoverflow.com/questions/59779618/unicodeencodeerror-in-python3-when-redirection-is-used
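
For context, a minimal sketch of why the redirected run can fail: the character in the error message, U+2011 (non-breaking hyphen), has no mapping in cp1252, while UTF-8 can represent it. The helper name `can_encode` is hypothetical and only serves the illustration.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with `encoding` without errors."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# U+2011 is the character named in the reported traceback.
sample = "non\u2011breaking hyphen"

print(can_encode(sample, "cp1252"))  # False: cp1252 has no mapping for U+2011
print(can_encode(sample, "utf-8"))   # True: UTF-8 covers all of Unicode
```

Setting PYTHONIOENCODING to an encoding that covers the character avoids the failing encode step.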

david-sen commented 3 years ago

Hi, thanks for the quick response!

Your solution worked: I was using PowerShell, so I set $Env:PYTHONIOENCODING to utf16, then ran pdoc with pdoc --pdf ./ | Out-File pdf.md -Encoding utf8 (yes, the two encodings had to be different, otherwise the Markdown file was a mess; I don't quite understand why) and got it working.

kernc commented 3 years ago

Seemingly related, PEP 540: A New UTF-8 Mode.

I'm not sure whether there is anything for us to do here. :thinking:
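
For reference, a hedged sketch of what the UTF-8 Mode from PEP 540 changes. It assumes Python 3.7 or newer, where the mode can be enabled with `-X utf8` (or `PYTHONUTF8=1`). The helper name `stdout_encoding_in_utf8_mode` is made up for illustration. The child interpreter's stdout is a pipe here, which stands in for a redirected stdout.

```python
import os
import subprocess
import sys


def stdout_encoding_in_utf8_mode() -> str:
    """Report sys.stdout.encoding of a child interpreter started with -X utf8."""
    # Drop PYTHONIOENCODING so an inherited override does not mask the mode.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONIOENCODING"}
    # The child writes raw ASCII bytes, so the report itself cannot hit
    # an encoding error regardless of the stream's text encoding.
    child = (
        "import sys; "
        "sys.stdout.buffer.write(sys.stdout.encoding.encode('ascii'))"
    )
    result = subprocess.run(
        [sys.executable, "-X", "utf8", "-c", child],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii")


print(stdout_encoding_in_utf8_mode())
```

If this reports UTF-8 on the affected machine, running pdoc under the same mode may be an alternative to setting PYTHONIOENCODING, though that was not tested in this thread.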

david-sen commented 3 years ago

Yes, I had meant to close this issue as the proposed solution worked. Though I do think it would be useful to be able to write to a file, rather than relying on redirection. However, since I have this working it's not a big deal.

kernc commented 3 years ago

Well, print(foo) is essentially sys.stdout.write(foo) plus a trailing newline, so we just rely on the redirection syntax the shell provides.

I'm curious, what's the value of sys.stdout.encoding when the output stream is redirected and when not?

$ python3 -c 'import sys; print(sys.stdout.encoding, file=sys.stderr)'

$ python3 -c 'import sys; print(sys.stdout.encoding, file=sys.stderr)' > foo

david-sen commented 3 years ago

In a clean PowerShell session, it defaults to cp1252. Setting $Env:PYTHONIOENCODING="utf16" changes that.
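
A small sketch that automates this check, assuming it is acceptable to spawn a child interpreter with its stdout attached to a pipe (comparable to redirection). `child_stdout_encoding` is a hypothetical helper, not part of pdoc.

```python
import os
import subprocess
import sys
from typing import Optional


def child_stdout_encoding(io_encoding: Optional[str] = None) -> str:
    """Return sys.stdout.encoding as seen by a child with piped stdout.

    If `io_encoding` is given, it is passed to the child via the
    PYTHONIOENCODING environment variable; otherwise the variable is unset.
    """
    env = {k: v for k, v in os.environ.items() if k != "PYTHONIOENCODING"}
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    # Write ASCII bytes directly so the answer is readable whatever
    # text encoding the child's stdout ends up with (e.g. UTF-16).
    child = (
        "import sys; "
        "sys.stdout.buffer.write(sys.stdout.encoding.encode('ascii'))"
    )
    result = subprocess.run(
        [sys.executable, "-c", child],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii")


print("default :", child_stdout_encoding())         # platform dependent
print("override:", child_stdout_encoding("utf-8"))
```

The default value depends on the platform and locale, so only the overridden case is predictable.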

kernc commented 3 years ago

Maybe instead of plain printing to stdout: https://github.com/pdoc3/pdoc/blob/72e41dbf91646a39bc900ca6f875e5f75660e6a4/pdoc/cli.py#L361-L363 we could encode to bytes with the correct encoding and write that:

rendered_str = pdoc._render_template('/pdf.mako', modules=modules, **kwargs)
sys.stdout.write(rendered_str.encode(sys.stdout.encoding))

Can you check whether this prevents the error?

david-sen commented 3 years ago

That doesn't fix the issue. However, doing it like this worked (I no longer need to set $Env:PYTHONIOENCODING="utf16"):

rendered_str = pdoc._render_template(
    '/pdf.mako', modules=modules, **kwargs)
sys.stdout.buffer.write(rendered_str.encode('utf16'))

I don't know what effect this has on other operating systems, though. This writes the file correctly using just ">". However, pandoc still has trouble with the encoding, so I need to use "| Out-File pdf.md -Encoding utf8" instead, but that isn't a problem with pdoc.
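
To illustrate the difference between the two snippets: a text stream's write() accepts only str, so passing it bytes raises TypeError, whereas the underlying binary buffer accepts bytes and bypasses the stream's own encoding. The following is a sketch under assumptions, not pdoc's code: `write_rendered` is a hypothetical helper, the default of UTF-8 is my choice (the comment above used utf16), and an in-memory stream configured for cp1252 stands in for a redirected stdout.

```python
import io


def write_rendered(stream, text: str, encoding: str = "utf-8") -> None:
    """Write `text` to the binary buffer behind `stream`, pre-encoded."""
    stream.flush()  # keep ordering with anything already written as text
    stream.buffer.write(text.encode(encoding))
    stream.buffer.flush()


# Stand-in for a redirected stdout whose text encoding is cp1252.
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="cp1252")

rendered = "non\u2011breaking hyphen"

try:
    fake_stdout.write(rendered)  # text path: the stream encodes with cp1252
except UnicodeEncodeError as exc:
    print("text write failed:", exc.reason)

write_rendered(fake_stdout, rendered)  # bytes path: our own encoding is used
print(raw.getvalue() == rendered.encode("utf-8"))
```

With the real stdout the call would be `write_rendered(sys.stdout, rendered_str)`. Note that hard-coding an encoding ignores whatever the user's terminal or PYTHONIOENCODING requests, which is one reason to treat this as a workaround rather than a fix.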

kernc commented 3 years ago

> That doesn't fix the issue

But to clarify, the shell outputs cp1252 in both these cases (redirected and not)? :flushed: :confused:

I think I might be in over my head with this one. A Windows bug is most likely, why not. :smile:

david-sen commented 3 years ago

Oh no, really sorry, I messed up. For some reason I forgot to mention that for normal output it's utf-8, and only when it's redirected, cp1252. Sorry for wasting your time!

kernc commented 3 years ago

Ah, no problem. Just didn't fully compute.

I think, for the time being, setting the environment variable as you have remains the preferred workaround. Thanks again.