Closed david-sen closed 3 years ago
Stream redirection as you tried it is the preferred mechanism. If outputting to stdout works and redirection doesn't, it's most likely an upstream (i.e. Windows) issue.
See if setting the environment variable `PYTHONIOENCODING=<desired_encoding>`, as mentioned in this StackOverflow question, helps: https://stackoverflow.com/questions/59779618/unicodeencodeerror-in-python3-when-redirection-is-used
Hi, thanks for the quick response!
Your solution worked: I was using PowerShell, so I set `$Env:PYTHONIOENCODING` to utf16, then ran pdoc with `pdoc --pdf ./ | Out-File pdf.md -Encoding utf8` (yes, they had to be different, otherwise the markdown file was a mess; I don't quite understand that) and got it working.
Seemingly related, PEP 540: A New UTF-8 Mode.
I'm not sure whether there is anything for us to do here. :thinking:
Yes, I had meant to close this issue as the proposed solution worked. Though I do think it would be useful to be able to write to a file, rather than relying on redirection. However, since I have this working it's not a big deal.
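For anyone who wants to skip shell redirection in their own wrapper script, here is a minimal sketch of writing text to a file with an explicit encoding. `write_output` is a hypothetical helper made up for illustration, not an existing pdoc function or option, and the choice of UTF-8 is an assumption.

```python
import sys
import tempfile
from pathlib import Path


def write_output(text, path=None, encoding="utf-8"):
    """Hypothetical helper: write text to a file if a path is given, else to stdout."""
    if path is None:
        # Falls back to whatever encoding the current stdout uses.
        sys.stdout.write(text)
        return
    # The explicit encoding makes the result independent of the console code page.
    Path(path).write_text(text, encoding=encoding)


# Demonstration with a temporary file and a character that cp1252 cannot represent.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "pdf.md"
    write_output("non\u2011breaking", target)
    saved = target.read_bytes()
```

Because the file is opened with a fixed encoding, the result does not depend on how the shell sets up the redirected stream.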
Well, `print(foo)` is roughly equivalent to `sys.stdout.write(foo)`, so we just leverage a bit of syntax provided by the OS.
I'm curious, what's the value of `sys.stdout.encoding` when the output stream is redirected and when not?
$ python3 -c 'import sys; print(sys.stdout.encoding, file=sys.stderr)'
$ python3 -c 'import sys; print(sys.stdout.encoding, file=sys.stderr)' > foo
In a clean PowerShell, it defaults to cp1252. Setting `$Env:PYTHONIOENCODING="utf16"` changes that.
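The same check can be scripted. This is a small sketch that starts a child Python process with its stdout captured (so it is not attached to a console) and `PYTHONIOENCODING` set, then reads back what the child reports for `sys.stdout.encoding`. The value `utf-8` is just an example choice here, not the value used in this thread.

```python
import os
import subprocess
import sys

# Copy the current environment and override the stream encoding for the child only.
env = dict(os.environ, PYTHONIOENCODING="utf-8")

completed = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
    env=env,
    capture_output=True,  # stdout is a pipe here, similar to a redirected stream
    text=True,
    check=True,
)

reported = completed.stdout.strip()
print(reported)
```

Dropping `PYTHONIOENCODING` from `env` and rerunning shows what the interpreter picks by default for a non-console stdout on a given machine.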
Maybe instead of plain printing to stdout: https://github.com/pdoc3/pdoc/blob/72e41dbf91646a39bc900ca6f875e5f75660e6a4/pdoc/cli.py#L361-L363 we could encode to bytes with the correct encoding and write that:
rendered_str = pdoc._render_template('/pdf.mako', modules=modules, **kwargs)
sys.stdout.write(rendered_str.encode(sys.stdout.encoding))
Can you check whether this prevents the error?
That doesn't fix the issue. However, doing it like this worked (I no longer need to set `$Env:PYTHONIOENCODING="utf16"`):
rendered_str = pdoc._render_template(
'/pdf.mako', modules=modules, **kwargs)
sys.stdout.buffer.write(rendered_str.encode('utf16'))
I don't know what effect this has on other operating systems, though. This writes the file correctly using just `>`. However, pandoc still has trouble with the encoding, so I need to use `| Out-File pdf.md -Encoding utf8` instead, but this isn't a problem with pdoc.
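To make the buffer approach easier to try in isolation, here is a rough sketch that simulates a redirected stdout whose text layer is cp1252 and writes pre-encoded bytes to its underlying buffer. `write_encoded` is a hypothetical helper, and UTF-8 is used as the default instead of the utf16 from the snippet above; both are assumptions for illustration.

```python
import io


def write_encoded(stream, text, encoding="utf-8"):
    """Hypothetical helper: bypass the stream's text layer and write encoded bytes."""
    stream.flush()  # keep ordering with anything already written as text
    stream.buffer.write(text.encode(encoding))
    stream.buffer.flush()


# Simulate a redirected stdout whose text layer would use cp1252.
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="cp1252")

# Writing this string through fake_stdout.write() would raise UnicodeEncodeError.
write_encoded(fake_stdout, "non\u2011breaking\n")
written = raw.getvalue()
```

With the real stream this would be `write_encoded(sys.stdout, rendered_str)`. On Python 3.7+, `sys.stdout.reconfigure(encoding="utf-8")` is another option that keeps the text layer in use.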
> That doesn't fix the issue
But to clarify, the shell outputs cp1252 in both these cases (redirected and not)? :flushed: :confused:
I think I might be in over my head with this one. A Windows bug is most likely, why not. :smile:
Oh no, really sorry, I messed up. For some reason I forgot to mention that for normal output it's utf-8, and only when it's redirected, cp1252. Sorry for wasting your time!
Ah, no problem. Just didn't fully compute.
I think for the time being, setting the environment variable as you have remains the preferred workaround. Thanks again.
Expected Behavior
When I run `pdoc --pdf ./ > pdf.md`, I expect a file `pdf.md` to be created with the documentation.
Actual Behavior
I receive the following error: `UnicodeEncodeError: 'charmap' codec can't encode character '\u2011' in position 1771: character maps to <undefined>`.
If I run just `pdoc --pdf ./`, it sends everything to stdout correctly. I tried to find an argument to send it directly to a file instead of stdout, but it seems that doesn't exist.
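The error can be reproduced without pdoc or a shell. This minimal sketch encodes a string containing U+2011 (non-breaking hyphen) with cp1252, the encoding reported for the redirected stream in this thread, and then with UTF-8 for comparison. The sample string is made up for illustration.

```python
# U+2011 (non-breaking hyphen) has no mapping in the cp1252 code page.
text = "non\u2011breaking hyphen"

try:
    text.encode("cp1252")
    error_message = None
except UnicodeEncodeError as exc:
    # Python implements cp1252 as a "charmap" codec, hence the wording of the error.
    error_message = str(exc)

# UTF-8 can represent any Unicode character, so this succeeds.
utf8_bytes = text.encode("utf-8")

print(error_message)
```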
Steps to Reproduce
`pdoc --pdf ./ > pdf.md` on Windows 10
Additional info