Open jorisroovers opened 4 years ago
I was testing gitlint and thought I'd check this bug, given it's easy to reproduce. Here's what I found (based on this Stack Overflow question):
Minimal code to reproduce on py38, using echo "WIP: foöbar" | python read_from_stdin.py
if __name__ == "__main__":
    import sys
    input_data = sys.stdin.read()
    print(f"{sys.stdin=}")
    print(f"{input_data=}")
>>> sys.stdin=<_io.TextIOWrapper name='<stdin>' mode='r' encoding='cp1252'>
>>> input_data='WIP: fo?bar\n'
Forcing stdin encoding to utf8: the issue is still there.
if __name__ == "__main__":
    import sys
    sys.stdin.reconfigure(encoding="utf8")
    input_data = sys.stdin.read()
    print(f"{sys.stdin=}")
    print(f"{input_data=}")
>>> sys.stdin=<_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf8'>
>>> input_data='WIP: fo?bar\n'
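One possible reading of the two results above, offered as an assumption rather than something confirmed in this thread: if the shell has already replaced ö with ? before the bytes reach Python (Windows PowerShell's default $OutputEncoding is, as far as I know, ASCII), then no choice of stdin encoding can bring the character back. A minimal sketch, where `simulate_ascii_pipe` is a hypothetical helper standing in for the shell:

```python
def simulate_ascii_pipe(text: str) -> bytes:
    """Hypothetical stand-in for a shell that encodes piped text as ASCII."""
    # Characters outside ASCII become b"?", so the original character is lost here.
    return text.encode("ascii", errors="replace")


piped = simulate_ascii_pipe("WIP: foöbar\n")

# Decoding the already-damaged bytes gives the same text with either codec.
as_cp1252 = piped.decode("cp1252")
as_utf8 = piped.decode("utf-8")
```

Under that assumption, both runs above would see identical bytes on stdin, which would match the identical `fo?bar` output.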
Forcing stdin encoding to utf8 and running $OutputEncoding = [Console]::OutputEncoding = (new-object System.Text.UTF8Encoding $false) before echo "WIP: foöbar" | python read_from_stdin.py: everything looks good.
if __name__ == "__main__":
    import sys
    sys.stdin.reconfigure(encoding="utf8")
    input_data = sys.stdin.read()
    print(f"{sys.stdin=}")
    print(f"{input_data=}")
>>> sys.stdin=<_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf8'>
>>> input_data='WIP: foöbar\n'
FYI, running $OutputEncoding = [Console]::OutputEncoding = (new-object System.Text.UTF8Encoding $false) before echo "WIP: foöbar" | python read_from_stdin.py, but without sys.stdin.reconfigure(encoding="utf8"):
if __name__ == "__main__":
    import sys
    input_data = sys.stdin.read()
    print(f"{sys.stdin=}")
    print(f"{input_data=}")
>>> sys.stdin=<_io.TextIOWrapper name='<stdin>' mode='r' encoding='cp1252'>
>>> input_data='WIP: foöbar\n'
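A possible explanation for why this last case still prints correctly even though stdin reports cp1252 (an assumption, not verified on Windows, and it also assumes stdout is cp1252): UTF-8 bytes decoded as cp1252 give mojibake, but writing that mojibake back out as cp1252 reproduces the original UTF-8 bytes, which a UTF-8 console would then render as ö. A small sketch of that round trip:

```python
# Bytes the shell would send if it encoded the piped text as UTF-8.
utf8_bytes = "WIP: foöbar\n".encode("utf-8")

# Python reading those bytes with a cp1252 stdin sees two characters for ö.
misread = utf8_bytes.decode("cp1252")

# Printing through a cp1252 stdout turns the two characters back into the same bytes.
written = misread.encode("cp1252")

# A console that interprets the output as UTF-8 shows the original text.
displayed = written.decode("utf-8")
```

If this is what happens, the string inside Python is still wrong in that run (it is one character longer than the input), so the correct-looking output would be a coincidence of matching input and output encodings.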
Seems like there is some value in forcing stdin encoding to utf8, but this is ultimately a PowerShell problem, as you expected. So I think you can close the issue.
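For reference, a minimal sketch of what forcing the stdin encoding could look like as a reusable helper. `force_utf8` is a hypothetical name, not something taken from gitlint, and the `fake_stdin` object only stands in for `sys.stdin` so the snippet runs without a pipe:

```python
import io


def force_utf8(stream):
    """Hypothetical helper: switch a text stream to UTF-8 before any data is read."""
    # reconfigure() exists on io.TextIOWrapper since Python 3.7; streams
    # without it are returned unchanged.
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
    return stream


# Stand-in for sys.stdin: UTF-8 bytes behind a wrapper that defaults to cp1252.
fake_stdin = io.TextIOWrapper(
    io.BytesIO("WIP: foöbar\n".encode("utf-8")), encoding="cp1252"
)
input_data = force_utf8(fake_stdin).read()
```

In a real script the call would presumably be `force_utf8(sys.stdin)` before the first read. As the runs above show, this only helps when the shell actually sends UTF-8 bytes.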
Thanks for doing this extra legwork! I'll keep this open for the next time I get around to digging into Unicode issues on Windows :-)
When passing Unicode characters to gitlint via stdin in PowerShell, gitlint does not print the Unicode characters properly.
This does work as expected in the regular Windows Command Prompt, so this seems related to PowerShell specifically.