jorisroovers / gitlint

Linting for your git commit messages
http://jorisroovers.github.io/gitlint
MIT License

Unicode issue on Powershell #95

Open jorisroovers opened 4 years ago

jorisroovers commented 4 years ago

When Unicode characters are piped to gitlint via stdin in PowerShell, gitlint does not print them correctly: non-ASCII characters show up as "?".

echo "WIP: foöbar" | gitlint
1: T5 Title contains the word 'WIP' (case-insensitive): "WIP: fo?bar"
3: B6 Body message is missing

This works as expected in the regular Windows Command Prompt, so it seems to be specific to PowerShell.

ghazi-git commented 1 year ago

I was testing gitlint and thought I'd check this bug, given that it's easy to reproduce. Here's what I found out (based on this Stack Overflow question):

Minimal code to reproduce on Python 3.8, run with echo "WIP: foöbar" | python read_from_stdin.py:

if __name__ == "__main__":
    import sys
    input_data = sys.stdin.read()
    print(f"{sys.stdin=}")
    print(f"{input_data=}")

>>> sys.stdin=<_io.TextIOWrapper name='<stdin>' mode='r' encoding='cp1252'>
>>> input_data='WIP: fo?bar\n'
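The "?" in this output suggests the character is lost before Python ever decodes stdin. A minimal sketch of that effect, assuming the shell encodes piped text as ASCII and substitutes "?" for anything it cannot represent (this is the default $OutputEncoding in Windows PowerShell 5.1; the PowerShell version used in this thread is not stated, so treat that as an assumption):

```python
# Simulate the pipe: the sender encodes the text as ASCII and replaces
# every character it cannot represent with "?" (assumed behaviour, see above).
original = "WIP: foöbar\n"
piped_bytes = original.encode("ascii", errors="replace")

# The receiver decodes with cp1252, the encoding reported for sys.stdin.
received = piped_bytes.decode("cp1252")

print(piped_bytes)     # the ö is already gone at the byte level
print(repr(received))  # so no decoder on the receiving side can bring it back
```

If that assumption holds, the information is gone at the byte level, which would explain why changing only the Python side does not help.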

Forcing the stdin encoding to utf8: the issue is still there.

if __name__ == "__main__":
    import sys
    sys.stdin.reconfigure(encoding="utf8")
    input_data = sys.stdin.read()
    print(f"{sys.stdin=}")
    print(f"{input_data=}")

>>> sys.stdin=<_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf8'>
>>> input_data='WIP: fo?bar\n'
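This result can be imitated without a shell. The sketch below uses an in-memory stream as a stand-in for sys.stdin (fake_stdin and lossy_bytes are illustrative names, not part of gitlint) and feeds it bytes in which the "?" substitution has already happened:

```python
import io

# Bytes as they would arrive if the sender had already replaced "ö" with "?".
lossy_bytes = b"WIP: fo?bar\n"

# Stand-in for sys.stdin: a text wrapper that starts out as cp1252.
fake_stdin = io.TextIOWrapper(io.BytesIO(lossy_bytes), encoding="cp1252")

# Same call as in the script above; allowed because nothing has been read yet.
fake_stdin.reconfigure(encoding="utf8")
text = fake_stdin.read()

print(repr(text))
```

Switching the decoder changes how bytes are interpreted, but a literal "?" byte decodes to "?" under both cp1252 and UTF-8.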

Forcing the stdin encoding to utf8 and running $OutputEncoding = [Console]::OutputEncoding = (new-object System.Text.UTF8Encoding $false) before echo "WIP: foöbar" | python read_from_stdin.py: everything looks good.

if __name__ == "__main__":
    import sys
    sys.stdin.reconfigure(encoding="utf8")
    input_data = sys.stdin.read()
    print(f"{sys.stdin=}")
    print(f"{input_data=}")

>>> sys.stdin=<_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf8'>
>>> input_data='WIP: foöbar\n'

FYI, running $OutputEncoding = [Console]::OutputEncoding = (new-object System.Text.UTF8Encoding $false) before echo "WIP: foöbar" | python read_from_stdin.py, but without sys.stdin.reconfigure(encoding="utf8"), also prints the character correctly:

if __name__ == "__main__":
    import sys
    input_data = sys.stdin.read()
    print(f"{sys.stdin=}")
    print(f"{input_data=}")

>>> sys.stdin=<_io.TextIOWrapper name='<stdin>' mode='r' encoding='cp1252'>
>>> input_data='WIP: foöbar\n'

It seems there is some value in forcing the stdin encoding to utf8, but this is ultimately a PowerShell problem, as you expected. So I think you can close the issue.
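For reference, forcing the stdin encoding in a command-line tool could look roughly like the sketch below. It is not gitlint's actual code: read_text_as_utf8 is a hypothetical helper, and the optional stream parameter exists only so the function can be exercised without a real pipe.

```python
import io
import sys


def read_text_as_utf8(stream=None):
    """Read all text from a stream (sys.stdin by default), decoding as UTF-8."""
    stream = sys.stdin if stream is None else stream
    # reconfigure() exists on io.TextIOWrapper since Python 3.7; other
    # stream types (for example io.StringIO) are read as they are.
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
    return stream.read()


# Usage with an in-memory stream that holds UTF-8 bytes but claims cp1252.
utf8_bytes = "WIP: foöbar\n".encode("utf-8")
demo_stream = io.TextIOWrapper(io.BytesIO(utf8_bytes), encoding="cp1252")
result = read_text_as_utf8(demo_stream)
print(repr(result))
```

This only helps when the bytes arriving on the pipe are already UTF-8. If the shell has replaced characters before sending them, as in the first two runs above, there is nothing left for the reader to recover.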

jorisroovers commented 1 year ago

Thanks for doing this extra legwork! I'll keep this open for the next time I get around to digging into Unicode issues on Windows :-)