prompt-toolkit / pypager

A $PAGER in pure Python, similar to "less".
BSD 3-Clause "New" or "Revised" License

Pypager Not Showing UTF-8 characters. #10

Closed Werner1201 closed 4 years ago

Werner1201 commented 4 years ago

I've been working on a little project that parses the name, chapter or chapters, and optional verses of a Bible reference, and I'm using clint to color my text. But when I pipe the output to pypager, the "çèéêôâãõ" characters in my source are not shown at all. [image] In the image, where it shows "Gnesis" it should be "Gênesis". If you want to try it yourself to see whether I'm just being dumb, here's the repo for this project: https://github.com/Werner1201/PyBible I don't think I need to tell you to create a virtual env and install the requirements, since you're much more experienced than me. If you can help me with this, I'd be grateful.

jonathanslenders commented 4 years ago

Can you try this:

echo "çèéêôâãõ" | pypager

This should work normally. (It does on my system.) If that doesn't work, we know the issue is not in your code.

Instead of using pypager, can you pipe the output to a file, and then verify the encoding of that file? Or see whether it works with "less" or "more"?

Werner1201 commented 4 years ago

I'll run that test, but trying it with less isn't possible since I'm on Windows. I will try piping to more, but I'd really like to have the functionality of less. When I get home I'll post the results.

Werner1201 commented 4 years ago

So, when I pipe the output of echo to a file like this: [image]
I get this result: [image]
When I pipe it to more in the VS Code terminal: [image]
When I pipe it in Cmder: [image]
When I pipe it in the native Command Prompt: [image]
When I run the first test, echo "çèéêôâãõ" | pypager, from an executable file: [image]
But there's a catch. When I quit, I saw this: [image]
When I run the command directly in the terminal: [image]
When I exit: [image]

I have no idea where the problem comes from. I don't know whether it's some weird Windows command-line bug or whether my system is misconfigured and, if so, how I should configure it. I'd be very grateful for your help, since I'm a beginner at Python and at the peculiarities of dealing with the command line from Python.

jonathanslenders commented 4 years ago

Can you try pypager from this branch: https://github.com/prompt-toolkit/pypager/pull/11 ? I noticed that we don't take sys.stdin.encoding into account. On Windows, this is often not utf-8, but we were always reading from stdin using utf-8.
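A minimal sketch of the idea described above, not the actual change from the pull request (which is not shown in this thread). The helper name `decode_piped_input` is hypothetical. It decodes piped bytes with the encoding Python reports for stdin instead of a hard-coded utf-8. The last line assumes undecodable bytes are silently dropped (`errors="ignore"`), which would be consistent with "Gênesis" appearing as "Gnesis"; whether pypager used that error handler is an assumption.

```python
import sys
from typing import Optional


def decode_piped_input(raw: bytes, encoding: Optional[str] = None) -> str:
    """Decode bytes read from a pipe (hypothetical helper, for illustration)."""
    # Prefer an explicit encoding, then what Python reports for stdin,
    # and only then fall back to utf-8. sys.stdin may be None or lack
    # an encoding in some environments, hence the getattr.
    enc = encoding or getattr(sys.stdin, "encoding", None) or "utf-8"
    return raw.decode(enc, errors="replace")


# Bytes as a cp1252 producer would write them to the pipe.
raw = "Gênesis".encode("cp1252")

# Decoding with the matching encoding keeps the accented character.
decoded = decode_piped_input(raw, "cp1252")

# Assumption: decoding as utf-8 while ignoring errors drops the byte for
# "ê", because 0xEA is not a valid utf-8 sequence on its own.
dropped = raw.decode("utf-8", errors="ignore")  # "Gnesis"
```

In a real pager the bytes would come from `sys.stdin.buffer.read()` rather than from an in-memory string.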

Werner1201 commented 4 years ago

How can I try this? Should I just have pip update the module, or should I download the repo and swap out the pypager folder in my venv?

jonathanslenders commented 4 years ago

I have merged it in the "master" branch. You can clone the repository:

git clone https://github.com/prompt-toolkit/pypager.git

Then go into that directory, and run: pip install -e .

Werner1201 commented 4 years ago

Result of running the code outside the virtual environment: [image] [image]

Werner1201 commented 4 years ago

When I run it from a .cmd file: [image]

Werner1201 commented 4 years ago

Is this the expected result?

jonathanslenders commented 4 years ago

I can reproduce it. It's not expected.

Werner1201 commented 4 years ago

[image] Looks like it makes my program work. I ran the script like this: [image]

jonathanslenders commented 4 years ago

I get the same weird characters when running this on Windows:

echo "çáóéú" | python -c "import sys;print(sys.stdin.read())"

image

I'm not yet sure what's the precise reason. sys.stdin.encoding is cp1252.
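One way to narrow this down is to look at the raw bytes that arrive on the pipe and try the encodings mentioned in this thread against them. The sketch below is only a diagnostic aid; the function name `describe` and the candidate list are illustrative choices, not part of pypager.

```python
from typing import Dict, Optional

# Encodings that come up in this thread.
CANDIDATES = ("utf-8", "cp1252", "cp437")


def describe(raw: bytes) -> Dict[str, Optional[str]]:
    """Map each candidate encoding to its decoding of raw, or None on failure."""
    results: Dict[str, Optional[str]] = {}
    for enc in CANDIDATES:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding.
            results[enc] = None
    return results
```

To use it on piped data, pass it `sys.stdin.buffer.read()`, which returns the bytes before Python applies `sys.stdin.encoding`, and compare the results with the text that was typed.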

Werner1201 commented 4 years ago

Always encoding; English-speaking people have it so much easier. Just kidding, but encoding is always a problem when reading and piping things in cmd, especially if you use these "special characters".

Werner1201 commented 4 years ago

When I was in college, my professors always avoided writing those characters in C.

Werner1201 commented 4 years ago

I could close the issue here, since my program is displaying the characters correctly, but I don't mind helping to investigate, since I plan to run this program on another computer.

Werner1201 commented 4 years ago

Try changing your system language to Portuguese (Brazil); maybe the behavior changes with the language.

Werner1201 commented 4 years ago

Although it would be too much work just to see whether this little thing starts showing the same behavior.

jonathanslenders commented 4 years ago

If I get it right, for some reason Python thinks the input encoding is cp1252 when data is piped into stdin, while actually it is cp437. Not utf-8.

image

This could be a Python bug, but I'm not 100% sure.
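The mismatch described above can be reproduced without a console. This is a sketch using only Python's built-in codecs: the same bytes are written as cp437 and read back as cp1252, which yields different characters, and reversing the wrong decode recovers the text.

```python
text = "çáóéú"

# Bytes as a cp437 (OEM) console would put them on the pipe.
raw = text.encode("cp437")

# What a reader sees if it assumes cp1252 instead.
garbled = raw.decode("cp1252")

# Undo the wrong decode: back to the original bytes, then decode as cp437.
repaired = garbled.encode("cp1252").decode("cp437")
```

The round trip only works here because every one of these bytes happens to be defined in cp1252; a few byte values are not, and decoding those as cp1252 would raise an error instead.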

Werner1201 commented 4 years ago

If it is, it might be good to report it, since they might fix it in a newer version.


jonathanslenders commented 4 years ago

Running this in the console, before doing the pipe or using pypager should fix it:

chcp 1252

Werner1201 commented 4 years ago

Alright, I think the issue can be closed. If I were at my computer, I would close it.


jonathanslenders commented 4 years ago

So, I think Python changes the encoding to cp1252 during startup, while the data that's passed over the pipe still uses cp437. Possibly related: https://www.python.org/dev/peps/pep-0528/

I don't know of syscalls to detect the encoding of a file handle. I'll close the issue for now, as we have a workaround.
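A pipe carries raw bytes and has no encoding of its own, so the closest thing available is a guess. As a hedged sketch: on Windows, the Win32 function `GetConsoleCP` reports the input code page of the attached console, which is the value that `chcp` changes. Whether the process writing to the pipe actually used that code page is an assumption, and the helper names below are hypothetical.

```python
import sys
from typing import Optional


def console_input_code_page() -> Optional[str]:
    """Return the console input code page as a codec name such as 'cp437'.

    Returns None when not on Windows or when no console is attached.
    """
    if sys.platform != "win32":
        return None
    import ctypes  # imported lazily; windll exists only on Windows

    code_page = ctypes.windll.kernel32.GetConsoleCP()  # 0 means failure
    return f"cp{code_page}" if code_page else None


def guess_pipe_encoding() -> str:
    """Best-effort guess: console code page, then stdin's encoding, then utf-8."""
    return (
        console_input_code_page()
        or getattr(sys.stdin, "encoding", None)
        or "utf-8"
    )
```

This does not detect anything about the file handle itself; it only makes the assumption behind the chcp workaround explicit.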