Bytes implicitly upgraded into wide characters as iso-8859-1

rtucker commented 10 years ago

Greetings!

Using the tea.txt example, I'm occasionally getting this error:

Bytes implicitly upgraded into wide characters as iso-8859-1 at /home/rtucker/bin/speedread line 172

I'm using xfce4-terminal on Linux Mint 16, and en_US.UTF-8 is my language. This is with revision fc18f1b5339f6484809805a36ff4186ddb7612d0.

Thanks!

(master) rtucker@racer-x:~/dev/speedread$ speedread tea.txt 
                    v
                  golden:                                       332 wpmBytes implicitly upgraded into wide characters as iso-8859-1 at /home/rtucker/bin/speedread line 172
                   has                                          332 wpmBytes implicitly upgraded into wide characters as iso-8859-1 at /home/rtucker/bin/speedread line 172
                   and                                          332 wpmBytes implicitly upgraded into wide characters as iso-8859-1 at /home/rtucker/bin/speedread line 172
                   tea.                                         332 wpmBytes implicitly upgraded into wide characters as iso-8859-1 at /home/rtucker/bin/speedread line 172
                   small                                        332 wpm
 49.71s, 230 words, 1006 letters, 277.61 true wpm

pasky commented 10 years ago

Hi! With the latest commit, UTF-8 handling was tweaked slightly; can you confirm that you can still reproduce this? It works fine for me... There is a non-breaking space (U+00A0) at each of these points, but that should work fine. If this is still reproducible, can you please paste the full output of the locale command?
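
For reference, here is a rough illustration of why a non-breaking space can trigger this kind of warning. This is a Python sketch, not speedread's actual Perl code, and the sample string is made up. It assumes the input reaches the program as raw UTF-8 bytes: U+00A0 is encoded as the two bytes C2 A0, so treating those bytes as iso-8859-1 yields two characters where decoding as UTF-8 yields one.

```python
# Illustration only: speedread is a Perl script; this just shows the byte-level effect.
# The sample text is hypothetical, loosely modelled on the words in the output above.
NBSP = "\u00a0"  # non-breaking space, U+00A0

raw = f"golden:{NBSP}tea".encode("utf-8")  # bytes as they would sit in a UTF-8 file

as_utf8 = raw.decode("utf-8")         # correct: NBSP is a single character
as_latin1 = raw.decode("iso-8859-1")  # wrong guess: each byte becomes its own character

print(len(as_utf8), repr(as_utf8))
print(len(as_latin1), repr(as_latin1))
```

The second string is one character longer, because the single NBSP has turned into "Â" followed by a NBSP.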

and-reas-se commented 10 years ago

I can reproduce this. Note that it only happens when you specify the file as a command-line argument, not when reading the text from stdin.

Here's my locale info:

LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

pasky commented 10 years ago

I believe this should be fixed by PR #9.

pasky / speedread

Bytes implicitly upgraded into wide characters as iso-8859-1 #2