jgoerzen / twidge

Command-line twitter/identica client [Haskell]
http://wiki.github.com/jgoerzen/twidge/
GNU General Public License v2.0

Broken encoding of Unicode updates from the command line #12

Closed astanin closed 8 years ago

astanin commented 14 years ago

twidge 1.0.2 prints Unicode tweets correctly, but corrupts them when an update is sent from the command line:

$ twidge update 'unicode message'

An example: http://twitter.com/jetxee/status/15322966107 shows garbled text instead of "... проверка twidge 1.0.2".

Sending an update via stdin works correctly:

$ twidge update
unicode message^D

astanin commented 14 years ago

I don't know how to find the current locale in Haskell and decode the output of getArgs. GHC itself should probably handle this.

Probably related GHC bugs:

http://hackage.haskell.org/trac/ghc/ticket/3307 http://hackage.haskell.org/trac/ghc/ticket/3309
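
For readers unfamiliar with the problem: in the GHC versions discussed here, getArgs appears to hand back each raw byte of an argument as a separate Char, so UTF-8 text arrives undecoded. The following is a minimal sketch of re-decoding such a string using only base. The name decodeUtf8Bytes is hypothetical, this is not the patch from the gist, and the decoder is deliberately lenient (it does not reject overlong encodings).

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)

-- | Treat every Char of the input as one raw byte (0..255) and decode
-- the byte sequence as UTF-8. Malformed sequences become U+FFFD.
-- Hypothetical helper for illustration only.
decodeUtf8Bytes :: String -> String
decodeUtf8Bytes [] = []
decodeUtf8Bytes (c:cs)
  | b < 0x80              = c : decodeUtf8Bytes cs        -- plain ASCII
  | b >= 0xC0 && b < 0xE0 = multi 1 (b .&. 0x1F)          -- 2-byte sequence
  | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F)          -- 3-byte sequence
  | b >= 0xF0 && b < 0xF8 = multi 3 (b .&. 0x07)          -- 4-byte sequence
  | otherwise             = '\xFFFD' : decodeUtf8Bytes cs -- stray byte
  where
    b = ord c
    -- Consume n continuation bytes and combine them with the lead bits.
    multi n lead =
      let (conts, rest) = splitAt n cs
          value         = foldl step lead conts
      in if length conts == n && all isCont conts && value <= 0x10FFFF
           then chr value : decodeUtf8Bytes rest
           else '\xFFFD' : decodeUtf8Bytes cs
    step acc x = (acc `shiftL` 6) .|. (ord x .&. 0x3F)
    isCont x   = ord x >= 0x80 && ord x < 0xC0
```

Applied to the two bytes 0xC3 0xB1, decodeUtf8Bytes yields the single character ñ. Under this assumption about getArgs, a program would map the function over the argument list before using it.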

astanin commented 14 years ago

I use this patch as a private workaround: http://gist.github.com/423901

jgoerzen commented 14 years ago

Is there a way to make that work with utf8-string 0.3.4? It's currently being shipped in Debian, for instance, and I'd like to be able to be compatible with it if I can.

astanin commented 14 years ago

0.3.4 doesn't have isUTF8Encoded. We can either skip the UTF-8 check and decode unconditionally (likely to break for users with other locales), or copy isUTF8Encoded from 0.3.5 into twidge under a different name.

isUTF8Encoded: http://hackage.haskell.org/packages/archive/utf8-string/0.3.6/doc/html/src/Codec-Binary-UTF8-String.html#isUTF8Encoded
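
To make the second option concrete, here is a rough sketch of what a locally defined check could look like. It is not the library's code: looksUtf8Encoded and decodeIfUtf8 are hypothetical names, and the check only verifies that every Char is a byte and that lead bytes are followed by the right number of continuation bytes.

```haskell
import Data.Char (ord)

-- | True when every Char is a byte value and the bytes form structurally
-- well-formed UTF-8. Hypothetical stand-in for a library check.
looksUtf8Encoded :: String -> Bool
looksUtf8Encoded [] = True
looksUtf8Encoded (c:cs)
  | b < 0x80              = looksUtf8Encoded cs -- ASCII byte
  | b >= 0xC2 && b < 0xE0 = conts 1             -- lead of a 2-byte sequence
  | b >= 0xE0 && b < 0xF0 = conts 2             -- lead of a 3-byte sequence
  | b >= 0xF0 && b < 0xF5 = conts 3             -- lead of a 4-byte sequence
  | otherwise             = False               -- stray byte or non-byte Char
  where
    b = ord c
    conts n =
      let (xs, rest) = splitAt n cs
      in length xs == n && all isCont xs && looksUtf8Encoded rest
    isCont x = ord x >= 0x80 && ord x < 0xC0

-- | Run the supplied decoder only when the input passes the check;
-- otherwise return the input unchanged.
decodeIfUtf8 :: (String -> String) -> String -> String
decodeIfUtf8 decode s
  | looksUtf8Encoded s = decode s
  | otherwise          = s
```

With this guard, a lone ISO-8859-15 byte such as 0xF1 (ñ) fails the check and is passed through untouched, while pure ASCII passes and is unaffected by decoding. A non-UTF-8 string can still happen to be valid UTF-8, so the check is a heuristic.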

murrayf commented 13 years ago

... Spanish accented vowels ("tildes": á é í ó ú) are not shown correctly on twitter.com when posted with the twidge update command.

murrayf commented 13 years ago

... the Spanish letter eñe (ñ) is not supported either. The ISO encoding that covers all these symbols is ISO-8859-15. Hope it helps!

jgoerzen commented 13 years ago

What version of twidge are you using, murrayf? Are you piping the data to twidge on stdin or giving it on the command line?

murrayf commented 13 years ago

Hello again John, I'm using version 1.0.2 from the Ubuntu Maverick deb package. These errors come from the update command.

jgoerzen commented 13 years ago

One potential problem is that your system locale is something other than UTF-8. Twitter and twidge are both designed to operate with UTF-8 only.

Can you check on that?
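
For reference, the same check can be done from inside a Haskell program. This is a small sketch, assuming a reasonably recent GHC whose base library provides getLocaleEncoding in GHC.IO.Encoding. The helper names isUtf8Name and localeIsUtf8 are made up for this example.

```haskell
import Data.Char (toLower)
import GHC.IO.Encoding (getLocaleEncoding)

-- | Compare an encoding name against UTF-8, ignoring case and the
-- separators people commonly vary ("UTF-8", "utf8", "UTF_8").
isUtf8Name :: String -> Bool
isUtf8Name name = map toLower (filter (`notElem` "-_") name) == "utf8"

-- | Ask the GHC runtime which encoding it derived from the locale and
-- report whether that encoding is UTF-8.
localeIsUtf8 :: IO Bool
localeIsUtf8 = fmap (isUtf8Name . show) getLocaleEncoding
```

Calling localeIsUtf8 at startup would let a program warn the user before it posts text in an unexpected encoding.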

murrayf commented 13 years ago

... this is the output of the locale command on my system:

LANG=es_ES.utf8
LC_CTYPE="es_ES.utf8"
LC_NUMERIC="es_ES.utf8"
LC_TIME="es_ES.utf8"
LC_COLLATE="es_ES.utf8"
LC_MONETARY="es_ES.utf8"
LC_MESSAGES="es_ES.utf8"
LC_PAPER="es_ES.utf8"
LC_NAME="es_ES.utf8"
LC_ADDRESS="es_ES.utf8"
LC_TELEPHONE="es_ES.utf8"
LC_MEASUREMENT="es_ES.utf8"
LC_IDENTIFICATION="es_ES.utf8"
LC_ALL=

jgoerzen commented 13 years ago

OK. And are you providing the update as a command-line parameter or on stdin?

murrayf commented 13 years ago

Command-line parameter. As you can see in the message above, LC_ALL= is empty by default; I don't know whether it has to be that way.
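
On the LC_ALL question: under POSIX rules an empty LC_ALL is treated as unset, so the character type falls back to LC_CTYPE and then to LANG. The sketch below models that lookup on a plain list of key/value pairs. The names effectiveCtype and codeset are hypothetical, and the code only models the precedence rule rather than querying the C library.

```haskell
import Data.Maybe (listToMaybe)

-- | Effective character-type locale under POSIX precedence:
-- LC_ALL first, then LC_CTYPE, then LANG. Empty values count as unset.
effectiveCtype :: [(String, String)] -> Maybe String
effectiveCtype env =
  listToMaybe
    [ v
    | key <- ["LC_ALL", "LC_CTYPE", "LANG"]
    , Just v <- [lookup key env]
    , not (null v)
    ]

-- | Codeset part of a locale name such as "es_ES.utf8": the text after
-- the '.', up to an optional '@modifier'.
codeset :: String -> Maybe String
codeset loc = case break (== '.') loc of
  (_, '.' : rest) -> Just (takeWhile (/= '@') rest)
  _               -> Nothing
```

For the environment shown in this thread (LANG and LC_CTYPE set to es_ES.utf8, LC_ALL empty), effectiveCtype returns Just "es_ES.utf8" and its codeset is "utf8". The list of pairs can be obtained with System.Environment.getEnvironment.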

AlekseiPrishchepo commented 13 years ago

It doesn't post updates with UTF-8 symbols for me either.

tatxo commented 11 years ago

A workaround that works for me is to pipe the text to twidge with echo (my locale is UTF-8):

$ echo "Algo en español" | twidge update

muzzol commented 8 years ago

I've hit this bug with version 1.1.2.

The piping workaround works, but I've found another problem: if the string contains newlines, it gets truncated.

For example:

TEXT="one
two
three"

echo "$TEXT" | twidge update

just posts "one".
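
The thread does not say why only the first line is posted. One possible explanation, offered purely as an assumption, is that the update text is read from stdin a single line at a time. The sketch below contrasts that behaviour with flattening the whole input onto one line before posting. Both function names are hypothetical and neither is taken from twidge.

```haskell
-- | What a reader that stops at the first newline would see.
-- Assumed behaviour, for illustration only.
firstLine :: String -> String
firstLine = takeWhile (/= '\n')

-- | Join all non-empty lines with single spaces so that the whole text
-- survives a single-line read.
flattenLines :: String -> String
flattenLines = unwords . filter (not . null) . lines
```

With the three-line example above, firstLine yields "one" while flattenLines yields "one two three".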

jgoerzen commented 8 years ago

Did 09c59c20a35ec5f8ccd71596aad68f52fbd82559 not fix this?

jgoerzen commented 8 years ago

In fact, I believe 09c59c2 should have fixed this.