rlopez1j / xmonad

Automatically exported from code.google.com/p/xmonad

Encoding task force #348

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
xmonad is sloppy about strings and encoding.  We should go through all of
xmonad and make sure all code follows these guidelines:

 * All String ([Char]) values should be lists of Unicode code points, and
should NOT be in UTF-8 or any other encoded form.
 * All input that creates String values should be decoded from bytes in
whatever encoding to lists of code points (this means System.IO and
proper locale settings in GHC 6.12, System.IO.UTF8 in older versions)
 * All output that takes String values should encode the code points to
bytes in whatever encoding (again System.IO in 6.12, utf8-string otherwise)
 * All non-string data (lists of bytes, encoded strings, etc.) should be
stored at types other than [Char]
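
As a rough sketch of what these guidelines look like in code (not taken from xmonad): the names `Utf8Bytes`, `decodeAtInput` and `encodeAtOutput` are hypothetical, and the sketch assumes the `bytestring` and `text` packages rather than utf8-string or System.IO.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Encoded data lives at its own type, so it cannot be mistaken for a
-- String of code points. (Utf8Bytes is a hypothetical name.)
newtype Utf8Bytes = Utf8Bytes B.ByteString
  deriving (Eq, Show)

-- Input boundary: bytes -> code points. Returns Nothing when the bytes
-- are not valid UTF-8 instead of silently producing garbage.
decodeAtInput :: Utf8Bytes -> Maybe String
decodeAtInput (Utf8Bytes bs) =
  either (const Nothing) (Just . T.unpack) (TE.decodeUtf8' bs)

-- Output boundary: code points -> bytes. Everything between the two
-- boundaries works on plain String values.
encodeAtOutput :: String -> Utf8Bytes
encodeAtOutput = Utf8Bytes . TE.encodeUtf8 . T.pack
```

With this shape, a String that reaches `encodeAtOutput` is always unencoded, so encoding it twice would need a deliberate round trip through `Utf8Bytes`.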

If anybody wants to spearhead this, feel free to come forward. 
Alternatively we could have maintainers self-certify their modules?

Original issue reported on code.google.com by SpencerJ...@gmail.com on 2 Dec 2009 at 5:03

GoogleCodeExporter commented 8 years ago
Modules which import Codec.Binary.UTF8.*:

 * XMonad.Hooks.EwmhDesktops
OK: used to encode desktop names.

 * XMonad.Actions.Search
OK: used to encode URLs.

 * XMonad.Util.XSelection
Seems to be OK. Documentation is outdated and should be updated.

 * XMonad.Util.Font
Doesn't use the functions but just re-exports them under different names:
`encodeOutput' and `decodeInput'

encodeOutput is only used in X.Prompt.Shell and X.Hooks.DynamicLog. I
think it would be better to remove the encodeOutput function and just
import encodeString from utf8-string.

decodeInput is used in the X.Prompt.Shell and X.Prompt modules.  X.Prompt
deserves special investigation since it stopped working correctly with
Unicode some time ago.

== Unicode bugs ==

Here is a list of the Unicode-related bugs found so far:

X.Hooks.DynamicLog: xmonadPropLog'
    The function sets the UTF8_STRING atom without properly encoding it.
    It actually requires a UTF-8-encoded string as input.

X.Hooks.DynamicLog: 
    Returns utf8-encoded string. 

X.Hooks.SetWMName: setWMName
    Doesn't properly encode UTF8_STRING atom. Comment from function says:
    "now only accepts latin1 names to eliminate dependency on utf8 encoder"
    Since the utf8-string dependency is now mandatory, this is clearly outdated.

X.Prompt 
    Some time ago it suddenly began to mangle non-ASCII input.

Original comment by alexey.s...@gmail.com on 13 Jun 2010 at 11:52

Attachments:

GoogleCodeExporter commented 8 years ago
Patch above: utf8-encode string as list of bytes not list of chars
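
For illustration only, the "list of bytes" shape can be sketched as follows. This is not the attached patch: `utf8Bytes` is a hypothetical helper built on the `text` and `bytestring` packages. utf8-string offers the same distinction with `encode` (to `[Word8]`) versus `encodeString` (to `String`).

```haskell
import Data.Word (Word8)
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- UTF-8 encode a String of code points into a list of bytes.
-- The result type is [Word8], not [Char], so the type checker keeps
-- encoded data apart from text that still needs encoding.
utf8Bytes :: String -> [Word8]
utf8Bytes = B.unpack . TE.encodeUtf8 . T.pack
```

Because the result is no longer a String, it cannot be passed by accident to a function that would encode it a second time.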

Original comment by alexey.s...@gmail.com on 13 Jun 2010 at 11:55

GoogleCodeExporter commented 8 years ago
> encodeOutput is only used in X.Prompt.Shell and X.Hooks.DynamicLog. I
> think it would be better to remove the encodeOutput function and just
> import encodeString from utf8-string.
> decodeInput is used in the X.Prompt.Shell and X.Prompt modules.  X.Prompt
> deserves special investigation since it stopped working correctly with
> Unicode some time ago.

I've removed both encodeOutput and decodeInput in darcs by pushing the attached 
patch.

Original comment by gwe...@gmail.com on 14 Jun 2010 at 11:25

Attachments:

GoogleCodeExporter commented 8 years ago
> Patch above: utf8-encode string as list of bytes not list of chars

I've pushed this.

Original comment by gwe...@gmail.com on 14 Jun 2010 at 11:31

GoogleCodeExporter commented 8 years ago
> Seems to be OK. Documentation is outdated and should be updated.

I'm assuming this is a reference to the docs saying Unicode input is 
unreliable? I've pushed the attached patch.

Original comment by gwe...@gmail.com on 15 Jun 2010 at 12:11

Attachments:

GoogleCodeExporter commented 8 years ago
I should probably mention that according to my copy of the config archive, no 
users were using encodeOutput or decodeInput, so removing them should be 
perfectly safe.

Original comment by gwe...@gmail.com on 15 Jun 2010 at 12:46

GoogleCodeExporter commented 8 years ago
The function
X.H.DynamicLog.dynamicLogString :: PP -> X String
encodes the resulting string to UTF-8, which is sloppy according to the
guidelines and produces double-encoded output with GHC 6.12.3 on the darcs
version of xmonad+contrib when outputting to a pipe.

I've removed the encodeString processing, and added a cleaner equivalent to the 
xmonadPropLog, where I believe the output should be utf8-encoded. It fixes 
encoding issues for me, but I have only tested it with pipe output on my 
current setup.

I don't know whether it will work on older versions of ghc, where output might 
not be automatically encoded.

While fixing the log hook, I noticed a redundant dependency in the cabal file
and (hopefully) made the X.U.NamedWindows.getName function more readable. I
don't know what to do with those patches, so I'm just attaching them as well.

Original comment by ilab...@gmail.com on 12 Nov 2010 at 3:39

Attachments:

GoogleCodeExporter commented 8 years ago

Original comment by vogt.a...@gmail.com on 12 Nov 2010 at 11:20

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
Such problems still exist, in at least two places:
- In XMonad.Core, the spawn function breaks Unicode strings.  For instance, if
I do “spawn "notify-send \"Réveille-toi !\""” the “é” character in
the notification becomes “Ã©”.
- In XMonad.Util.Dmenu, the menu arguments are similarly broken if they contain 
such characters.
In the second case, I could solve the problem by removing all 
“encodeString” occurrences in the function runProcessWithInput from 
XMonad.Util.Run (there are other occurrences of “encodeString” in 
XMonad.Util.Run which may cause the same problem).  I suspect (but have not 
tried) that removing “encodeString” in spawnPID (from XMonad.Core) would 
solve the first problem.
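
The double encoding suspected here can be reproduced in isolation. The sketch below is not xmonad code: `encodeAsChars` is a hypothetical stand-in for an encodeString-style function (UTF-8 bytes returned as Chars), and `wireBytes` is a hypothetical stand-in for a runtime that already encodes String output as UTF-8. It assumes the `text` and `bytestring` packages.

```haskell
import Data.Char (chr)
import Data.Word (Word8)
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Stand-in for a runtime that encodes Strings as UTF-8 on output.
wireBytes :: String -> [Word8]
wireBytes = B.unpack . TE.encodeUtf8 . T.pack

-- Stand-in for an encodeString-style function: each UTF-8 byte of the
-- input is returned as a Char in the range U+0000..U+00FF.
encodeAsChars :: String -> String
encodeAsChars = map (chr . fromIntegral) . wireBytes

-- Encoded once:  wireBytes "\233"                 is [0xC3, 0xA9]
-- Encoded twice: wireBytes (encodeAsChars "\233") is [0xC3, 0x83, 0xC2, 0xA9]
```

A reader that decodes the second byte sequence as UTF-8 sees two characters, U+00C3 and U+00A9, where one U+00E9 was intended. Dropping the manual step is only correct when the output path really does encode on its own.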

Original comment by vej....@gmail.com on 5 Aug 2014 at 8:13

GoogleCodeExporter commented 8 years ago
Actually the problem here is that we fixed encoding issues once... and then ghc 
was changed to deal with encoding issues, and later the core libraries were 
modified to use ghc's encoding support, and we were left with a backward 
compatibility issue. Making sure that everything works properly on everything 
from Debian (typically with an ancient ghc that doesn't fully handle encoding) 
to Arch (typically bleeding edge) without either losing encoding or 
double-encoding is difficult.

Original comment by allber...@gmail.com on 5 Aug 2014 at 1:17