pallet / ritz

SWANK and nREPL servers for clojure providing JPDA based debuggers

ritz character coding error #62

Closed: scramjet closed this issue 11 years ago

scramjet commented 11 years ago

[Reposted from the Clojure Google group]

Hello,

my current lein-ritz (ritz 0.5.0, OS X, Java 7) setup, started with slime-connect, hangs when handling Unicode characters, e.g.

user> (def a "\uD83D\uDE1F") ; UTF-16 surrogate pair for U+1F61F WORRIED FACE

'clojure.core/a

user> a

"?"

The REPL is hung at this point: in the minibuffer I see "error in process filter: Wrong type argument: listp, :write-string".

Notably, this doesn't happen for characters that can be represented directly in UCS-2, only for those that need a surrogate pair (two 16-bit code units) in UTF-16, e.g. the newer emoji characters outside the Basic Multilingual Plane. So I can use Greek characters, etc. fine. I suspect it's something ritz is doing, since it's a common error in Java to assume that a single Java char = a single Unicode code point.

I've set "(setq slime-net-coding-system 'utf-8-unix)" in my Emacs config and used ":jvm-opts ["-Dswank.encoding=utf-8"]" in my Lein project. This changed something (the error message is slightly different), but it is still broken.

I've also tried using ":encoding" on the lein command line, and looked into the ritz source, where it seems I'm doing the right thing.

I'm stumped: any ideas?

Cheers,

Matthew.

hugoduncan commented 11 years ago

@scramjet Could you point me at a font that actually displays these characters in emacs?

scramjet commented 11 years ago

Hi Hugo,

I'm not actually sure Emacs can display those characters. Even a box or ? would be fine: it's that these characters seem to corrupt and break the connection between SLIME and Ritz.

juergenhoetzel commented 11 years ago

This is because a surrogate pair consists of two chars, which results in an invalid length when ".length" is used to calculate the packet length for the RPC header.
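To make the mismatch concrete, here is a minimal Java sketch. It is not the ritz code: the class `SwankFraming` and both method names are hypothetical, and it assumes a SLIME-style header of six hex digits that the Emacs side reads as a character count (one per code point). Whether the actual fix counts code points or encoded bytes is not shown in this thread.

```java
final class SwankFraming {
    // Header derived from String.length(), which counts UTF-16 code units.
    // A surrogate pair contributes 2 here (the suspected bug).
    static String headerByCodeUnits(String payload) {
        return String.format("%06x", payload.length());
    }

    // Header derived from Unicode code points, so a surrogate pair counts once.
    // Assumption: the receiving side counts characters this way.
    static String headerByCodePoints(String payload) {
        return String.format("%06x", payload.codePointCount(0, payload.length()));
    }

    public static void main(String[] args) {
        String worried = "\uD83D\uDE1F"; // U+1F61F as a surrogate pair
        System.out.println(headerByCodeUnits(worried));  // prints 000002
        System.out.println(headerByCodePoints(worried)); // prints 000001
    }
}
```

With the first header, a reader that counts code points would wait for one more character than the payload contains, which would be consistent with the hang described in the report.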

The above commit should fix the issue.

BTW, there is still an issue with character encodings when using the JDI debugger. Strings are encoded correctly when using "--no-debug", but there seems to be an encoding issue caused by the copying in vm-stream-daemons.

hugoduncan commented 11 years ago

@juergenhoetzel Thanks for tracking this down! The character still displays as a "?" here, but that is probably a font selection issue in my Emacs.

Running under the debugger seems to work here (OSX). Does using -Dfile.encoding=UTF-8 in your :jvm-opts fix this for you?

juergenhoetzel commented 11 years ago

I use UTF-8:

juergen@samson:~ → localectl 
   System Locale: LANG=de_DE.UTF-8
   VC Keymap: de-latin1
   X11 Layout: n/a

Java Version:

juergen@samson:~ → java -version
java version "1.7.0_09"
OpenJDK Runtime Environment (IcedTea7 2.3.3) (ArchLinux build 7.u9_2.3.3-1-x86_64)
OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)

VM Options:

-Dswank.encoding=utf-8 -Dfile.encoding=UTF-8 -Xdebug -Xrunjdwp:transport=dt_socket,address=samson:57828,suspend=y

It seems to be an encoding issue of the JPDA TCP Socket Connection.

juergenhoetzel commented 11 years ago

Well, I'm not sure it's really a JPDA TCP connection encoding issue, because the problem does not occur when using java/jdb on the same platform.

juergenhoetzel commented 11 years ago

Fixed the issue using:

https://github.com/pallet/ritz/pull/67

The problem was that the -Dswank.encoding=utf-8 option was not passed to the controlling VM, and thus latin1 encoding was set up for the Reader/Writer of the Swank network connection.
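As a rough illustration of why a latin1 Reader/Writer loses this character, the following Java sketch round-trips the string through a byte array with each charset. It is not taken from ritz: the class `EncodingRoundTrip` and its method are hypothetical, and a byte array stands in for the socket.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingRoundTrip {
    // Encode text to bytes and decode it again with the same charset,
    // roughly what a Writer/Reader pair on either end of a socket does.
    static String roundTrip(String text, Charset charset) {
        byte[] wire = text.getBytes(charset);
        return new String(wire, charset);
    }

    public static void main(String[] args) {
        String worried = "\uD83D\uDE1F"; // U+1F61F as a surrogate pair

        // UTF-8 can represent every code point, so the text survives.
        System.out.println(roundTrip(worried, StandardCharsets.UTF_8).equals(worried));

        // ISO-8859-1 (latin1) cannot represent U+1F61F; getBytes substitutes
        // the charset's replacement byte, so the original text is lost.
        System.out.println(roundTrip(worried, StandardCharsets.ISO_8859_1).equals(worried));
    }
}
```

The latin1 round trip yields a replacement character instead of the original, which would match the "?" seen at the REPL.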

juergenhoetzel commented 11 years ago

I guess this issue can be closed

hugoduncan commented 11 years ago

Thanks @juergenhoetzel for fixing this