hankhero / cl-json

JSON encoder and decoder for Common Lisp

Surrogate pair #11

Open: rudolph-miller opened this issue 8 years ago

rudolph-miller commented 8 years ago
(yason:parse "\"\\uD840\\uDC0B\"")
;; => "𠀋"

(with-input-from-string (stream "\"\\uD840\\uDC0B\"")
  (cl-json:decode-json stream))
;; => "��"  (two separate surrogate characters instead of one)
leosongwei commented 7 years ago

This "bug" is in decoder.lisp (around line 160):

((len rdx)
 (let ((code
        (let ((repr (make-string len)))
          ;; Read the digits of this one escape only.
          (dotimes (i len)
            (setf (aref repr i) (read-char stream)))
          (handler-case (parse-integer repr :radix rdx)
            (parse-error ()
              (json-syntax-error stream esc-error-fmt
                                 (format nil "\\~C" c)
                                 repr))))))
   ;; CODE is turned into a character on its own; a high surrogate is
   ;; never combined with the low surrogate that follows it.
   (restart-case
       (or (and (< code char-code-limit) (code-char code))
           (error 'no-char-for-code :code code))

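Each "\u" escape sequence is decoded into its own character, and those characters are returned as-is: surrogate pairs are simply not implemented in CL-JSON. A pair such as \uD840\uDC0B therefore becomes two surrogate characters instead of the single character U+2000B.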
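A minimal sketch of what surrogate-pair support could look like at the point where a decoder reads a `\u` escape. `read-hex4` and `read-unicode-escape` are hypothetical helpers, not part of CL-JSON. The sketch assumes the backslash and `u` have already been consumed (as in the excerpt above) and that the Lisp implementation's characters cover the full Unicode range (SBCL, for example), so `code-char` accepts supplementary code points.

```lisp
;; Hypothetical helpers, not CL-JSON's API: a sketch of \u decoding
;; that combines UTF-16 surrogate pairs into one character.

(defun read-hex4 (stream)
  "Read exactly four hexadecimal digits from STREAM and return their value."
  (let ((digits (make-string 4)))
    (dotimes (i 4)
      (setf (char digits i) (read-char stream)))
    (parse-integer digits :radix 16)))

(defun read-unicode-escape (stream)
  "Decode a \\uXXXX escape whose backslash and u have already been read.
A high surrogate must be followed by a \\uXXXX low surrogate; the two are
combined into one supplementary-plane character."
  (let ((code (read-hex4 stream)))
    (cond
      ;; High surrogate: the next escape must be a low surrogate.
      ((<= #xD800 code #xDBFF)
       (unless (and (eql (read-char stream nil) #\\)
                    (eql (read-char stream nil) #\u))
         (error "High surrogate ~4,'0X is not followed by a \\u escape." code))
       (let ((low (read-hex4 stream)))
         (unless (<= #xDC00 low #xDFFF)
           (error "Expected a low surrogate after ~4,'0X, got ~4,'0X." code low))
         ;; Keep 10 bits from each half and add back the #x10000 offset.
         (code-char (+ #x10000
                       (ash (- code #xD800) 10)
                       (- low #xDC00)))))
      ;; A low surrogate on its own is not a valid code point to decode.
      ((<= #xDC00 code #xDFFF)
       (error "Unpaired low surrogate ~4,'0X." code))
      ;; Every other code maps directly to a character.
      (t (code-char code)))))
```

Fed the input `D840\uDC0B` (the first example's escapes after the leading `\u`), this returns the single character U+2000B. Unlike the excerpt above, a high surrogate is only accepted when a low surrogate follows it, and a lone surrogate signals an error instead of producing a character that may fail later on output. Wiring this into decoder.lisp itself would need changes not shown here.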

I've got a dirty hack:

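(let ((decoded (with-input-from-string (stream "\"\\uD83D\\uDE03\"")
                 (cl-json:decode-json stream))))
  ;; DECODED holds two characters: the high and the low surrogate.
  (princ (code-char
          (let ((c1 (char-code (aref decoded 0)))   ; high surrogate, #xD83D
                (c2 (char-code (aref decoded 1))))  ; low surrogate, #xDE03
            ;; Keep the low 10 bits of each surrogate and add back #x10000.
            (+ #x10000
               (ash (logand #x03FF c1) 10)
               (logand #x03FF c2))))))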

;; prints: 😃
;; => #\SMILING_FACE_WITH_OPEN_MOUTH
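The hack above only handles a string that consists of exactly one surrogate pair. A sketch of a more general post-processing step, assuming (as the outputs in this thread suggest) that the decoded string contains the raw surrogate characters: `merge-surrogate-pairs` is a hypothetical helper that walks the whole string and replaces every high/low pair with the character it encodes, leaving everything else unchanged.

```lisp
;; Hypothetical workaround, not part of CL-JSON: repair a string that
;; was decoded with each \u escape turned into its own character.

(defun merge-surrogate-pairs (string)
  "Return a copy of STRING in which each high/low surrogate character pair
is replaced by the single character it encodes.  Unpaired surrogates and
all other characters are copied unchanged."
  (with-output-to-string (out)
    (let ((i 0)
          (n (length string)))
      (loop while (< i n)
            do (let ((c1 (char-code (char string i)))
                     ;; Code of the next character, or 0 at the end.
                     (c2 (if (< (1+ i) n)
                             (char-code (char string (1+ i)))
                             0)))
                 (cond ((and (<= #xD800 c1 #xDBFF)
                             (<= #xDC00 c2 #xDFFF))
                        ;; A well-formed pair: emit the combined character.
                        (write-char (code-char (+ #x10000
                                                  (ash (logand #x03FF c1) 10)
                                                  (logand #x03FF c2)))
                                    out)
                        (incf i 2))
                       (t
                        ;; Anything else, including a lone surrogate, is copied.
                        (write-char (char string i) out)
                        (incf i))))))))
```

Applied to the result of decoding a string with several pairs, such as "\uD83D\uDE02\uD83D\uDE02", this should give one character per pair. It treats the symptom after decoding; combining the pairs inside the decoder would be the cleaner fix.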
eadmund commented 6 years ago

This also causes a rather nasty failure when handling output:

(json:decode-json-from-string "\"\\uD83D\\uDE02\\uD83D\\uDE02\"")

I suggest using YASON instead.