michalmuskala / jason

A blazing fast JSON parser and generator in pure Elixir.

[QUESTION] Jason with UTF-8 characters #141

Closed: at88mph closed this issue 2 years ago

at88mph commented 2 years ago

I have a Phoenix 1.6 app that is proxying JSON requests from a different domain. Some of the JSON coming in contains extended ASCII (French) characters:

{"name": "abcdé"}

And my controller is simply pulling it in and sending it back out again:

    def get(conn, _params) do
      url = "https://..."
      response = HTTPoison.get!(url)
      conn |> halt() |> json(response.body)
    end

My Phoenix app is using Jason by default, but the JSON cannot be encoded on the way out:

(Jason.EncodeError) invalid byte 0xE9 in \"{\"name\": \"abcdé\"}\"

Can Jason be told to re-encode extended characters?

michalmuskala commented 2 years ago

UTF-8 is handled correctly by Jason, as the JSON standards (RFC 8259) essentially require JSON text to be UTF-8.

Your description, however, makes it sound like the response is encoded as latin1: in latin1, é is the single byte 0xE9, whereas in UTF-8 it is the two bytes 0xC3 0xA9, and 0xE9 can never appear on its own. latin1 input is not handled by this library (and, I'd assume, by many other libraries).

In particular, this works just fine when the input is indeed UTF-8:

    iex(1)> Jason.decode!(~S|{"name": "abcdé"}|)
    %{"name" => "abcdé"}
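To see the byte-level difference behind the error, here is a minimal sketch using only the Elixir standard library. The two payloads are hypothetical samples built by hand to mimic the body from the question; the variable names are illustrative and come from neither the issue nor Jason.

```elixir
# Hypothetical sample payloads; only the byte for "é" differs.
# latin1: "é" is the single byte 0xE9.
latin1_body = <<"{\"name\": \"abcd", 0xE9, "\"}">>
# UTF-8: "é" (U+00E9) is written with an escape so the source file's
# own encoding does not matter.
utf8_body = ~s({"name": "abcd\u00E9"})

# A lone 0xE9 byte is not valid UTF-8, so the latin1 payload is not a
# valid Elixir string, while the UTF-8 payload is.
latin1_valid? = String.valid?(latin1_body)
utf8_valid? = String.valid?(utf8_body)

# In UTF-8, U+00E9 is the two-byte sequence 0xC3 0xA9.
e_acute_is_two_bytes? = "\u00E9" == <<0xC3, 0xA9>>

IO.inspect({latin1_valid?, utf8_valid?, e_acute_is_two_bytes?})
```

Checking `String.valid?/1` on a response body is a quick way to tell whether it can be handed to a UTF-8-only library as-is.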

You can always try transcoding the string with :unicode.characters_to_binary(data, :latin1, :utf8), if you know the data is indeed in latin1.
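A minimal sketch of that transcoding step, assuming the body is already known to be latin1. The module name `Latin1Transcode` and the function `to_utf8/1` are hypothetical helpers invented for illustration, not part of Jason, Phoenix, or HTTPoison; the sample payload is also made up.

```elixir
defmodule Latin1Transcode do
  # Hypothetical helper, not part of any library mentioned in this thread.
  # Assumes the caller already knows the input is latin1 (ISO-8859-1);
  # applying it to data that is already UTF-8 would garble it.
  def to_utf8(body) when is_binary(body) do
    :unicode.characters_to_binary(body, :latin1, :utf8)
  end
end

# Hypothetical latin1 payload: "é" is the single byte 0xE9.
latin1_body = <<"{\"name\": \"abcd", 0xE9, "\"}">>

utf8_body = Latin1Transcode.to_utf8(latin1_body)

# After transcoding, the binary is valid UTF-8 and "é" is 0xC3 0xA9.
valid_after? = String.valid?(utf8_body)
matches_expected? = utf8_body == ~s({"name": "abcd\u00E9"})

IO.inspect({valid_after?, matches_expected?})
```

The resulting binary is ordinary UTF-8 text, so it can be passed to `Jason.decode!/1` in the same way as the UTF-8 input shown earlier.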

at88mph commented 2 years ago

That was exactly what I needed. Many thanks.
