mirtyl-wacdec closed this issue 1 year ago.
I can't really reproduce this. Both ways of encoding those codepoints (escaped and unescaped) work fine for me:
```elixir
iex(1)> "\"\\uFE0F\""
"\"\\uFE0F\""
iex(2)> Jason.decode!("\"\\uFE0F\"")
"️"
iex(3)> "\"\uFE0F\"" <> <<0>>
<<34, 239, 184, 143, 34, 0>>
iex(4)> Jason.decode("\"\uFE0F\"")
{:ok, "️"}
```
Can you provide concrete reproduction steps?
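When reading byte dumps like the one above, it helps to know what the two codepoints from this issue should look like in valid UTF-8. This is a minimal sketch, and the module name `CodepointBytes` is made up for illustration. U+FE0F encodes to `EF B8 8F` (239, 184, 143, matching the `iex(3)` output) and U+200D encodes to `E2 80 8D`. A byte like 179 (0xB3) on its own is a UTF-8 continuation byte and is never valid by itself.

```elixir
defmodule CodepointBytes do
  # Hypothetical helper: hex dump of a single codepoint's UTF-8 encoding.
  def hex(codepoint) when is_integer(codepoint) do
    Base.encode16(<<codepoint::utf8>>)
  end
end

IO.puts("U+FE0F (Variation Selector-16) -> " <> CodepointBytes.hex(0xFE0F)) # EFB88F
IO.puts("U+200D (Zero Width Joiner)     -> " <> CodepointBytes.hex(0x200D)) # E2808D
```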
Thanks for the prompt response over the holidays.
My concrete steps were fetching JSON data from an API and then attempting to parse it.
This is odd: I could replicate the error.
Writing the string to a file, reading the file back and then running `Jason.decode` on its contents produced the same error.
But manually re-saving that file in my editor (Ctrl+S in VS Code), reading it back and then running `Jason.decode` fixed the problem. I guess VS Code changes the file's encoding on save.
I can't legally paste the content here, and it probably wouldn't help anyway if just moving the text around fixes the encoding issue. I'm not sure how to debug this without giving you direct access to the API.
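One way to narrow this down without sharing the payload is to compare the file as originally written with the copy re-saved in VS Code, and print only a few bytes around the first position where they differ. This is a sketch under stated assumptions: the module name `FileDiff`, the file names in the comment, and the sample bytes are all made up. The samples are a Latin-1 `é` versus its UTF-8 form, chosen only to show what an encoding difference looks like, not a claim about what VS Code actually changed.

```elixir
defmodule FileDiff do
  # Hypothetical helper: nil when the two binaries are identical,
  # otherwise the byte offset of the first byte where they differ.
  def first_difference(a, b) when is_binary(a) and is_binary(b) do
    common = :binary.longest_common_prefix([a, b])
    if common == byte_size(a) and common == byte_size(b), do: nil, else: common
  end

  # A small window of bytes (as integers) from `radius` before to `radius`
  # after `offset`, so only a tiny slice of the payload is ever printed.
  def context(bin, offset, radius \\ 8) when is_binary(bin) do
    start = min(max(offset - radius, 0), byte_size(bin))
    len = min(byte_size(bin) - start, 2 * radius + 1)
    :binary.bin_to_list(binary_part(bin, start, len))
  end
end

# Made-up inputs standing in for the two files; with real files use
# File.read!("original.json") and File.read!("resaved.json") (hypothetical names).
a = <<"{\"k\":\"x", 0xE9, "\"}">>
b = <<"{\"k\":\"x", 0xC3, 0xA9, "\"}">>

offset = FileDiff.first_difference(a, b)
IO.inspect(offset, label: "first differing byte")
IO.inspect(FileDiff.context(a, offset), label: "as written")
IO.inspect(FileDiff.context(b, offset), label: "re-saved")
```

Since only a short list of byte values is printed, this output is usually easier to share than the file itself.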
It's likely that the data you're receiving is not encoded in UTF-8. Jason only processes JSON encoded in UTF-8, as required by the latest JSON standard (RFC 8259). You can check with `String.chunk(data, :valid)`: if it returns more than one element, some of the data is not valid UTF-8.
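Here is a minimal sketch of that `String.chunk(data, :valid)` check, extended to report the byte offset where each invalid run starts. `Utf8Check` is a hypothetical module name and the sample binary is made up; in practice `data` would be the raw response body. If the offsets aren't needed, `String.valid?/1` gives a plain yes/no.

```elixir
defmodule Utf8Check do
  # Hypothetical helper built on String.chunk(data, :valid).
  # Returns {byte_offset, bytes} for every run of bytes that is not valid
  # UTF-8; an empty list means the whole binary is valid UTF-8.
  def invalid_chunks(data) when is_binary(data) do
    {_end_offset, found} =
      data
      |> String.chunk(:valid)
      |> Enum.reduce({0, []}, fn chunk, {offset, found} ->
        found = if String.valid?(chunk), do: found, else: [{offset, chunk} | found]
        {offset + byte_size(chunk), found}
      end)

    Enum.reverse(found)
  end
end

# Made-up sample with a stray 0xB3 byte inside a JSON string.
sample = <<"{\"s\":\"ok ", 0xB3, "\"}">>

IO.inspect(String.chunk(sample, :valid))   # more than one chunk: not all valid UTF-8
IO.inspect(Utf8Check.invalid_chunks(sample))
```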
You should be able to attach the file here, or you can always email it to me at michal at muskala dot eu.
Given no way of reproducing the issue, I'm going to close this.
I was parsing a string containing the Unicode codepoints U+FE0F (Variation Selector-16) and U+200D (Zero Width Joiner), and `Jason.decode` errored out. It seems to be more of an Elixir issue with those complex codepoints, as `String.at(string, <position given by Jason>)` returns naked binaries, e.g. `<<179>>`. Not sure what to do, please help. `String.replace(string, ~r/\x{FE0F}/u, "")` takes forever; it basically hangs.
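The `String.at/2` symptom is consistent with a unit mismatch. `String.at/2` indexes graphemes, while the position in Jason's error message appears to be a byte offset (worth confirming against the Jason version in use). A lone `<<179>>` (0xB3) is a UTF-8 continuation byte that is never valid on its own, so it points back at invalid input rather than at the codepoints themselves. The sketch below uses a hypothetical module `ByteInspect` and made-up sample data. It reads bytes with `binary_part/3` and removes U+FE0F with a plain string pattern instead of a `~r//u` regex. It does not explain why the regex hangs, and stripping U+FE0F will not repair bytes that are already invalid.

```elixir
defmodule ByteInspect do
  # Hypothetical helper: the raw bytes from `radius` before to `radius` after a
  # byte offset. binary_part/3 counts bytes; String.at/2 counts graphemes.
  def around(data, byte_pos, radius \\ 4)
      when is_binary(data) and is_integer(byte_pos) and byte_pos >= 0 do
    size = byte_size(data)
    start = min(max(byte_pos - radius, 0), size)
    stop = min(byte_pos + radius + 1, size)
    binary_part(data, start, max(stop - start, 0))
  end

  # Removes U+FE0F using a plain string pattern instead of ~r/\x{FE0F}/u.
  # With a binary pattern, String.replace/3 matches raw bytes, so it also
  # accepts input that is not valid UTF-8.
  def strip_vs16(data) when is_binary(data), do: String.replace(data, "\uFE0F", "")
end

# Made-up sample: a stray 0xB3 byte (179) at byte offset 7.
data = <<"{\"a\":\"x", 0xB3, "\"}">>

IO.inspect(ByteInspect.around(data, 7, 1))
IO.inspect(String.valid?(<<179>>))              # false
IO.inspect(ByteInspect.strip_vs16("ok\uFE0F!")) # "ok!"
```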