ohler55 / ojg

Optimized JSON for Go
MIT License
839 stars, 50 forks

Unicode error token reporting #74

Closed: quackenbush closed this issue 2 years ago

quackenbush commented 2 years ago

When an error occurs at a multi-byte Unicode character, the ojg parser reports the wrong character in the error message.

$ echo '→' | oj
*-*-* unexpected character 'â' at 1:1

I'd like it to either display the correct Unicode character or omit the character:

*-*-* unexpected character '→' at 1:1
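
For context, a minimal sketch of how this kind of output can arise, assuming the error message is built from a single byte of the input rather than from the decoded rune. The character '→' is U+2192, encoded in UTF-8 as the bytes E2 86 92; formatting only the first byte with `%c` yields 'â' (U+00E2). The helper names `describeByte` and `describeRune` are hypothetical and are not taken from ojg.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describeByte (hypothetical) formats the single byte at off as if it were a
// whole character. For multi-byte UTF-8 input this prints the wrong character.
func describeByte(buf []byte, off int) string {
	return fmt.Sprintf("unexpected character '%c'", buf[off])
}

// describeRune (hypothetical) decodes the full UTF-8 sequence starting at off
// and formats the resulting rune.
func describeRune(buf []byte, off int) string {
	r, _ := utf8.DecodeRune(buf[off:])
	return fmt.Sprintf("unexpected character '%c'", r)
}

func main() {
	input := []byte("→\n")
	fmt.Println(describeByte(input, 0)) // unexpected character 'â'
	fmt.Println(describeRune(input, 0)) // unexpected character '→'
}
```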

Note: this same bug shows up in the "encoding/json" library and in Firefox JS, but is correct in Chrome JS.

Here is Chrome's error message:

> JSON.parse('→')
VM83:1 Uncaught SyntaxError: Unexpected token → in JSON at position 0
    at JSON.parse (<anonymous>)
    at <anonymous>:1:6

ohler55 commented 2 years ago

I'll take a look. If the Unicode character can be preserved I'll do that; failing that, I'll probably leave it off.

ohler55 commented 2 years ago

Fixed in branch bug/unicode-in-error. Please give it a try and then I'll release. It was easier to fix than expected.

quackenbush commented 2 years ago

Thanks Peter for the quick fix.

Confirmed:

*-*-* unexpected character '→' at 1:1

quackenbush commented 2 years ago

Looks like the "escape" code path needs the same fix:

$ echo '"\→"' | ./main
*-*-* invalid JSON escape character '\â' at 1:3
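
A hedged sketch of what the same treatment could look like on the escape path, assuming the reporter has the input buffer and the offset of the byte that follows the backslash. The function `escapeError` and its parameters are hypothetical; this is not the code in the bug/unicode-in-error branch. It decodes the full rune and, if the bytes are not valid UTF-8, omits the character instead of printing a wrong one.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// escapeError (hypothetical) builds the message for an invalid escape. off is
// the index of the first byte after the backslash; line and col are 1-based.
func escapeError(buf []byte, off, line, col int) string {
	r, size := utf8.DecodeRune(buf[off:])
	if r == utf8.RuneError && size <= 1 {
		// Invalid or truncated UTF-8: leave the character out of the message.
		return fmt.Sprintf("invalid JSON escape character at %d:%d", line, col)
	}
	return fmt.Sprintf("invalid JSON escape character '\\%c' at %d:%d", r, line, col)
}

func main() {
	input := []byte(`"\→"`)
	// Byte 0 is the quote, byte 1 the backslash, byte 2 starts '→'.
	fmt.Println(escapeError(input, 2, 1, 3))
}
```

Run against the input from the report, the sketch prints the escape with the intact arrow character at 1:3.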

ohler55 commented 2 years ago

Considered that but chose wrong. On it.

ohler55 commented 2 years ago

Ok, pushed that change too.

quackenbush commented 2 years ago

Yep, looks good.

ohler55 commented 2 years ago

Released