Closed rob-myers closed 5 years ago
Thanks for the report - will investigate. If these are indeed bugs, that probably calls for a bugfix release sometime this week or next.
The problematic piece of code is this:
if p.bsp+utf8.UTFMax >= len(p.bs) {
// we might need up to 4 bytes to read a full
// non-ascii rune
p.fill()
}
This was fine while the parser wasn't interactive. We didn't care about forcing an extra read too early.
However, with the interactive parser, this isn't fine. The lexer thinks that the 好\n
bytes may not be enough to read another unicode character, so it forces another read.
That logic should be refactored. We should only force reading extra bytes if we saw a non-ASCII byte that wasn't valid utf-8 because it was missing bytes. That should fix your case, and any other unicode characters next to the end of a read chunk of bytes.
I'm pretty sure that the commit above will fix your unicode issues. I've only verified it with gosh
and integration tests, not the JS package itself, but I'm fairly certain this was the cause.
I'll publish a newer version of the JS package in a bit to include the fix. Do you have any other bugs or issues that I could include in a 2.6.2 release sometime this weekend or next week?
I'm pretty sure that the commit above will fix your unicode issues.
Yep it has, thanks.
Do you have any other bugs or issues that I could include in a 2.6.2 release sometime this weekend or next week?
No, nothing yet.
My in-browser interpreter is now working pretty well. Once the 'animator' is working I'll publish and let you know.
Your interactive parser fails in the following situation:
Similarly,
gosh
prompts for more input givenecho 你好
.echo 你好
(trailing space) andecho "你好"
work as expected.