syntax: premature reads in the parser when parsing non-ASCII

mvdan / sh

A shell parser, formatter, and interpreter with bash support; includes shfmt

https://pkg.go.dev/mvdan.cc/sh/v3

BSD 3-Clause "New" or "Revised" License

7.25k stars 344 forks source link

syntax: premature reads in the parser when parsing non-ASCII #327

Closed rob-myers closed 5 years ago

rob-myers commented 5 years ago

Your interactive parser fails in the following situation:

$ while true; do
>   echo 你好

RangeError: Maximum call stack size exceeded at Object.$packages.mvdan.cc/sh/syntax.Parser.ptr.regToken.

Similarly,

gosh prompts for more input given echo 你好.
But echo 你好 (trailing space) and echo "你好" work as expected.

mvdan commented 5 years ago

Thanks for the report - will investigate. If these are indeed bugs, that probably calls for a bugfix release sometime this week or next.

mvdan commented 5 years ago

The problematic piece of code is this:

if p.bsp+utf8.UTFMax >= len(p.bs) {
        // we might need up to 4 bytes to read a full
        // non-ascii rune
        p.fill()
}

This was fine while the parser wasn't interactive. We didn't care about forcing an extra read too early.

However, with the interactive parser, this isn't fine. The lexer thinks that the 好\n bytes may not be enough to read another unicode character, so it forces another read.

That logic should be refactored. We should only force reading extra bytes if we saw a non-ASCII byte that wasn't valid utf-8 because it was missing bytes. That should fix your case, and any other unicode characters next to the end of a read chunk of bytes.

mvdan commented 5 years ago

I'm pretty sure that the commit above will fix your unicode issues. I've only verified it with gosh and integration tests, not the JS package itself, but I'm fairly certain this was the cause.

I'll publish a newer version of the JS package in a bit to include the fix. Do you have any other bugs or issues that I could include in a 2.6.2 release sometime this weekend or next week?

rob-myers commented 5 years ago

I'm pretty sure that the commit above will fix your unicode issues.

Yep it has, thanks.

Do you have any other bugs or issues that I could include in a 2.6.2 release sometime this weekend or next week?

No, nothing yet.

My in-browser interpreter is now working pretty well. Once the 'animator' is working I'll publish and let you know.