Closed: ktakashi closed this issue 9 years ago.
Thank you for your report.
The specification of error-handling-mode in R6RS library section 8.2.4 says:
If a textual input operation encounters an invalid or incomplete character encoding, and the error-handling mode is ignore, an appropriate number of bytes of the invalid encoding are ignored and decoding continues with the following bytes. If the error-handling mode is replace, the replacement character U+FFFD is injected into the data stream, an appropriate number of bytes are ignored, and decoding continues with the following bytes.
What does "an appropriate number of bytes" mean here? For UTF-8, there are at least three plausible interpretations of that phrase:
All of those interpretations seem to be allowed by the R6RS standard. With the first interpretation, the test program writes #\g twice. With the second interpretation, the test program writes an end-of-file object twice. With the third interpretation, the test program writes #\newline twice.
So this is not a bug. It's just an example of a program whose behavior is not fully specified by the R6RS standard.
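To make the ambiguity concrete, here is a minimal Python sketch (Python only so that it runs standalone) of a UTF-8 decoder whose error-handling mode (`ignore` or `replace`) and skip policy are both parameters. The three policies (`one`, `maximal`, `declared`) and all helper names are hypothetical illustrations of how "an appropriate number of bytes" might be read. They are not claimed to be the three interpretations mentioned above, nor what Larceny does.

```python
# Minimal sketch under stated assumptions: a UTF-8 decoder where the R6RS
# error-handling mode and the "appropriate number of bytes" to skip are both
# parameters. The skip-policy names are hypothetical, not from R6RS or Larceny.

REPLACEMENT = "\ufffd"  # U+FFFD, injected in "replace" mode


def declared_length(lead: int) -> int:
    """Sequence length announced by a lead byte; 0 if it cannot start one."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte, C0/C1, or F5..FF


def second_byte_range(lead: int) -> tuple[int, int]:
    """Valid range for the byte after the lead (rules out overlongs, surrogates, > U+10FFFF)."""
    if lead == 0xE0:
        return (0xA0, 0xBF)
    if lead == 0xED:
        return (0x80, 0x9F)
    if lead == 0xF0:
        return (0x90, 0xBF)
    if lead == 0xF4:
        return (0x80, 0x8F)
    return (0x80, 0xBF)


def valid_prefix_length(data: bytes, i: int) -> int:
    """How many bytes starting at i could still belong to a well-formed sequence."""
    n = declared_length(data[i])
    if n == 0:
        return 0
    k = 1
    while k < n and i + k < len(data):
        lo, hi = second_byte_range(data[i]) if k == 1 else (0x80, 0xBF)
        if not lo <= data[i + k] <= hi:
            break
        k += 1
    return k


def decode(data: bytes, mode: str, skip: str) -> str:
    """Decode UTF-8; mode is "ignore" or "replace", skip is "one", "maximal" or "declared"."""
    if mode not in ("ignore", "replace"):
        raise ValueError(f"unknown mode: {mode}")
    out = []
    i = 0
    while i < len(data):
        n = declared_length(data[i])
        k = valid_prefix_length(data, i)
        if n != 0 and k == n:
            # Complete, well-formed sequence: safe to hand to Python's decoder.
            out.append(data[i:i + n].decode("utf-8"))
            i += n
            continue
        # Invalid or incomplete encoding at i: choose how many bytes to drop.
        if skip == "one":
            bad = 1                # only the first offending byte
        elif skip == "maximal":
            bad = max(k, 1)        # the well-formed-so-far prefix
        elif skip == "declared":
            bad = max(n, 1)        # everything the lead byte promised
        else:
            raise ValueError(f"unknown skip policy: {skip}")
        if mode == "replace":
            out.append(REPLACEMENT)
        i += bad
    return "".join(out)


if __name__ == "__main__":
    sample = b"\xe2\x82A"  # truncated 3-byte sequence followed by "A"
    for policy in ("one", "maximal", "declared"):
        print(policy, ascii(decode(sample, "replace", policy)))
    # one '\ufffd\ufffdA'
    # maximal '\ufffdA'
    # declared '\ufffd'
```

The same truncated input yields a different character after the bad bytes under each policy. With `declared`, the trailing `A` is swallowed along with the invalid bytes, which is the kind of divergence that can make a program's output depend on the implementation even though each choice is arguably conforming.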
I think the following script should print #\g twice, but Larceny prints #\newline twice.
Version: Larceny v0.98 "General Ripper" (Mar 7 2015 01:06:26, precise:Linux:unified)