Closed PertinentDetail closed 2 years ago
Thanks for digging into this. It seems to me that the whole code could probably be simpler overall since it doesn't need to worry about partial digits like the <hex>
syntax does.
for( n=0; !eof && n < S.comp_.sz; n++ ){
do (c[0] = xpost_file_getc(f)) == EOF && ++eof; while( ! (eof || strchr(hex, c[0])) );
if( !eof ){
do (c[1] = xpost_file_getc(f)) == EOF && ++eof; while( ! (eof || strchr(hex, c[1])) );
if( !eof ) s[n] = ((strchr(hex, toupper(c[0])) - hex) << 4)
| (strchr(hex, toupper(c[1])) - hex);
}
}
S.comp_.sz = n;
This version seems more correct given your analysis but I'll put it through some tests before committing. And probably the style I have here is too compact per our style rules. The do...while loops could probably be factored into a helper function with a descriptive name.
Oops. This still has the n++
problem you already diagnosed.
Take 2.
int readhexdigit( Xpost_File *f, int *p ){
int eof = 0;
do (*p = xpost_file_getc(f)) == EOF && ++eof;
while( ! (eof || strchr(hex, *p)) );
return eof;
}
for( n = 0; !eof && n < S.comp_.sz; n++ ){
eof += readhexdigit(f, &c[0]);
if( !eof ) eof += readhexdigit(f, &c[1]);
if( eof ) break;
s[n] = ((strchr(hex, toupper(c[0])) - hex) << 4) | (strchr(hex, toupper(c[1])) - hex);
}
S.comp_.sz = n;
Written, tested, committed, pushed.
Unfortunately,
readhexstring
just doesn't work. It will fail any test you throw at it.I have a lot of sympathy for this bug. The main problem is one of those blatant issues that's ridiculously hard to spot until you end up tracing the code line by line and see it go the wrong way at a branch. Even knowing
readhexstring
wasn't working, I stared at this code for over an hour and I couldn't see the problem.readhexstring
is supposed to skip non-hex characters and process only hex characters. The loops that do this look like:The variable
hex
correctly contains all upper and lower case hex digits.Ignoring the end of file handling (which I'll get to in a sec), this simplifies to:
strchr
returns non-NULL if its second argument (a character) is contained in its first argument (a string). So this loop will keep looping as long as the character being read is a hex digit and will exit only when it reads a non-hex digit. This is exactly the reverse of what the test should be.Flipping the
!=
to==
fixes the core problem, but there are others: the end of file handling has a couple of bugs too.The first is that if we hit end of file between characters (that is, between pairs of hex digits) we'll always return one character too many.
There are no early exits (
break
,goto
orreturn
) in the mainfor
loop. So, if we're not already at end of file at the start of the loop then we're guaranteed to hit the line that writes tos[n]
and the loop increment (n++
). This will happen even if we hit end of file before reading a hex digit intoc[0]
.It's easiest to see this by considering the case where readhexstring is called when we're already at the end of the file (although we don't know that yet as we haven't read the EOF character). Before the
for
loop,eof
is initialised to 0 which guarantees thefor
loop will be run at least once and thus we'll return a single character string. Here's a simplification of the code for that case:The second end of file issue is that the logic for handling end of file between the first and second hex digits is wrong. It looks like the code is trying to pad the string with a '0' if there are an odd number of hex digits. Although this is the right behaviour for hex string literals (such as
<901fa>
) it doesn't appear to be correct forreadhexstring
. Even if this were the right behaviour, the logic is wrong. The code sets the second digit (c[1]
) to '0' if end of file was encountered when reading the first digit (c[0]
), when it should be doing it if the first digit was read normally (no end of file) but end of file was encountered when reading the second digit.A simple test is:
This should output "Testing\n"
Here are Try it online! links for that test, for xpost (which fails) and Ghostscript (which passes). Both these links run the output through
od -c
so that non-printable characters are clearly visible.I've attached a more complex test with more non-hex characters. I've had to give it a .txt extension for upload to GitHub: readhexstring.txt. You'll need to add another test for input with an odd number of hex digits.