handle NUL gracefully - Githubissues

mycoboco / beluga

a standard C compiler (with an integrated preprocessor)

http://code.woong.org/beluga

handle NUL gracefully #112

Closed mycoboco closed 6 years ago

mycoboco commented 6 years ago

- inside comments (should be ignored)
- in string literals and constants (should be diagnosed and preserved)
- in other contexts (should be diagnosed and replaced by a space?)

mycoboco commented 6 years ago

Close to impossible because fgets does not return the number of characters read.

One workaround is to use ftell(), but it is not highly portable and does not work when the input comes from a FIFO.

mycoboco commented 6 years ago

Even if it is hard to properly replace a NUL with, say, a space, detecting it is possible with fgets(). If

- feof() is false,
- the given buffer is not full, and
- the last character found via strlen() is not a newline,

then the input line must have an embedded NUL.
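
The three conditions above can be sketched in C as follows. This is only an illustration, not beluga's actual code: the function name `has_embedded_nul` and its parameters are hypothetical, and it assumes the caller has just filled `buf` (of capacity `size`) with a successful fgets() call on `fp`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the chunk that fgets() just stored in
   buf (capacity size) appears to contain an embedded NUL, 0 otherwise. */
static int has_embedded_nul(const char *buf, size_t size, FILE *fp)
{
    size_t len = strlen(buf);   /* stops at the first NUL, embedded or not */

    if (len > 0 && buf[len - 1] == '\n')
        return 0;               /* a newline was reached */
    if (feof(fp))
        return 0;               /* final line without a newline */
    if (len == size - 1)
        return 0;               /* buffer is full; the line continues */
    return 1;                   /* fgets() stored more than strlen() sees */
}
```

The check is a heuristic. It misses a NUL in a final line that lacks a newline, and a NUL that happens to sit exactly where a full buffer would end, because fgets() gives no way to tell those cases apart.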

mycoboco commented 6 years ago

The logic to handle input lines should be:

- contains newline
  - contains trigraph
    - line continuation
  - has no trigraph
    - line completed
- has no newline
  - feof
    - line completed
  - !feof
    - buffer is full
      - buffer needs to be expanded
    - buffer is not full
      - has embedded NUL
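
One possible reading of this outline, written as a classifier for a chunk just read by fgets(), is sketched below. The enum, the function names and the treatment of the trigraph branch are assumptions made for illustration. In particular, the sketch only flags that trigraphs are present so that a caller can replace them and then check for a line continuation; it does not perform that replacement.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical states mirroring the leaves of the outline. */
enum line_state {
    LINE_COMPLETE,      /* whole line, or final line without a newline */
    LINE_HAS_TRIGRAPH,  /* newline seen; trigraphs must be replaced before
                           deciding whether the line is continued */
    LINE_NEEDS_EXPAND,  /* buffer filled before a newline was reached */
    LINE_HAS_NUL        /* fgets() stopped short of the data it stored */
};

/* Returns nonzero if s contains one of the nine C trigraph sequences. */
static int contains_trigraph(const char *s)
{
    /* "?\?" is written with an escape so that this source file itself
       never contains a trigraph-like sequence in a string literal. */
    for (; (s = strstr(s, "?\?")) != NULL; s++)
        if (s[2] != '\0' && strchr("=/'()!<>-", s[2]) != NULL)
            return 1;
    return 0;
}

/* Classifies the chunk that fgets() just stored in buf (capacity size). */
static enum line_state classify_line(const char *buf, size_t size, FILE *fp)
{
    size_t len = strlen(buf);

    if (len > 0 && buf[len - 1] == '\n')
        return contains_trigraph(buf) ? LINE_HAS_TRIGRAPH : LINE_COMPLETE;
    if (feof(fp))
        return LINE_COMPLETE;
    if (len == size - 1)
        return LINE_NEEDS_EXPAND;
    return LINE_HAS_NUL;
}
```

A caller would loop on fgets(), grow the buffer and read again on `LINE_NEEDS_EXPAND`, and issue a diagnostic on `LINE_HAS_NUL`.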

mycoboco commented 6 years ago

Or consider replacing fgets() with a modified version.
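
A minimal sketch of what such a replacement might look like is given below. The name `ngets_sketch`, its signature and the helper `count_nuls` are hypothetical and are not taken from beluga's source. The idea is simply to read like fgets() but return the number of characters stored, so that NULs inside the line can be located and no strlen() call is needed afterwards.

```c
#include <stdio.h>

/* Hypothetical fgets() variant: stores at most size-1 characters from fp
   in buf, stops after a newline or at end of file, and NUL-terminates the
   result. Returns the number of characters stored, excluding the
   terminator; 0 means nothing could be read. */
static size_t ngets_sketch(char *buf, size_t size, FILE *fp)
{
    size_t n = 0;
    int c;

    if (size == 0)
        return 0;
    while (n + 1 < size && (c = getc(fp)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            break;
    }
    buf[n] = '\0';
    return n;
}

/* Counts NUL characters among the first n characters of buf. */
static size_t count_nuls(const char *buf, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        if (buf[i] == '\0')
            count++;
    return count;
}
```

Because the length is known exactly, a full buffer is recognized by `n == size - 1` with `buf[n - 1] != '\n'`, and an embedded NUL can be diagnosed in any position, including the cases a strlen()-based check cannot see.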

mycoboco commented 6 years ago

time for i in {1..100}; do ../build/beluga -I ~/public_node/www/var/root/usr/include/ -I ../deps/ocelot-nightly/build/include/ -I ./ -I ../lib/ -I ../cpp. -WvE simp.c > /dev/null; done

| Case | Elapsed time (sec) | Notes |
|---|---|---|
| gcc | 10.572 | |
| `beluga` compiled under the same conditions as gcc | 12.124 | 15% slower than gcc |
| using `ngets()`, a modified version of `fgets()` | 13.223 | 9% slower than the `fgets()` case |
| adding a check for `NUL` | 13.286 | |
| removing a call to `strlen()` | 13.223 | 9% slower than the `fgets()` case |

Removing a call to strlen() offsets the cost of the check for NUL.

Testing with input that has fewer macro invocations and more source lines shows that beluga runs up to 44% slower than gcc.

mycoboco commented 6 years ago

Source code with embedded NULs has portability issues anyway, so for simplicity I decided to diagnose all occurrences of such characters regardless of the context.

mycoboco commented 6 years ago

Commit 1145d994facbc22bf0dccd4e7492cd91775e017b changed the code to use getc instead of fgetc.