vhelin / wla-dx

WLA DX - Yet Another GB-Z80/Z80/Z80N/6502/65C02/65CE02/65816/68000/6800/6801/6809/8008/8080/HUC6280/SPC-700/SuperFX Multi Platform Cross Assembler Package
Other
547 stars 98 forks source link

Usage of sprintf across codebase #273

Closed AntonioND closed 4 years ago

AntonioND commented 4 years ago

Today I was building WLA-DX in my machine and got the following warning:

wla-dx/wlalink/main.c: In function ‘parse_and_set_libdir’:
wla-dx/wlalink/main.c:861:27: warning: ‘sprintf’ may write a terminating nul past the end of the destination [-Wformat-overflow=]
  861 |   sprintf(ext_libdir, "%s/", n);
      |                           ^
wla-dx/wlalink/main.c:861:3: note: ‘sprintf’ output between 2 and 257 bytes into a destination of size 256
  861 |   sprintf(ext_libdir, "%s/", n);
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The problem is that the compiler realizes that the max possible output of sprintf() is too big for the buffer.

In this case, it is easy to fix the problem by increasing the size of ext_libdir, for example, and I'm more than happy to submit a PR. However, I checked with grep and it looks like all the codebase is using sprintf() rather than snprintf(). I guess it's not a huge deal because this tool isn't really security-critical, and the compiler doesn't complain about other instances of sprintf() so in most cases it seems to be fine (but the checks don't work when %s refers to a char*, it only works with arrays, so there are surely more problems like this.

I think that in general all the calls should be checked, and anything that comes as input from the user should use snprintf() to make sure that there are no overflows. What do you think?

In case you're wondering, I'm using gcc (Debian 9.2.1-21) 9.2.1 20191130.

Neui commented 4 years ago

This is known, but there is no snprintf in C89, that was introduced in C99, and the author wants to stay on C89 for compatibility. Increasing buffer sizes would help, but not solve the problem.

I have thought of maybe using some wrapper function and use snprintf when available (detect by platform/build system), but I haven't thought further.

vhelin commented 4 years ago

We really need to continue using C89 as I want to keep e.g. Amiga support. :) But perhaps someone could write a similar function to snprintf()? I'm sure the use cases of sprintf() in WLA DX are all very similar and quite simple so writing a function that does the same shouldn't be a huge issue in our case...

cr1901 commented 4 years ago

I have thought of maybe using some wrapper function and use snprintf when available (detect by platform/build system), but I haven't thought further.

This would also be my suggestion in the future, though for now I would recommend the following:

SolraBizna commented 4 years ago

Anybody wants to code (a limited version, we don't need all its features) snprintf() for this C89 project? :) I'm busy implementing new features and fixing bugs...

I gave it a quick look, and WLA mainly uses %c, %s, and %d, with rare use of format specifiers. So here is a version of snprintf that supports only those.

#include <stdarg.h>
#include <string.h>
#include <stdio.h>

int dumb_snprintf(char* outbuf, size_t rem, const char* format, ...) {
  char* outp = outbuf;
  char dbuf[12];
  va_list arg;
  const char* other_str;
  size_t len;
  int an_int;
  va_start(arg, format);
  while(rem > 1 && *format) {
    if(*format == '%') {
      ++format;
      switch(*format) {
      case 0: break;
      case 'c':
        ++format;
        /* there is room for at least one more character, or the loop would
           have terminated */
        *outp++ = (char)va_arg(arg, int);
        --rem;
        break;
      case 's':
        ++format;
        other_str = va_arg(arg, const char*);
        len = strlen(other_str);
        if(len + 1 < rem) {
          /* there is room to copy the whole string */
          strcpy(outp, other_str);
          outp += len;
          rem -= len;
        }
        else {
          /* not quite room */
          memcpy((void*)outp, (const void*)other_str, rem - 1);
          outp += rem - 1;
          rem = 1;
        }
        break;
      case 'i': /* I personally use this instead of %d */
      case 'd':
        ++format;
        /* avoid a bug in MrC that I may or may not be misremembering by not
           putting a call to va_arg directly inside a parameter */
        an_int = va_arg(arg, int);
        sprintf(dbuf, "%d", an_int);
        len = strlen(dbuf);
        if(len + 1 < rem) {
          /* there is room to copy the whole value */
          strcpy(outp, dbuf);
          outp += len;
          rem -= len;
        }
        else {
          /* not quite room */
          memcpy((void*)outp, (const void*)dbuf, rem - 1);
          outp += rem - 1;
          rem = 1;
        }
        break;
      case '%':
        *outp++ = '%';
        --rem;
        ++format;
        break;
      default:
        fprintf(stderr, "Unknown format char: %c\n", *format);
        ++format;
        break;
      }
    }
    else {
      *outp++ = *format++;
      --rem;
    }
  }
  va_end(arg);
  *outp = 0;
  /* return number of chars written, excluding terminating NUL */
  return outp - outbuf;
}

int main() {
  char buf[79];
  int n;
  for(n = 79; n > 0; --n) {
    dumb_snprintf(buf, n, "100%% of tests must pass. Char of the day: '%c'. %d chars. %s says that it's already too long to fit. This test cuts off during the format string.", (char)(' '+n), n, "Solra Bizna");
    printf("\"%s\"\n", buf);
  }
  return 0;
}

Compiles without warnings with gcc -Wall -std=c89. The test output seems fine. The only portability snafu I'm aware of is that on systems with 64-bit int, dbuf is too small. I've never personally used a system with 64-bit int, but they do exist.

(I've been doing a lot of 6502 assembly in the last few days, so it was really hard to resist the urge to collapse the common tails of %d and %s with a goto...)

Counting only sprintf and not printf, there are six instances of a bare %x, one instance of a bare %f, and 12 instances of format specifiers, at least in the outdated 9.9 codebase I accidentally scanned instead of current master. Adding %x and %f support to the above would be trivial. Supporting format specifiers is harder. Nevertheless, the above implementation is sufficient to replace over 500 instances of sprintf, including the one that spawned this issue (and my PR).

If snprintf truncates at all, that's probably a fatal assembler error.

In the instance that spawned this issue, a truncation would lead to ext_libdir not having a trailing directory separator, which would have some fun consequences later on...

AntonioND commented 4 years ago

I think that relying on a custom snprintf that hasn't been tested much is asking for trouble, to be honest. If it is decided to use a custom version of snprintf, it's probably easier to use an implementation that at least has some level of testing and support, such as this one: https://github.com/mpaland/printf

SolraBizna commented 4 years ago

I think that relying on a custom snprintf that hasn't been tested much is asking for trouble

Especially since I wrote it in fifteen minutes, just after waking up from poor sleep! :D

vhelin commented 4 years ago

Good work guys! I'll take a look at this issue later this weekend. SolraBizna's implementation looks pretty nice as well. We don't need a super battle tested implementation for our use as we don't use many of snprintf()'s features... But more later.

vhelin commented 4 years ago

I think that relying on a custom snprintf that hasn't been tested much is asking for trouble, to be honest. If it is decided to use a custom version of snprintf, it's probably easier to use an implementation that at least has some level of testing and support, such as this one: https://github.com/mpaland/printf

This seems to use MIT License, so we could use it as well without any problems, just embed the C-file into WLA DX project with all the required license texts. All the comments need to be transformed into ANSI C89 format, but that's a small issue. I'll try to do this tomorrow... There might be other issues with using ANSI C89, but I think those can be solved quickly...

vhelin commented 4 years ago

https://github.com/mpaland/printf looks quite compact, but needs a little work to make it ANSI C89 compatible... I think I can integrate that to WLA DX, but next week, today I'm busy with other stuff.

vhelin commented 4 years ago

Does it work for you guys? All WLA DX's tests pass with the new snprintf()...

AntonioND commented 4 years ago

My compiler has stopped complaining about sprintf, so yes, works for me!

SolraBizna commented 4 years ago

Now builds without any warnings for me. I'm in the middle of migrating between development systems so I wouldn't be able to do deeper testing for a few days.