tenox7 / ttyplot

a realtime plotting utility for terminal/console with data input from stdin
Apache License 2.0
957 stars 42 forks source link

[1.5.2] `ttyplot.c:129:9: runtime error: signed integer overflow: 0 - -2147483648 cannot be represented in type 'int'` #111

Closed hartwork closed 6 months ago

hartwork commented 7 months ago

To reproduce:

# git rev-parse HEAD
8d241301f53cab6ab23ba80072b9672423132d93

# make clean all {C,LD}FLAGS='-fsanitize=undefined -fno-sanitize-recover=all' \
    && ./ttyplot <<<NaN |& tee /tmp/foo \
    && reset \
    && grep -oE ttyplot\.c.+ /tmp/foo
ttyplot.c:129:9: runtime error: signed integer overflow: 0 - -2147483648 cannot be represented in type 'int'

This is with GCC 12. Clang 15 would say…

ttyplot.c:151:58: runtime error: nan is outside the range of representable values of type 'int'

…instead for the very same case.

PS: This was uncovered feeding ttyplot with /dev/random, briefly mentioned at https://github.com/tenox7/ttyplot/issues/78#issuecomment-1807175970 before but lacked a dedicated GitHub issue and a reproducer until now.

hartwork commented 7 months ago

I'm thinking the fix here could be threefold:

The key idea here is to both protect against bad data entering the application and protect the draw layer from unpaintable results existing ttyplot's "math engine" in addition, to be waterproof.

@edgar-bonet @tenox7 what do you think?

edgar-bonet commented 7 months ago

@hartwork: This approach looks very defensive to me. I am fine defensive programming, but all this may not be strictly needed:

  1. NANs are the real problem caught by the sanitizer, so yes, something has to be done about them. Either replace with zeroes, or skip these datapoints (if (isfinite(v1[i]) && isfinite(v2[i])) draw_line(...);. The result should look the same.

  2. If finite, the numbers being cast to int are between 0.0 and ph (the plot height), so they certainly lie between INT_MIN and INT_MAX.

  3. The way l2 and l2 are computed, they also fit within the terminal height.

hartwork commented 7 months ago
  1. If finite, the numbers being cast to int are between 0.0 and ph (the plot height), so they certainly lie between INT_MIN and INT_MAX.

@edgar-bonet is that guaranteed? E.g. if min and max ever are equal then max-=min; will make max 0 and then we divide by zero which should give infinity, multiplied by ph is infinity again. I think we protect against them being equal — not sure if we cover all places — but min and max being very close — say 3.000000000001 and 3.000000000002 — can still shoot over the int range when dividing by max, no? That division is what worries me mostly.

edgar-bonet commented 7 months ago

@hartwork wrote:

if min and max ever are equal [...] then we divide by zero

Indeed, we should check whether this can ever happen, or whether this is already well protected against. It is not very clear to me, as there are some manipulations tweaking max that depend on softmax and harmin...

but min and max being very close — say 3.000000000001 and 3.000000000002 — can still shoot over the int range when dividing by max, no?

Nope. If min and max are close enough, IEEE-754 guarantees that computing their difference will give an exact result. This, by itself, is remarkable: not many floating point computations guarantee exact results. Then, if values1[i] lies between min and max, the quantity (values1[i]-min)/(max-min) will be between zero and one.

hartwork commented 7 months ago

@edgar-bonet I've been experimenting with variants of code…

#include <assert.h>
#include <math.h>
#include <stdio.h>

int main() {
  double max = 3.00000000000000000000000001;
  double min = 3.00000000000000000000000002;
  assert(max != min);
  double numerator = 123.0;
  double denominator = min - max;
  assert(isfinite(denominator) && (denominator != 0.0));
  double res = numerator / denominator;
  printf("%f\n", res);
  return 0;
}

…and it seems that once min and max have different values at runtime this no longer gets to inf in practice. So my experiments seem to support your statement and that would mean that we only need part "one" out of my original threefold proposal. Cool!