trailofbits / polytracker

An LLVM-based instrumentation tool for universal taint tracking, dataflow analysis, and tracing.
Apache License 2.0

Erroneous 'RangeNode' is emitted on enumeration #6524

Closed hbrodin closed 1 year ago

hbrodin commented 1 year ago

Compiling an instrumented version of the program below

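#include <cstdio>    // printf
#include <limits>
#include <unistd.h>  // read

int main() {

  // Start close to INT_MAX so that a large enough c2 pushes the sum past it.
  int base = std::numeric_limits<int>::max() - 100;
  char c1 = 0, c2 = 0;

  // The first tainted byte is read from stdin.
  read(0, &c1, sizeof(c1));
  if ((c1 & 1) == 0)
  {
    printf("c1 is even %x\n", c1);
  }
  else
  {
    printf("c1 is odd %x\n", c1);
    // The second tainted byte is only read on the odd branch.
    read(0, &c2, sizeof(c2));
    // Note: c2 + base overflows int when c2 > 100, which is undefined behavior.
    if (c2 + base < base) {
      printf("c2 was too big for base! got %x", (c2 + base));
    } else {
      printf("Perfect! C: %x", c2);
    }

  }
  return 0;
}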

yields the following node sequence when iterating over the resulting TDAG:

TDRangeNode: affects control flow False [0, 0]
TDSourceNode: affects control flow True idx 1 offset 0
TDSourceNode: affects control flow True idx 1 offset 1

using

from polytracker import PolyTrackerTrace

tr = PolyTrackerTrace.load("tdagfile")
for n in tr.tdfile.nodes:
    print(n)

I think the reason is that label zero is interpreted as a RangeNode when it should be omitted.

kaoudis commented 1 year ago

Is label 0 a result of the assignment

  char c1 = 0, c2 = 0;

?

(Asking because I am not 100% sure, not because I know!!) Would or could that be two labels? I'm sort of surprised that, if I am correct about what they refer to, they could count as a range node. Maybe because they are zeroed out together?

hbrodin commented 1 year ago

Label zero represents untainted data; the first real taint label is one. When decoding the taint node section, there is a check: if label1 > label2 the node is a UnionTaint, otherwise it is a RangeTaint. There is one special case, though: label1 == label2 == 0, which fails the label1 > label2 test and so gets decoded as a RangeTaint even though it is not a real taint node. I'm working on a fix for this right now.
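To make the special case concrete, here is a minimal sketch of that check. It is not PolyTracker's actual decoder: `RangeTaint`, `UnionTaint`, and `decode_pair` are hypothetical stand-ins, and source nodes and the on-disk bit layout are ignored. It only shows how treating the (0, 0) pair as "no taint" before the comparison keeps the spurious range node from being emitted.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class RangeTaint:
    """Hypothetical stand-in for a decoded range node [first, last]."""
    first: int
    last: int


@dataclass(frozen=True)
class UnionTaint:
    """Hypothetical stand-in for a decoded union node."""
    left: int
    right: int


def decode_pair(label1: int, label2: int) -> Optional[Union[RangeTaint, UnionTaint]]:
    """Decode a pair of labels; return None for the untainted (0, 0) pair."""
    # Label 0 means "untainted", so the (0, 0) pair is not a real node.
    if label1 == 0 and label2 == 0:
        return None
    # Otherwise apply the check described above.
    if label1 > label2:
        return UnionTaint(label1, label2)
    return RangeTaint(label1, label2)


if __name__ == "__main__":
    for pair in [(0, 0), (1, 2), (5, 3)]:
        print(pair, "->", decode_pair(*pair))
```

Without the first check, the (0, 0) pair would reach the final `return` and come out as `RangeTaint(0, 0)`, which matches the `[0, 0]` range node in the output above.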