trailofbits / polytracker

An LLVM-based instrumentation tool for universal taint tracking, dataflow analysis, and tracing.
Apache License 2.0

Visualize Temporal Taint Patterns #62

Open ESultanik opened 4 years ago

ESultanik commented 4 years ago

Now that we maintain temporal information for when specific bytes are operated on, it would be interesting (although perhaps not useful) to visualize it as an animated GIF.

  1. Represent the input file as an image in which each pixel corresponds to one byte of the input file.
  2. Allow the user to specify an output image height, width, or aspect ratio on the command line.
  3. If the user does not specify a height, width, or aspect ratio, default to an aspect ratio of 1.618:
    import math

    input_file_bytes: int = ...  # size of the input file, in bytes
    aspect_ratio: float = 1.618  # width / height; overridable via command line
    sqrt_ratio = math.sqrt(aspect_ratio)
    sqrt_bytes = math.sqrt(input_file_bytes)
    # width * height >= input_file_bytes, so every byte gets its own pixel
    output_image_width: int = max(math.ceil(sqrt_bytes * sqrt_ratio), 1)
    output_image_height: int = max(math.ceil(sqrt_bytes / sqrt_ratio), 1)
  4. Pillow can generate an animated GIF from Python.
  5. Each time a byte of the input file is operated on, the associated pixel should be highlighted. We could then have a cooldown function that gradually fades that pixel out over a certain number of frames.
  6. If we also have temporal information regarding the context of how the bytes are used (e.g., if they affect control flow or not), then we can color pixels differently based upon that.
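
As a rough sketch of steps 1, 4, and 5 above (not PolyTracker code): the `accesses` trace format, the function names, and the linear fade are all assumptions made for illustration. Byte offset `i` maps to the pixel at column `i % width`, row `i // width`; a touched byte is set to full brightness and then dimmed a little on every later frame.

```python
from typing import List, Sequence


def render_frames(
    num_bytes: int,
    width: int,
    height: int,
    accesses: Sequence[Sequence[int]],
    cooldown_frames: int = 8,
) -> List[bytes]:
    """Return one grayscale frame (row-major, one byte per pixel) per time step.

    accesses[t] lists the input-file offsets operated on at time step t.
    This trace format is hypothetical, not PolyTracker's actual output.
    """
    if width * height < num_bytes:
        raise ValueError("image is too small to hold every input byte")
    if cooldown_frames < 1:
        raise ValueError("cooldown_frames must be at least 1")
    # Linear cooldown: a pixel fades to black in roughly cooldown_frames frames.
    fade = max(255 // cooldown_frames, 1)
    heat = [0] * (width * height)
    frames: List[bytes] = []
    for offsets in accesses:
        # Dim every pixel by one step, then re-light the bytes touched now.
        heat = [max(h - fade, 0) for h in heat]
        for offset in offsets:
            if 0 <= offset < num_bytes:
                heat[offset] = 255
        frames.append(bytes(heat))
    return frames


def save_gif(
    frames: Sequence[bytes], width: int, height: int, path: str, frame_ms: int = 50
) -> None:
    """Write the frames as a looping animated GIF using Pillow."""
    from PIL import Image  # imported here so render_frames works without Pillow

    if not frames:
        raise ValueError("need at least one frame")
    images = [Image.frombytes("L", (width, height), frame) for frame in frames]
    images[0].save(
        path,
        save_all=True,
        append_images=images[1:],
        duration=frame_ms,  # milliseconds per frame
        loop=0,  # 0 means loop forever
    )
```

Calling `save_gif(render_frames(n, w, h, trace), w, h, "taint.gif")` would then write the animation. Step 6 could plausibly be handled by rendering RGB frames and picking the highlight color per access type, but that depends on what context information is actually recorded.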

petervwyatt commented 4 years ago

Hi @ESultanik, I’ve been promoting something similar (a “heat map” of file reads) for a while… I see the following benefits:

carsonharmon commented 4 years ago

Hi Peter, thank you for the feedback.

Could you point me in the direction of a PDF that is malformed in a "fixable" way, and to a parser that will automatically perform this repair?

petervwyatt commented 4 years ago

Sure (thinking of something like no xref) - do you have a 'short-list' of parsers you prefer?

carsonharmon commented 4 years ago

Yes!

Parser short-list: MuPDF, QPDF, and Poppler.

Thank you.

petervwyatt commented 4 years ago

polytracker.zip

I hand-made some samples for you by hex-editing down a PDF: no xref, no startxref, no trailer, etc. The filenames describe the malformation. QPDF certainly outputs different warning messages, so you should be able to capture via PolyTracker the additional recovery mechanisms that fire. MuPDF and Poppler also support some (but not all!) of these malformations. I also provided the baseline PDF (...-original.pdf) to make 'diff-ing' the processing easier for you. Let me know if you want more samples.
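
For anyone wanting to produce a similar variant programmatically, here is a minimal sketch of the "no startxref" case. It is not how the attached samples were made (those were hex-edited by hand), and the function name `drop_startxref` is hypothetical. It assumes a conventional file ending of `startxref`, a byte offset, and `%%EOF`, and simply cuts out the keyword and the offset.

```python
def drop_startxref(pdf: bytes) -> bytes:
    """Remove the last startxref keyword and its offset, keeping %%EOF."""
    start = pdf.rfind(b"startxref")
    if start == -1:
        return pdf  # no startxref present, nothing to remove
    end = pdf.find(b"%%EOF", start)
    if end == -1:
        end = len(pdf)  # no end-of-file marker: drop through the end
    return pdf[:start] + pdf[end:]


# Tiny stand-in for a PDF tail, for illustration only (not a valid PDF).
sample = (
    b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\n"
    b"trailer\n<< /Size 2 >>\nstartxref\n30\n%%EOF\n"
)
malformed = drop_startxref(sample)
```

A parser given such a file cannot locate the cross-reference table from the end of the file, so it has to fall back on whatever recovery strategy it implements, which is the behavior of interest here.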