trailofbits / vast

VAST is an experimental compiler pipeline designed for program analysis of C and C++. It provides a tower of IRs as MLIR dialects to choose the best fit representations for a program analysis or further program abstraction.
https://trailofbits.github.io/vast/
Apache License 2.0

`time_t` tracking #62

Open xlauko opened 2 years ago

xlauko commented 2 years ago

Rethink time_t tracking, and just how "deep" down the stack we can reasonably keep it around or ensure that we know a given i32 is actually a time_t.

xlauko commented 2 years ago

I think the best approach would be to design a generic transformation that, given a typedef name and some middle-level type (e.g., ml::Time), replaces the high-level typedef with that type, so that it is preserved while the remaining types are lowered.
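A minimal sketch of such a transformation, assuming a toy list of types rather than VAST's actual IR. The `Typedef` and `MidLevel` classes, the `bind_typedef` and `lower_remaining` helpers, and the `ml.Time` name are all hypothetical stand-ins for illustration, not VAST APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Typedef:
    """High-level typedef type, e.g. time_t aliasing long."""
    name: str
    underlying: str


@dataclass(frozen=True)
class MidLevel:
    """Hypothetical middle-level type (stand-in for something like ml::Time)."""
    name: str
    storage: str  # fixed-size type it could eventually lower to


def bind_typedef(types, typedef_name, mid_name, storage):
    """Replace every typedef called typedef_name with a middle-level type."""
    return [
        MidLevel(mid_name, storage)
        if isinstance(t, Typedef) and t.name == typedef_name
        else t
        for t in types
    ]


def lower_remaining(types, widths):
    """Lower typedefs and builtin names to fixed-size types.

    Middle-level types are left untouched, so the binding survives.
    """
    lowered = []
    for t in types:
        if isinstance(t, MidLevel):
            lowered.append(t)
        elif isinstance(t, Typedef):
            lowered.append(widths[t.underlying])
        else:
            lowered.append(widths[t])
    return lowered


# Assumed widths for an LP64-style target; purely illustrative.
widths = {"int": "i32", "long": "i64", "unsigned long": "i64"}
types = [Typedef("time_t", "long"), "int", Typedef("size_t", "unsigned long")]

bound = bind_typedef(types, "time_t", "ml.Time", "i64")
lowered = lower_remaining(bound, widths)
```

In this sketch `size_t` collapses to `i64` like any other typedef, while the bound `time_t` stays distinguishable after lowering.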

pgoodman commented 2 years ago

If this is the approach, then also think through the reverse, i.e. upgrading an int into an "error code type," e.g. from int to errno_t, or as in OpenSC, to some project-specific type.
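The reverse direction could look roughly like the following sketch, which retypes selected `int` values as `errno_t` based on a user-supplied predicate. The `Value` record, the `upgrade` helper, and the function names `open_device` and `read_count` are hypothetical; nothing here reflects an existing VAST or OpenSC interface.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Value:
    name: str
    type: str
    producer: str  # name of the function that produced the value


def upgrade(values, from_type, to_type, should_upgrade):
    """Retype values of from_type as to_type where the predicate agrees."""
    return [
        replace(v, type=to_type)
        if v.type == from_type and should_upgrade(v)
        else v
        for v in values
    ]


# Hypothetical set of functions the user knows to return error codes.
ERROR_RETURNING = {"open_device"}

values = [
    Value("rc", "int", "open_device"),
    Value("count", "int", "read_count"),
    Value("ok", "bool", "open_device"),
]

upgraded = upgrade(
    values, "int", "errno_t", lambda v: v.producer in ERROR_RETURNING
)
```

Only `rc` changes type here: `count` is an `int` from a function outside the set, and `ok` is not an `int` at all.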

pgoodman commented 2 years ago

The general mindset I'm espousing is "typedef as taint."

xlauko commented 2 years ago

We should have a UI (probably a Python API) for people to create these kinds of type bindings.

pgoodman commented 2 years ago

Agreed on both. I think that, at a minimum, the UI-level support could be a UI for visualizing code, where clicking on something initializes a `here` global variable available to a Python console, in which you have the API access.
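As a rough illustration of the console idea, the sketch below uses Python's standard `code` module to create a console whose namespace contains `here`. The `Node` record and the `console_for_click` helper are hypothetical placeholders for whatever a real UI selection would expose.

```python
import code
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for the entity behind a UI click."""
    spelling: str
    type: str
    line: int


def console_for_click(node):
    """Create a Python console with `here` bound to the clicked node."""
    return code.InteractiveConsole(locals={"here": node})


console = console_for_click(Node("time", "time_t", 40))

# Feed one line programmatically, as if the user had typed it.
# console.interact() would start a real interactive prompt instead.
console.push("clicked_type = here.type")
clicked_type = console.locals["clicked_type"]
```

Anything the user defines in the console lands in the same namespace, so a session can build up its own helpers around `here`.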

xlauko commented 2 years ago

What do you think should happen when you have:

time_t time;
int casted = (int)time;

Do we want to propagate "type-taint" information through casts? In the worst case, through memory?

pgoodman commented 2 years ago

I don't want VAST to propagate type taint; I want VAST to enable me to do so. I want to find the uses of values of type time_t, and if I see them cast to an int (implicitly or explicitly), I want to be able to do my own taint tracking, and possibly transformation / cloning+type changing.
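A minimal sketch of that division of labor, assuming a toy flat list of operations in definition-before-use order. The `Op` record and the `uses_of_type` and `track` helpers are hypothetical; the point is only that the framework exposes typed uses while the caller supplies the propagation policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    result: str
    kind: str        # "decl", "cast", "add", ...
    operands: tuple  # names of the values this op consumes
    type: str        # result type as written in the source


def uses_of_type(ops, type_name):
    """Return the ops that consume a value whose type is type_name."""
    typed = {op.result for op in ops if op.type == type_name}
    return [op for op in ops if any(o in typed for o in op.operands)]


def track(ops, type_name, propagate):
    """User-driven taint: seed with values of type_name, then let the
    caller's propagate(op) decide whether taint flows through each op."""
    tainted = {op.result for op in ops if op.type == type_name}
    for op in ops:
        if any(o in tainted for o in op.operands) and propagate(op):
            tainted.add(op.result)
    return tainted


ops = [
    Op("time", "decl", (), "time_t"),
    Op("casted", "cast", ("time",), "int"),  # int casted = (int)time;
    Op("next", "add", ("casted",), "int"),
    Op("other", "decl", (), "int"),
]

cast_uses = [op.result for op in uses_of_type(ops, "time_t") if op.kind == "cast"]
through_casts_only = track(ops, "time_t", lambda op: op.kind == "cast")
through_everything = track(ops, "time_t", lambda op: True)
```

The two `track` calls differ only in the policy lambda, which is the part the user would own.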

Contrast this with LLVM: I'd always have i32, and I'd need to observe and taint time_t from the source, i.e. calls to time() or something like that. It's possible that at a high level there are substantially more taint sources, because every use of a time_t variable is a time_t. At lowering time, we might lower to an i32, but I want to know "we came from a time_t."

pgoodman commented 2 years ago

To be more precise, I don't want to end up exactly where we are with LLVM, just with more control-flow structure. I think typedefs carry a lot of semantic value, although I also recognize just how much of an annoyance they can be. Enumerators are similar, as are macro constants. When we drop down to fixed-sized types, I want to ensure that we have either i) solid provenance info to know about typedefs and such, or ii) some kind of extra stuff carried along that also isn't so unwieldy that you'll cry tears of pain every time you add operators.
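One way to picture option ii) is a lowered type that carries its typedef chain as metadata excluded from equality, so code that only cares about the fixed-size type is unaffected. This is a sketch under stated assumptions: the `Lowered` class, the `lower` helper, and the `deadline_t` typedef are hypothetical, and mapping `time_t` to `int` just mirrors the i32 example in this thread.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Lowered:
    bits: str  # fixed-size type, e.g. "i32"
    # Typedef names seen while lowering, outermost first.
    # compare=False keeps provenance out of equality and hashing.
    origin: tuple = field(default=(), compare=False)


def lower(name, typedefs, widths):
    """Follow the typedef chain down to a builtin, recording each name."""
    origin = []
    while name in typedefs:
        origin.append(name)
        name = typedefs[name]
    return Lowered(widths[name], tuple(origin))


# Illustrative tables only; real targets define time_t differently.
typedefs = {"time_t": "int", "deadline_t": "time_t"}
widths = {"int": "i32"}

plain = lower("int", typedefs, widths)
stamp = lower("time_t", typedefs, widths)
deadline = lower("deadline_t", typedefs, widths)

same_storage = stamp == plain               # provenance is ignored here
came_from_time_t = "time_t" in deadline.origin
```

Operators that compare or hash types see only `bits`, while an analysis can still ask whether a given i32 came from a `time_t`.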

pgoodman commented 2 years ago

I keep bringing up the lowering stuff because I know that not everything is in SourceIR; some stuff will only happen through lowering (destructors, etc.).