xlauko opened 2 years ago
I think the best approach would be to design a generic transformation that, given a typedef name and some mid-level type (e.g., ml::Time), replaces the high-level typedef with it, so that the type information is preserved during lowering of the remaining types.
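A minimal sketch of that transformation on a toy type model, assuming types are a builtin integer, a typedef over another type, or a "mid-level" type that lowering must leave alone. All names here (TypeKind, TypeBinding, apply_bindings, lower) are hypothetical and are not VAST's API; ml::Time is taken from the comment above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy type model, not VAST's actual API. */
typedef enum { TY_INT, TY_TYPEDEF, TY_MIDLEVEL } TypeKind;

typedef struct Type {
    TypeKind kind;
    const char *name;          /* "i32", "time_t", "ml::Time", ... */
    const struct Type *inner;  /* underlying type; NULL for TY_INT */
} Type;

/* A user-defined type binding: typedef name -> replacement mid-level type. */
typedef struct {
    const char *typedef_name;
    const Type *midlevel;
} TypeBinding;

/* Replace a bound typedef (anywhere in a typedef chain) by its mid-level type. */
const Type *apply_bindings(const Type *t, const TypeBinding *binds, size_t n) {
    for (const Type *cur = t; cur->kind == TY_TYPEDEF; cur = cur->inner) {
        assert(cur->inner != NULL);
        for (size_t i = 0; i < n; ++i)
            if (strcmp(cur->name, binds[i].typedef_name) == 0)
                return binds[i].midlevel;
    }
    return t;
}

/* Lowering strips ordinary typedefs but keeps mid-level types intact. */
const Type *lower(const Type *t) {
    while (t->kind == TY_TYPEDEF)
        t = t->inner;
    return t;
}
```

With a binding time_t -> ml::Time, both time_t and a typedef chained on top of it (say, typedef time_t my_time) survive lowering as ml::Time, while unbound typedefs still collapse to their fixed-size type. Note that this sketch replaces the whole chain, so the outer name my_time is lost; a real design might want to keep it too.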
If this is the approach, then we should also think through the reverse, i.e., upgrading an int into an "error code type," e.g., from int to errno_t or, as in OpenSC, to some project-specific type.
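The reverse direction could be sketched as a retyping pass, assuming a toy IR where each local variable records its type name and the callee whose result initializes it. The heuristic (retype an int if it is initialized from a known error-returning function) and every name below (VarDecl, upgrade_error_codes) are illustrative assumptions, not an existing VAST feature.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy IR: a local variable, its type name, and the callee
   whose result initializes it (NULL if none). Not VAST's data model. */
typedef struct {
    const char *var;
    const char *type;
    const char *init_callee;
} VarDecl;

/* Upgrade "int" variables initialized from known error-returning calls
   to the given error-code type; returns how many were retyped. */
size_t upgrade_error_codes(VarDecl *decls, size_t n,
                           const char *const *err_fns, size_t n_fns,
                           const char *err_type) {
    assert(decls != NULL || n == 0);
    size_t changed = 0;
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(decls[i].type, "int") != 0 || decls[i].init_callee == NULL)
            continue;
        for (size_t j = 0; j < n_fns; ++j) {
            if (strcmp(decls[i].init_callee, err_fns[j]) == 0) {
                decls[i].type = err_type;  /* e.g. "errno_t" or a project type */
                ++changed;
                break;
            }
        }
    }
    return changed;
}
```

The list of error-returning functions is exactly the kind of project-specific knowledge a user would supply through the type-binding UI, rather than something the compiler could infer on its own.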
The general mindset I'm espousing is "typedef as taint."
We should have a UI (probably a Python API) for people to create these kinds of type bindings.
Agreed on both. At a minimum, I think the UI-level support could be a view for visualizing code, where clicking on something initializes a global variable named here in a Python console, through which you have API access.
What do you think should happen when you have:
time_t time;
int casted = (int)time;
Do we want to propagate "type-taint" information through casts? In the worst case, even through memory?
I don't want VAST to propagate type taint; I want VAST to enable me to do so. I want to find the uses of values of type time_t, and then, if I see them cast to an int (implicitly or explicitly), be able to do my own taint tracking, and possibly transformation (cloning plus type changing).
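The kind of user-driven taint tracking described here could look roughly like the following, assuming a toy SSA-like IR where each op defines one value and a cast reads one operand. The IR shape and the names Op, OpKind, and taint_through_casts are hypothetical; the point is only that the user seeds on a type (time_t) and decides to follow casts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy SSA-like IR: each op defines value `id` of type `type`;
   a cast reads value `src` (-1 when the op has no operand). */
typedef enum { OP_DECL, OP_CAST, OP_OTHER } OpKind;

typedef struct {
    OpKind kind;
    int id;
    int src;
    const char *type;
} Op;

#define MAX_VALUES 64

/* Seed every value whose type is `seed_type`, then follow casts forward to
   a fixed point (ops may appear in any order). Returns the number of casts
   from a tainted value to `cast_type`, e.g. time_t -> int. */
size_t taint_through_casts(const Op *ops, size_t n, const char *seed_type,
                           const char *cast_type, bool tainted[MAX_VALUES]) {
    memset(tainted, 0, MAX_VALUES * sizeof tainted[0]);
    for (size_t i = 0; i < n; ++i) {
        assert(ops[i].id >= 0 && ops[i].id < MAX_VALUES);
        assert(ops[i].kind != OP_CAST || (ops[i].src >= 0 && ops[i].src < MAX_VALUES));
        if (strcmp(ops[i].type, seed_type) == 0)
            tainted[ops[i].id] = true;
    }
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; ++i)
            if (ops[i].kind == OP_CAST && tainted[ops[i].src] && !tainted[ops[i].id]) {
                tainted[ops[i].id] = true;
                changed = true;
            }
    }
    size_t hits = 0;
    for (size_t i = 0; i < n; ++i)
        if (ops[i].kind == OP_CAST && tainted[ops[i].src] &&
            strcmp(ops[i].type, cast_type) == 0)
            ++hits;
    return hits;
}
```

For the earlier example (time_t time; int casted = (int)time;), the int produced by the cast becomes tainted and is reported as one time_t-to-int cast site. Propagation through memory (stores and loads) is deliberately left out, which matches the "worst case" concern raised above.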
Contrast this with LLVM: there I'd always have i32, and I'd need to observe and taint time_t values from their sources, i.e., calls to time() or something like that. At a high level there are possibly substantially more taint sources, because every use of a time_t variable is a time_t. At lowering time we might lower to an i32, but I want to know "we came from a time_t."
To be more precise, I don't want to end up exactly where we are with LLVM, just with more control-flow structure. I think typedefs carry a lot of semantic value, although I also recognize just how much of an annoyance they can be. Enumerators are similar, as are macro constants. When we drop down to fixed-size types, I want to ensure that we have either i) solid provenance info to know about typedefs and such, or ii) some kind of extra information carried along that also isn't so unwieldy that you'll cry tears of pain every time you add operators.
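Option i) might be sketched as a side table filled in during lowering: each value's type is rewritten to a fixed-size type, and the typedef it came from is recorded separately, so operators never have to deal with it. The names (TypedefLowering, ProvenanceTable, lower_value_type, origin_of) and the fixed table size are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lowering rule: typedef name -> fixed-size type. */
typedef struct {
    const char *typedef_name;  /* e.g. "time_t" */
    const char *fixed_type;    /* e.g. "i32" */
} TypedefLowering;

/* Side table entry: which typedef a value had before lowering. */
typedef struct {
    int value_id;
    const char *origin;
} Provenance;

typedef struct {
    Provenance entries[64];
    size_t count;
} ProvenanceTable;

/* Lower one value's type; remember its origin if it was a known typedef. */
const char *lower_value_type(int value_id, const char *type,
                             const TypedefLowering *rules, size_t n_rules,
                             ProvenanceTable *prov) {
    for (size_t i = 0; i < n_rules; ++i) {
        if (strcmp(type, rules[i].typedef_name) == 0) {
            assert(prov->count < sizeof prov->entries / sizeof prov->entries[0]);
            prov->entries[prov->count++] = (Provenance){value_id, type};
            return rules[i].fixed_type;
        }
    }
    return type;  /* already fixed-size: nothing to record */
}

/* Answer "is this i32 actually a time_t?" after lowering (NULL if unknown). */
const char *origin_of(const ProvenanceTable *prov, int value_id) {
    for (size_t i = 0; i < prov->count; ++i)
        if (prov->entries[i].value_id == value_id)
            return prov->entries[i].origin;
    return NULL;
}
```

The trade-off against option ii) is that the side table keeps types and operators simple, but any pass that creates new values must remember to copy provenance forward, or the information silently disappears.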
I keep bringing up the lowering stuff because I know that not everything is in SourceIR; some stuff will only happen through lowering (destructors, etc.).
Rethink time_t tracking, and just how "deep" down the stack we can reasonably keep it around or ensure that we know a given i32 is actually a time_t.