trailofbits / binary_type_inference

GNU General Public License v3.0
11 stars 0 forks source link

Implement Function Cloning for Polymorphism #22

Closed 2over12 closed 1 year ago

2over12 commented 2 years ago

In mooosl several pointers are getting misunified, ie. byte buffers are unified with the pointers to nodes. This is because both of these pointers reach free param 1 and are returned from calloc. These pointer types are then being unified, causing mistyping of byte buffers. Solving this requires callsite tagging of formal parameters in order to support polymorphic types of functions such as free and calloc.

2over12 commented 2 years ago

First step here is to develop an exceedingly simple polymorphic test

2over12 commented 2 years ago

Ok so we are going to need some tricks to support polymorphism:

  1. At callsites we need to copy the reachable subgraph for related nodes from callees, providing a form of polymorphism. Ideally we just introduce a copy of the related subgraph into the space, then introduce some new propagation/unification relations (this could get expensive maybe??? Not worse than function cloning everything presumably so).

  2. Since there is no binding to the callsite we need some sort of refinement heuristic for parameters when compatibility exists. This heuristic is described a bit in G.1. The idea is based on all callsites of a function rather than giving the function the generic scheme, give it the most specific type that is compatible with its uses. Maybe we can group callsites that are compatible to prevent overgeneralization, the real question is how to heuristically share structure information

  3. Coffee

2over12 commented 2 years ago

So there are two important tests we should have here corresponding to the two points above:

  1. A test where we compose the identity function with an aliasing function (ie. a function that just calls it) both the identity and aliasing function should be found entirely polymorphic and adding additional constraints with subtyping char or int should both work fine instantiating it ie. f calling with char and returning should get returning char and g calling int should get that. Heuristics would ideally type id as a void -> void* (Actually is this true, is it better to just produce a union of reaching lowered types?)

  2. a function a and function b that both access compatible fields of a structure, ie. f1 will be at 0 and size 32 with f2 at 4 size 32. Then a function c and d should call both a and b. Function a should be polymorphic in any structure with a field at 0 and function b should polymorphic with any field at 4. Function c should be typed to have both a field at 0 and at 4. Function d should do similar things to c. During heuristics we should recognize that all callsite types are compatible for a and b and c and d unifying it into a single structure type

2over12 commented 2 years ago

This PR due to the requirement for unions in the heuristics will likely also handle #20

2over12 commented 2 years ago

2111737e5cb3f404c85916119dd6b35bb05d4fb8 implements part 1 of testing where calling functions arent unified through the identity function. Now we need to figure out how to implement 2 in heuristics. This feature will need a representation for sketchgraphs where there are multiple sketches with weakly equivalent types