Dataflow facts vanish after function call using PropagateOnto strategy

yuffon commented 1 year ago

I have an issue with the PropagateOnto strategy. If I use the default PropagateOver strategy, my typestate code works well. But when using PropagateOnto strategy, some dataflow facts vanish after invoking function.

Basically, I use an finite state machine describing pointers cannot be used after free. The analyzed code is

#include <stdio.h>
#include <malloc.h>

int use_after_free()
{
    int *p = (int *)malloc(16);
    *p = 1;
    free(p);
    *p = 2;
    return 0;
}

int main(void)
{

    int (* p)() = & use_after_free; 
    int i = p();

    return 0;
}

The IR file is

; ModuleID = 'func-pointer-use-after-free.c'
source_filename = "func-pointer-use-after-free.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @use_after_free() #0 {
entry:
  %p = alloca i32*, align 8
  %call = call noalias i8* @malloc(i64 noundef 16) #2
  %0 = bitcast i8* %call to i32*
  store i32* %0, i32** %p, align 8
  %1 = load i32*, i32** %p, align 8
  store i32 1, i32* %1, align 4
  %2 = load i32*, i32** %p, align 8
  %3 = bitcast i32* %2 to i8*
  call void @free(i8* noundef %3) #2
  %4 = load i32*, i32** %p, align 8
  store i32 2, i32* %4, align 4
  ret i32 0
}

; Function Attrs: nounwind
declare dso_local noalias i8* @malloc(i64 noundef) #1

; Function Attrs: nounwind
declare dso_local void @free(i8* noundef) #1

; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @main() #0 {
entry:
  %retval = alloca i32, align 4
  %p = alloca i32 (...)*, align 8
  %i = alloca i32, align 4
  store i32 0, i32* %retval, align 4
  store i32 (...)* bitcast (i32 ()* @use_after_free to i32 (...)*), i32 (...)** %p, align 8
  %0 = load i32 (...)*, i32 (...)** %p, align 8
  %call = call i32 (...) %0()
  store i32 %call, i32* %i, align 4
  ret i32 0
}

attributes #0 = { noinline nounwind optnone uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #1 = { nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #2 = { nounwind }

!llvm.module.flags = !{!0, !1, !2}
!llvm.ident = !{!3}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"uwtable", i32 1}
!2 = !{i32 7, !"frame-pointer", i32 2}
!3 = !{!"clang version 14.0.6 (https://github.com/llvm/llvm-project.git f28c006a5895fc0e329fe15fead81e37457cb1d1)"}

The dumped results of using PropagateOver strategy is

N: %p = alloca i32*, align 8, !psr.id !4 | ID: 0
------------------------------------------------
    D: Fact :{ Obj:11,state:0 } | V: BOTTOM
    D: Λ | V: BOTTOM

N: %call = call noalias i8* @malloc(i64 noundef 16) #2, !psr.id !5 | ID: 1
--------------------------------------------------------------------------
    D: Λ | V: BOTTOM
    D: Fact :{ Obj:11,state:0 } | V: BOTTOM

N: %0 = bitcast i8* %call to i32*, !psr.id !6 | ID: 2
-----------------------------------------------------
    D: Λ | V: BOTTOM
    D: Fact :{ Obj:11,state:1 } | V: BOTTOM

N: store i32* %0, i32** %p, align 8, !psr.id !7 | ID: 3
-------------------------------------------------------
    D: Λ | V: BOTTOM
    D: Fact :{ Obj:11,state:1 } | V: BOTTOM

N: %1 = load i32*, i32** %p, align 8, !psr.id !8 | ID: 4
--------------------------------------------------------
    D: Λ | V: BOTTOM
    D: Fact :{ Obj:11,state:1 } | V: BOTTOM

N: store i32 1, i32* %1, align 4, !psr.id !9 | ID: 5
----------------------------------------------------
    D: Λ | V: BOTTOM
    D: Fact :{ Obj:11,state:1 } | V: BOTTOM

N: %2 = load i32*, i32** %p, align 8, !psr.id !10 | ID: 6
---------------------------------------------------------
    D: Λ | V: BOTTOM
    D: Fact :{ Obj:11,state:1 } | V: BOTTOM

N: %3 = bitcast i32* %2 to i8*, !psr.id !11 | ID: 7
---------------------------------------------------
    D: Λ | V: BOTTOM
    D: Fact :{ Obj:11,state:1 } | V: BOTTOM

N: call void @free(i8* noundef %3) #2, !psr.id !12 | ID: 8
----------------------------------------------------------
    D: Λ | V: BOTTOM
    D: Fact :{ Obj:11,state:1 } | V: BOTTOM

N: %4 = load i32*, i32** %p, align 8, !psr.id !13 | ID: 9
---------------------------------------------------------
    D: Λ | V: BOTTOM
    D: Fact :{ Obj:11,state:2 } | V: BOTTOM

N: store i32 2, i32* %4, align 4, !psr.id !14 | ID: 10
------------------------------------------------------
    D: Fact :{ Obj:11,state:2 } | V: BOTTOM
    D: Λ | V: BOTTOM

N: ret i32 0, !psr.id !15 | ID: 11
----------------------------------
    D: Fact :{ Obj:11,state:3 } | V: BOTTOM
    D: Λ | V: BOTTOM

============ Results for function 'main' ============

N: %p = alloca i32 (...)*, align 8, !psr.id !17 | ID: 13
--------------------------------------------------------
    D: Fact :{ Obj:11,state:0 } | V: BOTTOM
    D: Λ | V: BOTTOM

N: %i = alloca i32, align 4, !psr.id !18 | ID: 14
-------------------------------------------------
    D: Λ | V: TOP
    D: Fact :{ Obj:11,state:0 } | V: TOP

N: store i32 0, i32* %retval, align 4, !psr.id !19 | ID: 15
-----------------------------------------------------------
    D: Fact :{ Obj:11,state:0 } | V: TOP
    D: Λ | V: TOP

N: store i32 (...)* bitcast (i32 ()* @use_after_free to i32 (...)*), i32 (...)** %p, align 8, !psr.id !20 | ID: 16
------------------------------------------------------------------------------------------------------------------
    D: Λ | V: TOP
    D: Fact :{ Obj:11,state:0 } | V: TOP

N: %0 = load i32 (...)*, i32 (...)** %p, align 8, !psr.id !21 | ID: 17
----------------------------------------------------------------------
    D: Λ | V: TOP
    D: Fact :{ Obj:11,state:0 } | V: TOP

N: %call = call i32 (...) %0(), !psr.id !22 | ID: 18
----------------------------------------------------
    D: Λ | V: BOTTOM
    D: Fact :{ Obj:11,state:0 } | V: BOTTOM

N: store i32 %call, i32* %i, align 4, !psr.id !23 | ID: 19
----------------------------------------------------------
    D: Λ | V: TOP
    D: Fact :{ Obj:11,state:3 } | V: TOP

N: ret i32 0, !psr.id !24 | ID: 20
----------------------------------
    D: Fact :{ Obj:11,state:3 } | V: TOP
    D: Λ | V: TOP

Everything is OK up to now. But with the propagateOnto strategy, the dumped results are


N: %p = alloca i32*, align 8, !psr.id !4 | ID: 0
------------------------------------------------
    D: Fact :{ Obj:11,state:0 } | V: BOTTOM
    D: Λ | V: BOTTOM

N: %call = call noalias i8* @malloc(i64 noundef 16) #2, !psr.id !5 | ID: 1
--------------------------------------------------------------------------
    D: Fact :{ Obj:11,state:1 } | V: BOTTOM
    D: Λ | V: BOTTOM

N: %0 = bitcast i8* %call to i32*, !psr.id !6 | ID: 2
-----------------------------------------------------
    D: Fact :{ Obj:11,state:1 } | V: BOTTOM
    D: Λ | V: BOTTOM

N: store i32* %0, i32** %p, align 8, !psr.id !7 | ID: 3
-------------------------------------------------------
    D: Λ | V: BOTTOM
    D: Fact :{ Obj:11,state:1 } | V: BOTTOM

N: %1 = load i32*, i32** %p, align 8, !psr.id !8 | ID: 4
--------------------------------------------------------
    D: Fact :{ Obj:11,state:1 } | V: BOTTOM
    D: Λ | V: BOTTOM

N: store i32 1, i32* %1, align 4, !psr.id !9 | ID: 5
----------------------------------------------------
    D: Λ | V: BOTTOM
    D: Fact :{ Obj:11,state:1 } | V: BOTTOM

N: %2 = load i32*, i32** %p, align 8, !psr.id !10 | ID: 6
---------------------------------------------------------
    D: Fact :{ Obj:11,state:1 } | V: BOTTOM
    D: Λ | V: BOTTOM

N: %3 = bitcast i32* %2 to i8*, !psr.id !11 | ID: 7
---------------------------------------------------
    D: Fact :{ Obj:11,state:1 } | V: BOTTOM
    D: Λ | V: BOTTOM

N: call void @free(i8* noundef %3) #2, !psr.id !12 | ID: 8
----------------------------------------------------------
    D: Λ | V: BOTTOM
    D: Fact :{ Obj:11,state:2 } | V: BOTTOM

N: %4 = load i32*, i32** %p, align 8, !psr.id !13 | ID: 9
---------------------------------------------------------
    D: Fact :{ Obj:11,state:2 } | V: BOTTOM
    D: Λ | V: BOTTOM

N: store i32 2, i32* %4, align 4, !psr.id !14 | ID: 10
------------------------------------------------------
    D: Fact :{ Obj:11,state:3 } | V: BOTTOM
    D: Λ | V: BOTTOM

============ Results for function 'main' ============

N: %p = alloca i32 (...)*, align 8, !psr.id !17 | ID: 13
--------------------------------------------------------
    D: Fact :{ Obj:11,state:0 } | V: TOP
    D: Λ | V: TOP

N: %i = alloca i32, align 4, !psr.id !18 | ID: 14
-------------------------------------------------
    D: Λ | V: TOP
    D: Fact :{ Obj:11,state:0 } | V: TOP

N: store i32 0, i32* %retval, align 4, !psr.id !19 | ID: 15
-----------------------------------------------------------
    D: Λ | V: TOP
    D: Fact :{ Obj:11,state:0 } | V: TOP

N: store i32 (...)* bitcast (i32 ()* @use_after_free to i32 (...)*), i32 (...)** %p, align 8, !psr.id !20 | ID: 16
------------------------------------------------------------------------------------------------------------------
    D: Λ | V: TOP
    D: Fact :{ Obj:11,state:0 } | V: TOP

N: %0 = load i32 (...)*, i32 (...)** %p, align 8, !psr.id !21 | ID: 17
----------------------------------------------------------------------
    D: Λ | V: TOP
    D: Fact :{ Obj:11,state:0 } | V: TOP

N: %call = call i32 (...) %0(), !psr.id !22 | ID: 18
----------------------------------------------------
    D: Λ | V: TOP

N: store i32 %call, i32* %i, align 4, !psr.id !23 | ID: 19
----------------------------------------------------------
    D: Λ | V: TOP

As shown above, the last two instructions after calling the function have no dataflow facts. It seems that dataflow facts do not come out from the invoked function. In fact, I use identity flow function for all return edges. Besides, the default PropagateOver strategy works well, so it may not be the fault of flow functions.

fabianbs96 commented 1 year ago

Hi @yuffon, yes that sounds like a bug. Can you provide the analysis code, such that we can debug it?

yuffon commented 1 year ago

Hi @yuffon, yes that sounds like a bug. Can you provide the analysis code, such that we can debug it?

My project relies on other components. I am constructing a simple case and will report it later.

yuffon commented 1 year ago

Hi @yuffon, yes that sounds like a bug. Can you provide the analysis code, such that we can debug it?

Hi @fabianbs96, I am isolating my typestate analysis from other parts of my project so that it can run independently. But I find something may be useful.

My flow functions are designed as follows.

Normal Flow Function: for sensitive events -> push finite state automata other cases -> identity flow function

Call flow function: sensitive events (where function calls are sensitive events) -> kill all facts other cases -> identity (dataflow enter normal functions)

call to ret flow functions: ensitive events -> push automata other cases -> kill all facts (data flows are processed by function body)

return flow functions: identity flow function

Currently, these flow functions works well for Propagate Over strategy. But data flow facts vanishes after funciton call in the above example when using Propagate Onto strategy.

However, if I change the call to ret flow function as follows:

call to ret flow functions: ensitive events -> push automata other cases -> identity (change here, from killall to identity)

In the above example, the dataflow results after calling use_after_free function include two facts:


N: store i32 0, i32* %retval, align 4, !psr.id !19 | ID: 15
-----------------------------------------------------------
    D: Fact :{ Obj:11,state:0 } | V: TOP
    D: Λ | V: TOP

N: %call = call i32 @use_after_free(), !psr.id !20 | ID: 16
-----------------------------------------------------------
    D: Λ | V: TOP
    D: Fact :{ Obj:11,state:0 } | V: TOP
    D: Fact :{ Obj:11,state:3 } | V: TOP

N: store i32 %call, i32* %i, align 4, !psr.id !21 | ID: 17
----------------------------------------------------------
    D: Λ | V: TOP
    D: Fact :{ Obj:11,state:0 } | V: TOP
    D: Fact :{ Obj:11,state:3 } | V: TOP

These results conform to flow functions.

fabianbs96 commented 1 year ago

Hi @yuffon, thatks for the details. Based on your description, I have some comments on the handling of function calls (not sure whether you already handle it like this):

The call-flow function is responsible for mapping the arguments from the call-site to the formal parameters of the called function. Globals are usually passed as identity and all other facts are killed.
The return-flow function basically computes the inverse of the call-flow function: It maps the returned value to the call-site and maps all pointer-parameters (that may have changed within the callee) back to the respective arguments at the call-site. All other facts are killed.
The call-to-ret flow function then handles the propagation of all facts that are unrelated to the call (and all non-pointer arguments due to call-by-value) via identity and kills all others. Sometimes, you also perform special computations here, but usually that's done by the summary-flow function.
The summary-flow function is used to model the effects of special functions to the dataflow facts. This can be used for handling source/sink functions in a taint analysis, or modeled API functions in a typestate analysis (to advance in the automaton). Note, that the call-flow function will not be called if you return a non-null flow function from getSummaryFlowFunction and the analysis of the called function will be skipped; in such cases this is usually, what you want

From your description, I cannot see whether you do the parameter mapping correctly in your flow functions. Otherwise your approach sounds reasonable

yuffon commented 1 year ago

Hi @yuffon, thatks for the details. Based on your description, I have some comments on the handling of function calls (not sure whether you already handle it like this):

The call-flow function is responsible for mapping the arguments from the call-site to the formal parameters of the called function. Globals are usually passed as identity and all other facts are killed.

The return-flow function basically computes the inverse of the call-flow function: It maps the returned value to the call-site and maps all pointer-parameters (that may have changed within the callee) back to the respective arguments at the call-site. All other facts are killed.

The call-to-ret flow function then handles the propagation of all facts that are unrelated to the call (and all non-pointer arguments due to call-by-value) via identity and kills all others. Sometimes, you also perform special computations here, but usually that's done by the summary-flow function.

The summary-flow function is used to model the effects of special functions to the dataflow facts. This can be used for handling source/sink functions in a taint analysis, or modeled API functions in a typestate analysis (to advance in the automaton). Note, that the call-flow function will not be called if you return a non-null flow function from getSummaryFlowFunction and the analysis of the called function will be skipped; in such cases this is usually, what you want

From your description, I cannot see whether you do the parameter mapping correctly in your flow functions. Otherwise your approach sounds reasonable

Thanks a lot for your detailed comments. I used a simple and lazy implementation. I will revise it and check again.

secure-software-engineering / phasar

Dataflow facts vanish after function call using PropagateOnto strategy #679