secure-software-engineering / phasar

A LLVM-based static analysis framework.

Leak in LLVM code generated by Rust not found: functions with String return parameter #643

Closed StamesJames closed 11 months ago

StamesJames commented 1 year ago

Bug description

I'm trying to find leaks in LLVM code generated with Rust for the following program:

#[inline(never)]
#[no_mangle]
fn source() -> String {
    "Test".to_string()
}

#[inline(never)]
#[no_mangle]
fn sink(source: &str) -> String {
    source.to_string()
}

#[inline(never)]
#[no_mangle]
fn sanitize(source: &str) -> String {
    source.to_owned()
}

fn main() {
    let unsanitized = source();
    let source = source();
    let sanitized = sanitize(&source);
    let sink_unsanitized = sink(&unsanitized);
    let sink_sanitized = sink(&sanitized);
    println!("{sink_unsanitized}");
    println!("{sink_sanitized}");
}

A simpler example worked (#642). Now I changed the functions from returning ints to returning Strings. They get compiled to the following LLVM IR:

; Function Attrs: noinline nonlazybind uwtable
define dso_local void @source(%"alloc::string::String"* sret(%"alloc::string::String") %0) unnamed_addr #1 {
start:
; call <str as alloc::string::ToString>::to_string
  call void @"_ZN47_$LT$str$u20$as$u20$alloc..string..ToString$GT$9to_string17h488739110bf80537E"(%"alloc::string::String"* sret(%"alloc::string::String") %0, [0 x i8]* align 1 bitcast (<{ [4 x i8] }>* @alloc63 to [0 x i8]*), i64 4)
  br label %bb1

bb1:                                              ; preds = %start
  ret void
}

; Function Attrs: noinline nonlazybind uwtable
define dso_local void @sink(%"alloc::string::String"* sret(%"alloc::string::String") %0, [0 x i8]* align 1 %source.0, i64 %source.1) unnamed_addr #1 {
start:
; call <str as alloc::string::ToString>::to_string
  call void @"_ZN47_$LT$str$u20$as$u20$alloc..string..ToString$GT$9to_string17h488739110bf80537E"(%"alloc::string::String"* sret(%"alloc::string::String") %0, [0 x i8]* align 1 %source.0, i64 %source.1)
  br label %bb1

bb1:                                              ; preds = %start
  ret void
}
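To make the parameter numbering in this IR easier to follow, here is a rough hand-written Rust analogue of the lowered signatures. It is only a sketch: the names source_lowered, sink_lowered and call_through_return_slots are made up for illustration, and rustc does not generate this code literally. The point is that parameter 0 is the sret return slot, while the &str argument is split into a data pointer (parameter 1) and a length (parameter 2).

```rust
use std::mem::MaybeUninit;

// Hypothetical analogue of the lowered `source`:
// parameter 0 is the return slot that the function writes into.
fn source_lowered(out: &mut MaybeUninit<String>) {
    out.write("Test".to_string());
}

// Hypothetical analogue of the lowered `sink`:
// parameter 0: return slot, parameter 1: data pointer, parameter 2: length.
fn sink_lowered(out: &mut MaybeUninit<String>, ptr: *const u8, len: usize) {
    // Rebuild the &str from its two halves, as the callee sees them.
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    let s = std::str::from_utf8(bytes).expect("valid UTF-8");
    out.write(s.to_string());
}

// Calls both functions the way a caller would after lowering.
fn call_through_return_slots() -> String {
    let mut src_slot = MaybeUninit::<String>::uninit();
    source_lowered(&mut src_slot);
    // Safe here because source_lowered always initializes the slot.
    let src = unsafe { src_slot.assume_init() };

    let mut sink_slot = MaybeUninit::<String>::uninit();
    sink_lowered(&mut sink_slot, src.as_ptr(), src.len());
    // Safe here because sink_lowered always initializes the slot.
    unsafe { sink_slot.assume_init() }
}
```

Calling call_through_return_slots() yields the string written by source_lowered, passed through sink_lowered via its pointer and length parameters.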

I set my analysis-config to:

{
    "name": "taint-03-simple-functions-string",
    "version": 1,
    "functions": [
        {
            "name": "source",
            "params": {
                "source": [0]
            }
        },
        {
            "name": "sink",
            "params": {
                "sink": [1]
            }
        },
        {
            "name": "sanitize",
            "ret": "sanitizer"
        }
    ],
    "variables": []
}

because, in my understanding, the two functions no longer return anything but instead receive a pointer through which they write the return value. I invoke my analysis with

phasar-cli \
   -m target/debug/deps/sql_injection_03_simple_requests-0a2c4db10e6afc34.ll \
   -D ifds-taint \
   --analysis-config=analysis-config.json \
   --entry-points _ZN32sql_injection_03_simple_requests4main17h3819e5f83b074069E

where _ZN32sql_injection_03_simple_requests4main17h3819e5f83b074069E is the mangled name of my main function.

If I set the 0th parameter of the sink function as the sink, phasar reports a leak. However, the report is not simply the leaked variable obtained from the source function but a very long listing. Here are its first lines:

Leak(s):
IR  : %"core::fmt::Arguments"* %0 | ID: _ZN4core3fmt9Arguments6new_v117hc8a21f4658044cffE.0
IR  : %"alloc::string::String"* %0 | ID: _ZN5alloc6string6String19from_utf8_unchecked17h6553b59f13851d7cE.0
IR  : %"alloc::vec::Vec<u8>"* %bytes | ID: _ZN5alloc6string6String19from_utf8_unchecked17h6553b59f13851d7cE.1
IR  : @alloc55 = private unnamed_addr constant <{ [75 x i8] }> <{ [75 x i8] c"/rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/fmt/mod.rs" }>, align 1, !psr.id !4 | ID: 4
IR  : @alloc59 = private unnamed_addr constant <{ [74 x i8] }> <{ [74 x i8] c"/rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/alloc.rs" }>, align 1, !psr.id !8 | ID: 8
IR  : @alloc60 = private unnamed_addr constant <{ i8*, [16 x i8] }> <{ i8* getelementptr inbounds (<{ [74 x i8] }>, <{ [74 x i8] }>* @alloc59, i32 0, i32 0, i32 0), [16 x i8] c"J\00\00\00\00\00\00\00\AC\00\00\00\1B\00\00\00" }>, align 8, !psr.id !9 | ID: 9
IR  : @alloc61 = private unnamed_addr constant <{ [76 x i8] }> <{ [76 x i8] c"/rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/raw_vec.rs" }>, align 1, !psr.id !10 | ID: 10
IR  : @alloc62 = private unnamed_addr constant <{ i8*, [16 x i8] }> <{ i8* getelementptr inbounds (<{ [76 x i8] }>, <{ [76 x i8] }>* @alloc61, i32 0, i32 0, i32 0), [16 x i8] c"L\00\00\00\00\00\00\00\F7\00\00\00;\00\00\00" }>, align 8, !psr.id !11 | ID: 11
IR  : @alloc22 = private unnamed_addr constant <{ [1 x i8] }> <{ [1 x i8] c"\0A" }>, align 1, !psr.id !13 | ID: 13
IR  : @alloc21 = private unnamed_addr constant <{ i8*, [8 x i8], i8*, [8 x i8] }> <{ i8* bitcast (<{}>* @alloc20 to i8*), [8 x i8] zeroinitializer, i8* getelementptr inbounds (<{ [1 x i8] }>, <{ [1 x i8] }>* @alloc22, i32 0, i32 0, i32 0), [8 x i8] c"\01\00\00\00\00\00\00\00" }>, align 8, !psr.id !14 | ID: 14
IR  : %_2 = call i8* @"_ZN4core3ptr6unique15Unique$LT$T$GT$6as_ptr17h3b210c5ac01b064fE"(i8* %unique), !psr.id !18 | ID: 15
IR  : i8* %unique | ID: _ZN119_$LT$core..ptr..non_null..NonNull$LT$T$GT$$u20$as$u20$core..convert..From$LT$core..ptr..unique..Unique$LT$T$GT$$GT$$GT$4from17hd493d251c602c8e8E.0
IR  : %0 = call i8* @"_ZN4core3ptr8non_null16NonNull$LT$T$GT$13new_unchecked17h6f1d783941022635E"(i8* %_2), !psr.id !20 | ID: 17

In my understanding, though, the 0th parameter should not be a sink parameter because it acts as the return value, whereas the 1st and 2nd parameters should produce a leak because they pass values from inside the source String. I attached all relevant files.
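The expected outcome for the program at the top can be sketched with a toy taint model in Rust. This is an illustration only, not how phasar's ifds-taint analysis works internally: the Taint and Tracked types and the expected_leaks function are hypothetical, and each value simply carries a label instead of being tracked through the IR. The sketch assumes that source taints its result, sanitize clears the taint, and sink reports any tainted argument.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Taint {
    Clean,
    Tainted,
}

// A value reduced to its variable name and taint label (hypothetical model).
#[derive(Debug)]
struct Tracked {
    name: &'static str,
    taint: Taint,
}

// Everything produced by the source is tainted.
fn source(name: &'static str) -> Tracked {
    Tracked { name, taint: Taint::Tainted }
}

// The sanitizer's result is clean regardless of its input.
fn sanitize(_input: &Tracked, name: &'static str) -> Tracked {
    Tracked { name, taint: Taint::Clean }
}

// The sink records a leak whenever a tainted value reaches it.
fn sink(input: &Tracked, leaks: &mut Vec<&'static str>) {
    if input.taint == Taint::Tainted {
        leaks.push(input.name);
    }
}

// Mirrors the data flow of main() in the program above.
fn expected_leaks() -> Vec<&'static str> {
    let unsanitized = source("unsanitized");
    let raw = source("source");
    let sanitized = sanitize(&raw, "sanitized");

    let mut leaks = Vec::new();
    sink(&unsanitized, &mut leaks);
    sink(&sanitized, &mut leaks);
    leaks
}
```

Under these assumptions only the unsanitized value is reported, which is the single leak one would hope to see in the analysis output.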

Example files

Files:

examplefiles.zip

MMory commented 1 year ago

Hi @StamesJames,

it's good that someone is letting phasar analyze some Rust code.

With the files you provided I am unable to compile the sample, as cargo wants a Cargo.toml and, being a Rust noob, I don't know where to get one. I think it would be easiest for me if you could provide the full IR file that you are trying to analyze.

Cheers, Martin

MMory commented 1 year ago

Correction: I followed your instructions in the other issue and was able to build your example. Will look into it now.

MMory commented 1 year ago

Another correction: my rustc/cargo builds IR for LLVM >14, which phasar cannot analyze. Please provide your IR file :)

StamesJames commented 1 year ago

Hi @MMory,

sorry, I wrote the issue in a bit of a rush. Here is the corrected version: example_files.zip. The IR is in the root folder now, and running cargo build inside the root folder should work now as well. The right Rust version is specified in the rust-toolchain.toml file, and the compiler options to build the IR inside the target/debug/deps folder are specified in the .cargo/config.toml file.

MMory commented 1 year ago

Hi @StamesJames, in case you didn't notice: we merged a fix that should address your issue.

MMory commented 1 year ago

Hi @StamesJames, could you please provide feedback w.r.t. the fix we merged on Jul 31?

StamesJames commented 1 year ago

Hi @MMory, yes, of course. I was able to find the leak with the newest version on the development branch, which I built as a Docker image. I will try to find leaks in more complex examples now.