ufrisk / MemProcFS

MemProcFS
GNU Affero General Public License v3.0

Rust API: VmmYaraResult.result is empty even if VmmYaraResult.total_results>0. #279

Closed kaarposoft closed 3 months ago

kaarposoft commented 3 months ago

I have run into what I believe is a bug in the Rust API. I am using version 5.9.3 (Linux) on Ubuntu 22.04 and have not tried this on other versions.

VmmYara::poll and VmmYara::result both return a VmmYaraResult. However, even if VmmYaraResult.total_results shows a value greater than zero, the VmmYaraResult.result vector is empty!

First, I tried running MemProcFS from the command line:

/usr/local/memprocfs/memprocfs -device memory.img -mount mnt2 -forensic 2 -license-accept-elastic-license-2-0

MemProcFS finds 13 YARA matches, as shown in mnt2/forensic/findevil/yara.txt.

In Rust, I initialize the Vmm with:

let args = ["-loglevel", "4", "-device", &m].to_vec();
let vmm = Vmm::new(lib.to_string_lossy().as_ref(), &args).context("Vmm::new")?;

And the yara code is:

    let mut search = vmm.search_yara(rules, 0, max_addr, 0x10000, FLAG_NOCACHE)?;
    info!("Start YARA search");
    search.start();
    info!("Started YARA search");
    /*
    let res = loop {
        sleep(Duration::from_secs(1));
        info!("Polling YARA search");
        let res = search.poll();
        let progress_pct = (100*res.addr_current)/max_addr;
        info!(res.total_read_bytes, res.total_results, len=res.result.len(), progress_pct, "Yara progress");
        warn!(%res, "Poll result");
        if res.is_completed { break res}
    };
    */
    let res = search.result();
    if !res.is_completed_success {
        error!("YARA FAILED"); // consider to return early
    }
    info!(res.total_results, len=res.result.len(), "got results");

And the output:

INFO ez::mem::yara: got results res.total_results=15 len=0

I have also tried without FLAG_NOCACHE, but still no results. I have also tried to add "-forensic", "2" to the args to Vmm::new, but still no results.

I am not very experienced with the MemProcFS code, but this looks suspicious: https://github.com/ufrisk/MemProcFS/blob/8b05b89cfe1fd77af8341d0feffaee3f47b682b4/vmmrust/memprocfs/src/lib_memprocfs.rs#L7398

fn impl_start(&mut self) {
    if self.is_started == false {
        self.is_started = true;
        // ugly code below - but it works ...
        self.native_search.pvUserPtrOpt = std::ptr::addr_of!(self.result) as usize;
        let pid = self.pid;
        let native_h = self.vmm.native.h;
        let pfn = self.vmm.native.VMMDLL_MemSearch;
        let ptr = &mut self.native_search as *mut CVMMDLL_MEM_SEARCH_CONTEXT;
        let ptr_wrap = ptr as usize;
        let thread_handle = std::thread::spawn(move || {
            let ptr = ptr_wrap as *mut CVMMDLL_MEM_SEARCH_CONTEXT;
            (pfn)(native_h, pid, ptr, std::ptr::null_mut(), std::ptr::null_mut())
        });
        self.thread = Some(thread_handle);
    }
}

ufrisk commented 3 months ago

Thank you for reporting this. It should now be updated in Version 5.9.4 on crates.io.

The issue was that I changed the native struct layout in the native library some time ago but forgot to update the Rust API wrapper. Because of the resulting struct version number mismatch, the wrapper stopped parsing the YARA results for safety reasons.

Good find, and thank you for reporting it. Can you verify that the new version works better?
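To illustrate the failure mode described above, here is a minimal sketch of a wrapper that refuses to parse records whose version field does not match the one it was built against. The names `EXPECTED_VERSION`, `NativeMatch`, and `parse_matches` are hypothetical and are not taken from the MemProcFS source; the real wrapper works on native structs over FFI.

```rust
/// Hypothetical version the wrapper was compiled against.
const EXPECTED_VERSION: u32 = 2;

/// Hypothetical stand-in for one native result record.
#[derive(Debug, Clone, PartialEq)]
struct NativeMatch {
    version: u32,
    rule_name: String,
}

/// Returns the rule names of all records that precede the first
/// version mismatch. Parsing stops at a mismatch because the layout
/// of an unknown version cannot be trusted.
fn parse_matches(native: &[NativeMatch]) -> Vec<String> {
    native
        .iter()
        .take_while(|m| m.version == EXPECTED_VERSION)
        .map(|m| m.rule_name.clone())
        .collect()
}
```

With a guard like this, a counter maintained elsewhere can be greater than zero while the parsed vector stays empty, which matches the symptom in the original report.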

kaarposoft commented 3 months ago

Thanks for trying. However, I see exactly the same missing results with 5.9.4.

ufrisk commented 3 months ago

It seems to be working for me. Are you using 5.9.4 from crates.io (and not an older version such as 5.9.0)? Also are you using the latest native library (5.9.4)? I assume yara itself is working since you seem to get results.

This is how my example MZ header YARA search from the example file looks when I run it: (image)

kaarposoft commented 3 months ago

Let me investigate further. It seems there IS a difference between 5.9.3 and 5.9.4. Probably my bad. Don't waste time on this until I get back to you with more info.

kaarposoft commented 3 months ago

Yes, I am using 5.9.4 from crates and the latest native library (confirmed with the new -version option!)

I am getting SOME yara results now, but there is a discrepancy between the reported total_results and the actual length of the results. Just to be sure, I even wait 10 seconds for the final result. My code is:

let res = loop {
    sleep(Duration::from_secs(1));
    //info!("Polling YARA search");
    let res = search.poll();
    let progress_pct = (100*res.addr_current)/max_addr;
    info!(res.total_read_bytes, res.total_results, len=res.result.len(), progress_pct, "Yara progress");
    //warn!(%res, "Poll result");
    if res.is_completed { break res }
};

for _ in 0..10 {
    sleep(Duration::from_secs(1));
    //info!("Waiting longer for YARA search");
    let res = search.poll();
    let progress_pct = (100*res.addr_current)/max_addr;
    info!(res.total_read_bytes, res.total_results, len=res.result.len(), progress_pct, "Yara waiting progress");
}

This gives me: (image)

So it would seem that:

1) The results are not available until the search is complete. I had expected otherwise, but no problem, I can work around that.
2) The reported number of results (here: 19) is larger than the length of the result (here: 10).

I am not a subject matter expert, but it seems strange that the reported number of results does not match the length of the result.

ufrisk commented 3 months ago

Regarding the search result mismatch: it seems I'm counting the number of matched addresses. One rule may match multiple addresses, and I suspect that is what's going on in your case. I should probably fix this, since the user would expect this variable to hold the number of matched rules rather than the number of matched addresses. I'd have to update the native library to 5.9.5 for that though (and also perform some testing).
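The difference between the two counts can be sketched as follows. `RuleMatch`, `sample_matches`, and both counting functions are hypothetical illustrations, not MemProcFS types, and the sample data is invented for the example.

```rust
/// Hypothetical rule match: one rule plus every address it hit.
#[derive(Debug)]
struct RuleMatch {
    rule: &'static str,
    addresses: Vec<u64>,
}

/// Number of matched rules, i.e. the length of the result vector.
fn count_rule_matches(matches: &[RuleMatch]) -> usize {
    matches.len()
}

/// Number of matched addresses, summed over all rules.
fn count_matched_addresses(matches: &[RuleMatch]) -> usize {
    matches.iter().map(|m| m.addresses.len()).sum()
}

/// Invented sample: two rules, one of which matches at two addresses.
fn sample_matches() -> Vec<RuleMatch> {
    vec![
        RuleMatch { rule: "rule_a", addresses: vec![0x1000, 0x2000] },
        RuleMatch { rule: "rule_b", addresses: vec![0x3000] },
    ]
}
```

For the sample, the rule count is 2 and the address count is 3, so a counter based on addresses exceeds the vector length whenever any rule matches more than once.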

Regarding poll not returning the search results: returning them would imply that a deep clone takes place at each poll. On searches with plentiful results (and complex rules), poll would go from a very lightweight operation to a very heavy one. I'm not sure I really wish to add that, but if it's really needed I guess I could do it, or add some user config for it...
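A rough sketch of that trade-off, assuming a worker thread that shares state with its owner: `poll` reads only two atomic counters, while `result` waits for the worker and moves the vector out once. `Shared`, `Search`, and `Progress` are hypothetical types for illustration and do not mirror the MemProcFS implementation.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical state shared between the worker and its owner.
struct Shared {
    total_results: AtomicUsize,
    is_completed: AtomicBool,
    results: Mutex<Vec<String>>,
}

/// Hypothetical search handle; not the MemProcFS type.
struct Search {
    shared: Arc<Shared>,
    worker: Option<JoinHandle<()>>,
}

/// Cheap progress snapshot: plain values, no heap allocation.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Progress {
    total_results: usize,
    is_completed: bool,
}

impl Search {
    /// Spawns a worker that "finds" each item in turn.
    fn start(items: Vec<String>) -> Self {
        let shared = Arc::new(Shared {
            total_results: AtomicUsize::new(0),
            is_completed: AtomicBool::new(false),
            results: Mutex::new(Vec::new()),
        });
        let worker_shared = Arc::clone(&shared);
        let worker = thread::spawn(move || {
            for item in items {
                worker_shared.results.lock().unwrap().push(item);
                worker_shared.total_results.fetch_add(1, Ordering::SeqCst);
            }
            worker_shared.is_completed.store(true, Ordering::SeqCst);
        });
        Search { shared, worker: Some(worker) }
    }

    /// Lightweight: two atomic loads, the result vector is not touched.
    fn poll(&self) -> Progress {
        Progress {
            total_results: self.shared.total_results.load(Ordering::SeqCst),
            is_completed: self.shared.is_completed.load(Ordering::SeqCst),
        }
    }

    /// Heavy path, paid once: waits for the worker, then moves the
    /// results out instead of cloning them.
    fn result(&mut self) -> Vec<String> {
        if let Some(handle) = self.worker.take() {
            handle.join().expect("worker thread panicked");
        }
        let mut guard = self.shared.results.lock().unwrap();
        std::mem::take(&mut *guard)
    }
}
```

A caller would poll in a loop for progress and call `result` only once `is_completed` is set, which is the pattern of reading results only after the search has finished.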

ufrisk commented 3 months ago

Update: I probably can't easily change the behavior of the native library with regard to the result count. I'll update the Rust library though, if not today then at least before the end of the week.

As for the poll, if not a huge issue, maybe leave it as-is?

kaarposoft commented 3 months ago

The behavior during polling is fine. No need to deep clone during polling. I can certainly live with results only being available when polling is done. No problem at all.

However, the number of matches should be consistent. I would prefer number of matches, not number of addresses. But whichever you choose, it just needs to be consistent in the API.

ufrisk commented 3 months ago

The reported number of matches should now be consistent as of 5.9.5.