VMMDLL_Scatter_ExecuteRead Question

americanhawk1 commented 1 year ago

I'm wondering if my scatter read times are too long...

I'm using a DMA card, and I know it will depend on the speed of the card you're using (and probably some other things). I'm not sure if probe is comparable to scatter, but I get an average of 430MB/s.

Right now I'm doing 4 scatter reads. Each of these reads 2,700 addresses, for a total of 10,800 reads. They unfortunately can't be concurrent (all existing in one prepare array), as each of the scatter reads relies on the last one succeeding.

The total time for these 4 reads is about 1 - 1.5 seconds.

ufrisk commented 1 year ago

It's not comparable.

What speed do you get when running pcileech.exe dump?

Also, what is the size of your reads?

americanhawk1 commented 1 year ago

I get about 18MB/s for my dump speed.

I'm reading 8 bytes every read, for a total of 86400 bytes.

ufrisk commented 1 year ago

That is slow. Are you by any chance running it on USB2, i.e. a USB2 cable or USB2 port? Sometimes it helps to reconnect the cable if it's a USB3 cable you're using.

americanhawk1 commented 1 year ago

I'm using a USB3.0 cable that I use for VR (I also tried 2 other cords I had lying around). I've tried plugging it into USB3.0 and 3.1 ports without a change. The fastest speeds I get are around 20MB/s.

Would this cause the slow read speeds that I'm seeing with the API? I have a friend saying that he's getting 300+MB/s when running his dump command.

ufrisk commented 1 year ago

This would affect your read performance unfortunately.

It may be a bad connection to your device. Sometimes the plug doesn't go all the way in. It may help to adjust the plate a tiny bit.

I'm not sure this will help but the culprit is your slow speed.

Also, can you run pcileech.exe dump -device fpga -v and see what it says? If it's not saying anything about "tiny algorithm", that's great.

americanhawk1 commented 1 year ago

DEVICE: FPGA: ScreamerM2 PCIe gen2 x1 [300,25,500] [v4.11,0400] [ASYNC,NORM]
FPGA: TINY PCIe TLP algrithm auto-selected!
Memory Dump: Initializing ... Done.

ufrisk commented 1 year ago

This is not a usb issue.

It seems like your mainboard does not support the larger 4kB reads that pcileech uses by default, so it uses the fallback algorithm.

A BIOS update may or may not help. Changing slots or changing settings may also help, or it may not.

I plan to look into improving the speed on the tiny algorithm in the medium future, but unfortunately it will never reach the full speed of the normal algorithm.

americanhawk1 commented 1 year ago

Doing a BIOS update seems to have fixed it. I'm getting 200+ MB/s on the dump now.

This also seems to have fixed me having to use "fpga://algo=2". I've switched back to algo 0.

I definitely see improvement in read speeds, but it's still not as fast as I was looking for. This could absolutely be on me with unoptimized code, or not understanding the limits of my card or your framework. If I had to make an educated guess, it's probably me expecting too much out of DMA.

If you're curious, this is one of the function overloads I use to implement scatter reads:

/*
* -- hScatter = scatter handle, already initialized by the caller
* -- PID = process ID of process to read memory
* -- AddrVector = array of addresses to read from
* -- Offset = added to AddrVector.at() value
* -- Array = array to write to, after read is completed
*/
bool TestUniversalScatterRead(VMMDLL_SCATTER_HANDLE& hScatter, DWORD& PID, std::vector<uintptr_t>& AddrVector, uintptr_t Offset, _Out_ std::vector<uintptr_t>& Array)
{
    // Queue one 8-byte read per address.
    for (size_t i = 0; i < AddrVector.size(); i++)
    {
        // Use sizeof(uintptr_t): sizeof(AddrVector) is the size of the vector object, not of one element.
        if (!VMMDLL_Scatter_Prepare(hScatter, AddrVector.at(i) + Offset, sizeof(uintptr_t)))
        {
            VMMDLL_Scatter_CloseHandle(hScatter);
            return false;
        }
    }

    // Fetch all queued reads in one go.
    if (!VMMDLL_Scatter_ExecuteRead(hScatter))
    {
        VMMDLL_Scatter_Clear(hScatter, PID, NULL);
        VMMDLL_Scatter_CloseHandle(hScatter);
        return false;
    }

    // Copy each fetched value out of the scatter handle.
    for (size_t i = 0; i < AddrVector.size(); i++)
    {
        uintptr_t tempInt = 0;
        if (!VMMDLL_Scatter_Read(hScatter, AddrVector.at(i) + Offset, sizeof(uintptr_t), (PBYTE)&tempInt, NULL))
        {
            VMMDLL_Scatter_Clear(hScatter, PID, NULL);
            VMMDLL_Scatter_CloseHandle(hScatter);
            // Bail out instead of pushing a value that was never read.
            return false;
        }
        Array.push_back(tempInt);
    }

    // Reset the handle so it can be reused for the next batch.
    VMMDLL_Scatter_Clear(hScatter, PID, NULL);
    return true;
}

This is being called 4 different times to grab the values I need. I used to call this every iteration of my while(true) loop, but now I'm only calling it every 60 iterations. It's still fairly slow for what I need it for.

EDIT:

So, I've run some tests to see how long these scatter reads are taking to complete. All in all, they seem to be taking a lot less time than I thought they were.

Just to recap, each scatter read covers 2,700 addresses on average (8 bytes each). In the test results I'm going to post below, I'm doing 4 scatter reads, each reading the same number of addresses as I mentioned before.

On average, the reads take about 30-50 ms to complete. I noticed that some iterations were taking longer than others, so I decided to also log those.

If I'm doing my math correctly, that would be 86.4 KB of data every iteration (4 x 2,700 x 8 bytes). Some of the time will be spent doing other things that are listed in the function above, but according to Visual Studio performance metrics, the bulk of CPU time is spent in VMMDLL_Scatter_ExecuteRead.
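For anyone wanting to reproduce this kind of measurement, a minimal timing sketch with std::chrono might look like the following. TimeMs and TimingStats are hypothetical helper names, and the outlier threshold is an arbitrary value the caller picks; none of this comes from the post or from the VMMDLL API.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
template <typename Work>
double TimeMs(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Accumulates per-iteration timings and keeps the slow ones for logging.
struct TimingStats {
    double totalMs = 0.0;
    std::size_t count = 0;
    std::vector<double> outliersMs;

    void Add(double ms, double outlierThresholdMs) {
        assert(ms >= 0.0);
        totalMs += ms;
        ++count;
        if (ms > outlierThresholdMs) {
            outliersMs.push_back(ms);
        }
    }

    double AverageMs() const {
        return count ? totalMs / static_cast<double>(count) : 0.0;
    }
};
```

A typical use would wrap the four scatter calls in a lambda, pass it to TimeMs each iteration, and feed the result to Add with a threshold such as 100 ms. steady_clock is used because it is monotonic, so the result is not affected by system clock adjustments.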
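As a sanity check on these numbers, the payload and effective throughput can be worked out directly. This sketch assumes the 30-50 ms figure covers all 4 scatter reads of one iteration, which the post does not state explicitly; the constant and function names are illustrative only.

```cpp
#include <cstdint>

constexpr std::uint64_t kReadsPerIteration = 4;
constexpr std::uint64_t kAddressesPerRead = 2700;
constexpr std::uint64_t kBytesPerAddress = 8;

// Payload actually requested per iteration: 4 * 2,700 * 8 = 86,400 bytes.
constexpr std::uint64_t kBytesPerIteration =
    kReadsPerIteration * kAddressesPerRead * kBytesPerAddress;

// Effective payload throughput in MB/s (1 MB = 1,000,000 bytes).
constexpr double ThroughputMBps(std::uint64_t bytes, double elapsedMs) {
    return (static_cast<double>(bytes) / 1e6) / (elapsedMs / 1e3);
}
```

Under that assumption, 86,400 bytes in 30-50 ms works out to roughly 1.7 to 2.9 MB/s of useful payload. That is far below the dump speed, which suggests the cost is dominated by per-read overhead on many small scattered reads rather than by raw bandwidth.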

I guess this is fairly fast?

ufrisk commented 1 year ago

I think it would be good to specify the flag VMMDLL_FLAG_NOCACHE in your VMMDLL_Scatter_Initialize and VMMDLL_Scatter_Clear. I don't see the initialize call but in the clear call you don't specify nocache, which means that cache will be used next time the handle is used to perform a read.

ufrisk / pcileech

VMMDLL_Scatter_ExecuteRead Question #235