It seems to be a problem with the scattered read implementation itself. If I make one vmmRead call for 2MB, the data is all zeros after 0x1FF0 (so from 0x2000 onwards). If I make successive small vmmRead calls, the data is not corrupted. I'm currently trying to find the issue in the read implementation, but haven't found anything there so far unfortunately.
Could it be a problem with TLP_CallbackMRd_Scatter and the tiny algorithm? I see the right number of TX TLPs, well built, in DeviceFPGA_ReadScatterMEM_Impl (although I am not sure of the behaviour when many requests get the same tag), but there seems to be no corresponding RX for many of the TX requests; it seems to fail on the ECC bit. Just for info, I quickly dirty-tested running 100 vmmRead calls of 0x2000 instead of one call of 0x200000, and that makes VmmWinInit_FindNtosScanHint64 work (after that it fails on the EPROCESS search because it also makes a big request).
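For reference, a minimal sketch of that chunked-read workaround, written against the internal VmmReadEx helper that is quoted further down in this thread; the wrapper name, the 0x2000 chunk size and the min() clamp are illustrative additions, not MemProcFS code:
#include "vmm.h"   // internal MemProcFS header providing VmmReadEx and PVMM_PROCESS
// Hypothetical wrapper: issue one large read as many small reads so the
// scatter/tiny read path never sees a request larger than cbChunk.
VOID VmmReadEx_Chunked(PVMM_PROCESS pProcess, QWORD vaBase, PBYTE pb, DWORD cbSize)
{
    QWORD i, cb, cbChunk = 0x2000;
    for(i = 0; i < cbSize; i += cbChunk) {
        cb = min(cbChunk, cbSize - i);   // clamp the final chunk to the requested size
        VmmReadEx(pProcess, vaBase + i, pb + i, (DWORD)cb, NULL, 0);
    }
}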
Thanks for the excellent debugging. This makes some sense to me and I'll be looking into a fix in the coming few days.
No problem. Do you need any precise steps to reproduce the error, or is that clear enough?
No, I've located the error and fixed it in my lab. I just need to do more testing before I push the fix into production. I expect this to be completed on Monday at the very latest.
Great! Can't wait to test it on my project! :-)
This should now be fixed. Can you verify this and close the issue if it's resolved?
You should be able to download the latest set of binaries (including the new leechcore.dll) and trigger the fixed tiny algorithm with:
MemProcFS.exe -device fpga://algo=2
As described in the new documentation in the LeechCore FPGA guide.
I confirm it works :-) Thanks!
A bit off-topic: I know it is 'meant' to be slower - I am currently dumping memory at 7 MB/s - but do you think something could be done to optimize the speed (maybe reach 15-25 MB/s)? I could have a look myself if you think there is a possibility to make it faster.
It's going to be much slower unfortunately, but you can try to adjust the tmread default value, which is currently optimized for the larger algorithm. Lower values = faster, but too low will result in data corruption and invalid TLPs.
MemProcFS.exe -device fpga://algo=2,tmread=100
Hello,
I have noticed a new issue, related to the current fixed tiny algorithm, that so far only happens on Windows 10 build 1909:
MemProcFS.exe -device fpga://algo=2
VmmProc: Unable to auto-identify operating system for PROC file system mount.
It works fine with the normal algo:
MemProcFS.exe -device fpga://algo=1
Initialized 64-bit Windows 10.0.18363
It fails on the VmmWinInit_FindNtosScan call. I think it never passes the "if(*(PQWORD)(pb + p + o) == 0x45444F434C4F4F50) { // POOLCODE" check in vmmwininit.c (a small sketch of that check follows below).
The issue doesn't appear on older Windows builds (at least not on 1803/1809).
Any idea?
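To illustrate why a zeroed read breaks that scan, here is a minimal sketch of the POOLCODE marker check referenced above. The constant is the one quoted from vmmwininit.c; the function name and loop bounds are illustrative only, not the real scan logic:
#include <windows.h>
typedef ULONG64 QWORD, *PQWORD;   // QWORD/PQWORD as typedef'd in the MemProcFS headers
// Returns TRUE if the 8-byte "POOLCODE" marker appears in the buffer. If the
// tiny algorithm hands back zeros for the page that holds the marker, this
// scan never matches and the OS auto-identification fails.
BOOL ContainsPoolcode(PBYTE pb, DWORD cb)
{
    DWORD o;
    for(o = 0; o + 8 <= cb; o += 8) {
        if(*(PQWORD)(pb + o) == 0x45444F434C4F4F50) { return TRUE; }   // "POOLCODE" read as a little-endian QWORD
    }
    return FALSE;
}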
I published a new release just now with fixes for low memory 64-bit systems. Your issue seems quite similar to that issue. Can you please check it out to see if it's working or not?
I have just tested the new release v3.2-20200316 and it doesn't fix that issue.
MemProcFS_files_and_binaries_v3.2-20200316>MemProcFS.exe -device fpga://algo=1
Initialized 64-bit Windows 10.0.18363
MemProcFS_files_and_binaries_v3.2-20200316>MemProcFS.exe -device fpga://algo=2
VmmProc: Unable to auto-identify operating system for PROC file system mount.
Target system is 16 GB by the way.
I am currently investigating; I am not sure yet, but it could be the pb read bytes (the VmmReadEx call at line 335 of vmmwininit.c) that are corrupted.
I confirm this is a problem with the read bytes. At the same address, a portion of the bytes are null (00...00) with the tiny algorithm, while it is POOLCODE_ ... with the normal algorithm. This address is the ntoskrnl.exe base address in that case.
As a quick and dirty test, I replaced:
VmmReadEx(pSystemProcess, vaBase, pb, (DWORD)cbSize, NULL, 0);
with
QWORD chunksSize = 0x1000, cb;
for (QWORD i = 0; i < cbSize; i += chunksSize) {
    cb = min(chunksSize, cbSize - i);   // clamp the last chunk in case cbSize is not a multiple of chunksSize
    VmmReadEx(pSystemProcess, vaBase + i, pb + i, (DWORD)cb, NULL, 0);
}
to be sure it is a problem with the underlying read algorithm, and it indeed "fixes" the issue. Any idea which portion of the tiny algorithm the issue could come from?
Note: I have also tested with different chunk size values and it works with 0x1000, 0x2000, 0x10000, 0x20000, 0x40000 and 0x80000. Could it be a problem in the algorithm's alternation part?
I don't fully know; it may be that your target machine doesn't support the full PCIe tag field, or it may be some different problem. Note that the change below will effectively cut the speed in half as well. I'll also try to look into this more for the next release of the LeechCore library; when I looked at this some time ago it worked, but I'll investigate further...
Can you please change the line where the loop checks if(i % 2) (and the following lines) into:
if(ctx->fAlgorithmReadTiny) {
    i++;
    break;
} ...
Replacing
if(i % 2) {
    i++;
    break;
}
bTag = (ctx->RxEccBit ? 0x80 : 0) + 0x40 + 0x20;
with i++; break;
fixes the issue. Does that mean my machine does not support the full PCIe tag field?
Thanks. It may mean that, or it may be some other error in my algorithm. I guess I have to look into this more, and also rework the algorithm to try to improve the speed; but it's not so widely used, so it's not top priority unfortunately.
Alright, if you try something new in the future I will be glad to test it. Thanks for your answers! :-)
Just an extra note, in case it helps one day: if I initialize vmmdll with the i++; break; fix, stop the instance and relaunch an instance without the fix (so with i % 2 ...), the new instance initializes fine; this might be because the tag is indeed shifted and there is no read corruption on the desired chunk. I will try to limit the tag value 'loop' to the maximum value of a non-full tag if that makes sense.
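In case it is of any use, a purely hypothetical sketch of what limiting the tag loop could look like; none of the names or constants below come from the LeechCore source, they just illustrate keeping the cycling tag within the 32 values that a device without the PCIe Extended Tag capability can handle:
#define TAG_COUNT_LEGACY 0x20   // only 32 outstanding tags (0x00-0x1F) without Extended Tag support
// Hypothetical helper: pick the tag for the i-th outstanding read request by
// cycling through the legacy 5-bit range instead of the full 8-bit tag space.
BYTE NextReadTag(DWORD iRequest)
{
    return (BYTE)(iRequest % TAG_COUNT_LEGACY);
}
If the corruption really does come from the completer mishandling tags above 0x1F, capping the range like this should behave like the i++; break; workaround while keeping more requests in flight.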
Just don't put too much time into it; I'll see if I can redo the tiny algorithm into a more efficient one as well.
I had a new look at this issue and compared the bytes read with "i % 2" against those read with "i++"; the data is very similar. The only really different part is the needed MZ / POOLCODE memory chunk at addresses 0x200000 -> 0x200640: it is filled with the correct data in the "i++" case and filled with 0's in the "i % 2" case.
The rest of the 14 MB of data doesn't show any noticeable difference, only a few different bytes here and there.
Another hint I found: with i % 2, the VmmReadEx(pSystemProcess, vaBase, pb, (DWORD)cbSize, NULL, 0); call successfully reads the MZ header part with cbSize = 0x217000 (and lower values). With greater cbSize values - the actual default cbSize is 0xe00000 - we hit this zeroed-part issue.
I also made a quick test with an application that creates an array of bytes filled with 0x01's, tested with 2/20/200 MB; the read result seems good, I didn't notice any corrupted data.
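For completeness, a minimal sketch of the kind of target-side test application described above (hypothetical code, not the tool actually used): it fills a large buffer with 0x01 bytes and keeps the process alive so the pages can be read over DMA and diffed against the pattern.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
    size_t cb = 200 * 1024 * 1024;     // 2 / 20 / 200 MB as in the tests above
    unsigned char *pb = malloc(cb);
    if(!pb) { return 1; }
    memset(pb, 0x01, cb);              // known pattern; touching the memory also keeps the pages committed
    printf("buffer at %p (%zu bytes) - read it over DMA and diff against 0x01\n", (void*)pb, cb);
    getchar();                         // keep the process (and its buffer) alive during the read
    free(pb);
    return 0;
}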
I have now completely changed the tiny algorithm. It should now be auto-detected when required for reads. It should also have better performance and no longer read the wrong data.
Since this should now be fixed I'm closing the issue. If you still experience any problems, please let me know and re-open the issue.
Hello, I am facing an issue with the tiny read algorithm. If I toggle it on in the LeechCore lib, pcileech can probe and display memory, but the MemProcFS lib fails on VmmWinInit_FindNtosScan / VmmWinInit_FindSystemEPROCESS. If I toggle it back to false it works flawlessly. Any idea of what could go wrong?