jonomango / hv

Lightweight Intel VT-x Hypervisor.
MIT License

Timing detection #12

Open clown444 opened 1 year ago

clown444 commented 1 year ago

When I ran pafish while my physical machine was virtualized (I manually mapped the hv driver), I noticed a timing detection via rdtsc + cpuid + rdtsc. I ran my own tests, and it seems like TSC offsetting doesn't work as well as intended. I'm on Windows 10 22H2 with an i7-6700K CPU. The screenshots below show the overhead measured by the driver. In the console window, each line is the average time over 10 rdtsc + cpuid + rdtsc measurements. The averages before the red line are from my system without virtualization, and the ones after the red line are with the system virtualized.

Screenshot_1 Screenshot_2
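The detection described above boils down to averaging several rdtsc + cpuid + rdtsc deltas and comparing the result to a threshold. Below is a minimal, portable sketch of that decision step only, assuming the deltas have already been collected with `__rdtsc`/`__cpuidex`. The names `average_delta` and `looks_virtualized` and the 1000-tick `kThresholdTicks` are hypothetical and are not taken from pafish's source.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Average of several rdtsc + cpuid + rdtsc deltas, in TSC ticks.
std::uint64_t average_delta(const std::vector<std::uint64_t>& deltas) {
  assert(!deltas.empty());
  std::uint64_t const sum =
      std::accumulate(deltas.begin(), deltas.end(), std::uint64_t{0});
  return sum / deltas.size();
}

// Hypothetical threshold (an assumption, not pafish's exact value):
// an average above this many ticks is treated as "virtualized".
constexpr std::uint64_t kThresholdTicks = 1000;

bool looks_virtualized(const std::vector<std::uint64_t>& deltas) {
  return average_delta(deltas) > kThresholdTicks;
}
```

With illustrative deltas around 300 ticks the average stays under the assumed threshold, while deltas in the 4-5k range get flagged. A real detector would also have to choose a threshold that holds across CPUs with different TSC rates.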

jonomango commented 1 year ago

I've actually noticed this as well in my own tests recently. I'm not sure what I've changed, but it now fails pafish and my own detection suite most of the time. I'll look into it.

jonomango commented 1 year ago

BTW, was this test done with the latest commit? It would probably be a good idea to remove the logging since that adds some overhead.

clown444 commented 1 year ago

It was done with the latest commit, since that commit fixed all of my BSOD issues. Right now I'm compensating with the measured overhead + a constant. I'll remove the logging as well and share anything I find related to the cause of the issue. Thanks for the reply.
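"Overhead + constant" compensation can be modeled without real hardware. This hypothetical simulation assumes the guest reads the host TSC plus a TSC offset (the value a VT-x hypervisor writes to the VMCS TSC-offset field), and that on each VM exit the hypervisor lowers that offset by the measured overhead plus a constant. The `SimulatedCpu` type and `measure_cpuid` helper are illustrative names, not hv's code. The sketch shows why a fixed estimate only works when every exit costs the same.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of TSC offsetting: the guest sees host_tsc + tsc_offset.
struct SimulatedCpu {
  std::int64_t host_tsc = 1'000'000;  // arbitrary starting value
  std::int64_t tsc_offset = 0;        // models the VMCS TSC-offset field

  // What RDTSC returns inside the guest.
  std::int64_t guest_rdtsc() const { return host_tsc + tsc_offset; }

  // A VM exit that really took exit_cost ticks; the hypervisor hides
  // hidden_ticks of it by lowering the offset.
  void vm_exit(std::int64_t exit_cost, std::int64_t hidden_ticks) {
    assert(hidden_ticks <= exit_cost);  // keep the guest TSC from going backward
    host_tsc += exit_cost;
    tsc_offset -= hidden_ticks;
  }
};

// Guest-side rdtsc + cpuid + rdtsc, where the CPUID causes exactly one exit
// and the hypervisor hides (overhead + constant) ticks of it.
std::int64_t measure_cpuid(SimulatedCpu& cpu, std::int64_t exit_cost,
                           std::int64_t overhead, std::int64_t constant) {
  std::int64_t const start = cpu.guest_rdtsc();
  cpu.vm_exit(exit_cost, overhead + constant);
  return cpu.guest_rdtsc() - start;
}
```

An exit that costs exactly what was calibrated leaves only the intended few hundred ticks visible. If the same exit is stretched by something the calibration did not include, such as a host interrupt or a context switch, the extra ticks leak straight into the guest's measurement.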

jonomango commented 1 year ago

This is super weird... I'm testing using the following code and getting strange results:

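  #include <intrin.h>   // __rdtsc, __cpuidex
  #include <windows.h>  // Sleep
  #include <cstdio>     // printf

  int main() {
    int info[4];

    while (true) {
      // CPUID unconditionally causes a VM exit under VT-x, so this times the exit round trip.
      auto const start = __rdtsc();
      __cpuidex(info, 0, 0);
      auto const end = __rdtsc();

      // __rdtsc() returns unsigned __int64, so %llu is the matching format specifier.
      printf("%llu\n", end - start);
      Sleep(500);
    }
  }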

For some reason, the first print is ALWAYS around 300 ticks (which is what it should be), but everything AFTER that hovers around 4-5k ticks. Not sure why this is happening...

clown444 commented 1 year ago

My test code is basically the same as yours, except that I print the average of 10 measurements before each Sleep(500). With logging removed from the VM-exit handler, the results I'm getting are weird: the average timings fluctuate a lot, ranging from ~2k to as much as ~32k, as shown in the screenshot. Screenshot_3
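One plausible reason the averages swing so widely is that a mean over 10 samples is dominated by any single outlier, for example one measurement that straddles a context switch. The sketch below contrasts the mean with the median on made-up sample values in the ~2k to ~32k range reported above. `mean_ticks` and `median_ticks` are hypothetical helpers; this is illustrative only and does not describe how pafish or hv processes samples.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Mean of the samples: a single large outlier shifts it a lot.
std::uint64_t mean_ticks(const std::vector<std::uint64_t>& samples) {
  assert(!samples.empty());
  return std::accumulate(samples.begin(), samples.end(), std::uint64_t{0}) /
         samples.size();
}

// Median (lower middle element for even counts): robust to a few outliers.
// Takes the vector by value because nth_element reorders it.
std::uint64_t median_ticks(std::vector<std::uint64_t> samples) {
  assert(!samples.empty());
  auto const mid = samples.begin() + (samples.size() - 1) / 2;
  std::nth_element(samples.begin(), mid, samples.end());
  return *mid;
}
```

With nine samples at 2000 ticks and one at 32000, the mean jumps to 5000 while the median stays at 2000, so one unlucky measurement per batch is enough to make an averaged readout look erratic.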

jonomango commented 1 year ago

Hi @clown444, please check out this commit: https://github.com/jonomango/hv/commit/a4059677b225ff72cdb2c0a7d2cc68d8d9475106. I've modified the timing evasion algorithm and it (mostly) evades pafish on my machine. Unfortunately, it is still very inconsistent, but it'll do for the time being.

AIVDNL commented 1 year ago

If you are curious, the only detections in my semi-consistent pafish output are:
[*] Checking the difference between CPU timestamp counters (rdtsc) ... traced!
[*] Checking the difference between CPU timestamp counters (rdtsc) forcing VM exit ... traced!

They occur roughly 50/50.

jonomango commented 1 year ago

@AIVDNL Thank you. I'm working on a new solution which I think should fix all the issues, but I'm having a bit of trouble with the implementation details... The big issue is that context switches are far more likely to occur when virtualized than on bare metal, simply because everything takes longer to execute, which leaves a bigger window for a context switch to land in the middle of a measurement. I'm experimenting with a per-process timer structure, as mentioned by Daax in an UnknownCheats post, which should hopefully fix all these issues.
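One possible reading of a "per-process timer structure" is to track hidden VM-exit time separately for each address space, for example keyed by CR3, so that compensation applied for one process does not skew what another process sees. The sketch below is hypothetical: `PerProcessTsc` is an illustrative name, it is not Daax's design or hv's implementation, and using it on real hardware would additionally require swapping the VMCS TSC offset on every CR3 change or intercepting RDTSC, neither of which is shown.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical per-process bookkeeping of hidden VM-exit time, keyed by CR3.
class PerProcessTsc {
 public:
  // Record that an exit taken while `cr3` was active should hide `ticks`.
  void hide(std::uint64_t cr3, std::uint64_t ticks) { hidden_[cr3] += ticks; }

  // TSC value to report to the process identified by `cr3`.
  std::uint64_t guest_tsc(std::uint64_t cr3, std::uint64_t host_tsc) const {
    auto const it = hidden_.find(cr3);
    std::uint64_t const hidden = (it == hidden_.end()) ? 0 : it->second;
    assert(hidden <= host_tsc);  // never report a value below zero
    return host_tsc - hidden;
  }

 private:
  // Accumulated hidden ticks per address space.
  std::unordered_map<std::uint64_t, std::uint64_t> hidden_;
};
```

A trade-off of this approach is that different processes end up observing different TSC values, which could itself be detectable by software that compares timestamps across processes.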