virtio-win / kvm-guest-drivers-windows

Windows paravirtualized drivers for QEMU\KVM
https://www.linux-kvm.org/page/WindowsGuestDrivers
BSD 3-Clause "New" or "Revised" License

[virtio-fs] Suspected memory leak #1004

Closed. SimonFair closed this issue 8 months ago.

SimonFair commented 11 months ago

Describe the bug: This is a user report received on the Unraid Forums.

To Reproduce: Running a VM with virtiofs mappings.

Expected behavior: No memory leakage.

Screenshots: (two screenshots attached)

Host:

VM:

Additional context: Mmdi is found in pooltag.txt, so you actually have to use xperf and WPA for further debugging. Following the method described there, I captured a snapshot of the memory growth, opened it in WPA, loaded symbols, and expanded the Mmdi pool to find stack references to winfsp-x64.dll and virtiofs.exe. So there's the smoking gun: one of these drivers is the culprit.
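For anyone who wants to reproduce this kind of capture, here is a rough sketch of the workflow (wrapped in Python purely for illustration; the pooltag.txt path and the exact xperf flags are assumptions based on the commonly documented Windows Performance Toolkit pool-leak recipe, so verify them against your own install):

```python
# Sketch only: look up the leaking pool tag, then capture a trace with pool
# allocation stacks so WPA can attribute the growth. Run elevated in the guest.
# The pooltag.txt path and xperf flags below are assumptions -- verify against
# your own WPT/WDK installation.
import subprocess
import time

POOLTAG_TXT = r"C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\triage\pooltag.txt"
TAG = "Mmdi"  # the non-paged pool tag growing in this report

# 1. See what pooltag.txt attributes the tag to.
with open(POOLTAG_TXT, encoding="utf-8", errors="ignore") as f:
    for line in f:
        if line.lstrip().startswith(TAG):
            print(line.rstrip())

# 2. Standard xperf pool-leak capture: enable pool events with alloc/free
#    stack walks, let the leak accumulate, then stop and merge to an .etl.
subprocess.run(["xperf", "-on", "PROC_THREAD+LOADER+POOL",
                "-stackwalk", "PoolAlloc+PoolFree"], check=True)
time.sleep(15 * 60)  # let memory growth build up while the workload runs
subprocess.run(["xperf", "-d", r"C:\temp\pool-leak.etl"], check=True)

# 3. Open pool-leak.etl in WPA, load symbols, and expand the Mmdi tag in the
#    Pool graph; the allocation stacks should name the responsible modules
#    (winfsp-x64.dll and virtiofs.exe in this report).
```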

I upgraded to the latest versions of WinFSP (2.0) and Virtio-win guest tools (0.1.240) and the leak is still active.

willdrew commented 10 months ago

I'm going to let Backblaze chew through 9TB of data over the long weekend and shoot for 48+ hours of uptime but I think this was it!

Cool, I agree as things are looking quite good on my end as well!

I just bumped my threads to 100 as well, and will let it run for a few hours. I was already maxing out the bandwidth on the lower threads, but more threads should still push things a bit harder when it comes to memory usage; will report back.

It ran fine all day! Just downgraded the VM to 16GB and will let it run over the long weekend; report back next week.

mackid1993 commented 10 months ago

11 hours of uptime. Backblaze (100 threads) is working my 20-core VM with 32 GB of RAM so hard that it's slowed to a crawl, and my non-paged pool is sitting right where it was.

(screenshot)

SimonFair commented 10 months ago

@YanVugenfirer When do you think a new release will be created containing this update?

mackid1993 commented 10 months ago

Memory usage is still in a comfortable place with 22 hours of uptime.

(screenshot)
kostyanf14 commented 10 months ago

> @YanVugenfirer When do you think a new release will be created containing this update?

We plan to have a release in a week or two.

mackid1993 commented 10 months ago

> @YanVugenfirer When do you think a new release will be created containing this update?

> We plan to have a release in a week or two.

That's great. I'm at 29 hours of uptime with a ton of activity on my 8 virtiofs mounts with no issues. I think you fixed it.

YanVugenfirer commented 10 months ago

@vrozenfe when are you planning to release the upstream version?

YanVugenfirer commented 10 months ago

The issue was closed automatically due to the merge. Re-opening to keep the discussion open until the driver release.

mackid1993 commented 10 months ago

@YanVugenfirer when the upstream version is released, will we get a full virtio driver update or just a signed version of virtofs.sys, inf, exe, et cetera? Pardon my ignorance.

christophocles commented 10 months ago

I started testing the new driver last night, and 14 hours later my RAM usage is stable. This VM has 4 VioFS mounts backed by 4 zpools, as well as a passthrough GPU and USB3 card. All seems to be playing nicely together. Previously, it could barely make it through the file scanning process before OOM and system crash. This is a massive improvement. Excellent work, gents!

The viofs leak is fixed.

YanVugenfirer commented 10 months ago

> @YanVugenfirer when the upstream version is released, will we get a full virtio driver update or just a signed version of virtofs.sys, inf, exe, et cetera? Pardon my ignorance.

In the update we should have the installer with everything. But keep in mind it is signed by Red Hat, not with a WHQL signature.

mackid1993 commented 10 months ago

> @YanVugenfirer when the upstream version is released, will we get a full virtio driver update or just a signed version of virtofs.sys, inf, exe, et cetera? Pardon my ignorance.

> In the update we should have the installer with everything. But keep in mind it is signed by Red Hat, not with a WHQL signature.

Does this mean I still have to enable driver test mode?

christophocles commented 10 months ago

> Does this mean I still have to enable driver test mode?

Previous releases were signed by Red Hat and didn't require test mode. This test version is completely unsigned, which is why the extra steps are needed. I'd expect the update to be packaged exactly like the previous releases.
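For reference, the extra step for the unsigned test build is just enabling test-signing; here's a minimal sketch (again in Python only for illustration; it assumes an elevated prompt, a reboot afterwards, and that Secure Boot is disabled, since test-signing won't take effect with Secure Boot on):

```python
# Sketch: toggle Windows test-signing so the unsigned test driver can load.
# Not needed for the Red Hat / attestation-signed official releases.
import subprocess

# Enable test-signing, then reboot the guest for it to take effect.
subprocess.run(["bcdedit", "/set", "testsigning", "on"], check=True)

# After installing the signed release, turn it back off and reboot again:
# subprocess.run(["bcdedit", "/set", "testsigning", "off"], check=True)
```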

YanVugenfirer commented 10 months ago

Sorry. My mistake. It will be an MS attestation signature. No need for test mode on Windows 10 and up.

mackid1993 commented 10 months ago

> Sorry. My mistake. It will be an MS attestation signature. No need for test mode on Windows 10 and up.

Awesome, hopefully it will release soon. I've uploaded 7 TB to backblaze with no issue.

willdrew commented 10 months ago

Thanks @mackid1993, @christophocles, all for your testing! Agreed, this is a massive improvement!! :tada:

My final update, TL;DR same as everyone, it's working great! --

I will lower this back to 16GB and test this further later, on another day (maybe tomorrow).

Late Friday night, I lowered my Windows VM to 16 GB and let Backblaze go at it using 100 threads (uploaded about 1 TB, due to my 40 Mbps upload), and it's still running great with no issues (paged pool 339 MB, non-paged pool 192 MB) as of this Tuesday evening!

Thanks again @YanVugenfirer for your hard work on this and looking forward to the official release!

xantari commented 10 months ago

This is awesome. Sorry if this has been mentioned in previous threads (this has a long comment history), but when will the updated Windows ISOs be released?

SilverBut commented 9 months ago

If you are running a qemu/libvirt virtual machine and don't want to test-sign the new driver, you may want this as a temporary mitigation. It is simply a hot-applied version of f604edf, done by modifying the guest's memory directly from the host. Seems okay on my machine after ~1 hour of testing.

SimonFair commented 9 months ago

> @YanVugenfirer When do you think a new release will be created containing this update?

> We plan to have a release in a week or two.

Any updates on a release?

xantari commented 9 months ago

> If you are running a qemu/libvirt virtual machine and don't want to test-sign the new driver, you may want this as a temporary mitigation. It is simply a hot-applied version of f604edf, done by modifying the guest's memory directly from the host. Seems okay on my machine after ~1 hour of testing.

How do you apply this in Unraid, for instance?

mackid1993 commented 9 months ago

> How do you apply this in Unraid, for instance?

It's probably easier to use the test driver. See my post on the Unraid forum.

mackid1993 commented 8 months ago

@YanVugenfirer Any idea when we'll see a new driver release with this fix? It's been about a month with the test driver and everything is working great.

SimonFair commented 8 months ago

> @YanVugenfirer Any idea when we'll see a new driver release with this fix? It's been about a month with the test driver and everything is working great.

m248 was released a few days ago.

(screenshot)

mackid1993 commented 8 months ago

@SimonFair Thanks, just updated.