microsoft / Windows-Dev-Performance

A repo for developers on Windows to file issues that impede their productivity, efficiency, and efficacy
MIT License

winmgmt.exe /verifyrepository is O(n^2) while holding a WMI lock #67

Status: Closed (randomascii closed this issue 3 years ago)

randomascii commented 3 years ago
| Item | Value |
| --- | --- |
| OS, Version / Build | Win32NT 10.0.18363.0 Microsoft Windows NT 10.0.18363.0 |
| Processor Architecture | AMD64 |
| Processor Type & Model | Intel Xeon CPU E5-2690 v3 (2 processors) |
| Memory | 96 GB |
| Storage Type, free / capacity (e.g. C: SSD 128GB / 512GB) | 1 TB SSD, 430 GB free |
| Relevant apps installed | Visual Studio, Chrome, full Chromium repo, Edge Canary, Edge Dev, ??? |

Description

The c:\windows\System32\wbem\Repository directory can grow without bound if this command is run repeatedly: gpresult /scope:computer /x results.txt /F

That in itself is bad (the growth seems to be unbounded), but it becomes a critical issue because "winmgmt.exe /verifyrepository" has an O(n^2) loop: while the directory grows linearly, the verifyrepository time grows quadratically. By the time I noticed this I was hitting multi-minute hangs, because a WMI lock was held for the whole verification and that prevented some processes from making forward progress.
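To see why linear growth in the repository turns into quadratic growth in verification time, here is a toy cost model. It is only an illustration: the actual loop inside winmgmt.exe is not shown in this report, and the names `verify_cost` and `growth_table`, along with the record counts, are hypothetical.

```python
def verify_cost(record_count: int) -> int:
    """Toy model (assumption): every record is checked against every record."""
    return record_count * record_count


def growth_table(start: int, step: int, periods: int) -> list[tuple[int, int, int]]:
    """Return (period, record_count, cost) rows for a linearly growing repository."""
    rows = []
    for period in range(periods):
        records = start + step * period  # size grows linearly
        rows.append((period, records, verify_cost(records)))  # work grows quadratically
    return rows


for period, records, cost in growth_table(start=1_000, step=1_000, periods=4):
    print(f"period {period}: {records} records -> {cost} units of work")
```

Under this model, doubling the record count quadruples the work, which is the pattern of delays "increasing by the day" that the report describes.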

Steps to reproduce

1. Run "gpresult /scope:computer /x results.txt /F" repeatedly. In our case our IT department was doing this.
2. Run "winmgmt.exe /verifyrepository" occasionally. In our case our IT department was doing this hourly.
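The steps above can be sketched as a small timing harness. This is a minimal sketch, not the script our IT department used: the two command lines are the ones quoted in this report, while `time_command` and `reproduce_once` are hypothetical helper names introduced for illustration.

```python
import subprocess
import time

# Command lines quoted from the report; everything else here is illustrative.
GPRESULT = ["gpresult", "/scope:computer", "/x", "results.txt", "/F"]
VERIFY = ["winmgmt.exe", "/verifyrepository"]


def time_command(args: list[str]) -> float:
    """Run a command to completion and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    # check=False: a non-zero exit code is recorded as a timing, not raised.
    subprocess.run(args, check=False, capture_output=True)
    return time.perf_counter() - start


def reproduce_once() -> tuple[float, float]:
    """Run gpresult, then verifyrepository, and return how long each took."""
    return time_command(GPRESULT), time_command(VERIFY)
```

On an affected Windows machine, calling `reproduce_once()` on a schedule and logging the second value should show the verify time climbing if the behavior described above holds. The commands may need to be run from an elevated prompt.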

Expected behavior

The Wbem repository should stay roughly a constant size (not wasting disk space) and the /verifyrepository time should stay roughly constant and extremely fast.

Actual behavior

Starting ninja builds, and some other operations, would pause for multiple minutes, with the delays increasing by the day and with no upper bound.

For more details see: https://randomascii.wordpress.com/2019/12/08/on2-again-now-in-wmi/

ericsampson commented 3 years ago

@bitcrazed would it be possible to get someone to triage this? Thanks!

bitcrazed commented 3 years ago

Sorry - I updated the label to "investigating" but forgot to update the thread with a comment.

I have a conversation ongoing with the owners of WinMgmt and will reply as soon as I can.

bitcrazed commented 3 years ago

Thanks for reporting @randomascii. I've reported this to the team. They have stated that the WMI repository is not designed to service data > 200MB, as per this TechNet article: https://social.technet.microsoft.com/wiki/contents/articles/10130.root-causes-for-slow-boots-and-slow-logons-aka-sbsl.aspx#WMI

Closing this issue since it a) is by design, and b) has a workaround (i.e. not calling VerifyRepository every hour).

Thanks for reporting though - the team has created a bug and will track it in case the number of incidents of this issue increases.
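For anyone who wants to watch for the 200MB figure mentioned above, a minimal sketch of a size check follows. It assumes "200MB" means 200 × 1024 × 1024 bytes, and the names `repository_size_bytes` and `exceeds_design_limit` are hypothetical; the repository path in the comment is the one given in the original report.

```python
import os

# Assumption: the "200MB" figure is read as 200 * 1024 * 1024 bytes.
DESIGN_LIMIT_BYTES = 200 * 1024 * 1024


def repository_size_bytes(root: str) -> int:
    """Sum the sizes of all files under root (e.g. the wbem\\Repository directory)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def exceeds_design_limit(size_bytes: int, limit_bytes: int = DESIGN_LIMIT_BYTES) -> bool:
    """Return True when the measured size is above the stated design limit."""
    return size_bytes > limit_bytes
```

A check like `exceeds_design_limit(repository_size_bytes(r"C:\Windows\System32\wbem\Repository"))` only reports the condition; it does nothing to shrink the repository.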

randomascii commented 3 years ago

It would be great if there were some way to keep WMI databases small. In our case our database was growing without bound, apparently due to running this command:

gpresult /scope:computer /x results.txt /F

If there were a way to delete the unneeded (possibly unreferenced?) records in order to shrink the database, then we would be fine. I suspect that garbage collection is all that is needed, although I don't know that for sure.

It appears that the WMI database does get cleaned up eventually, because both of my machines now have small c:\windows\System32\wbem\Repository directories. However, it does seem problematic that running Microsoft's gpresult command on a regular basis will cause the WMI database to exceed Microsoft's recommended size limitations...

randomascii commented 3 years ago

Thinking about this more...

Saying "WMI repository is not designed to service data > 200MB" without also giving advice on how to keep the WMI repository small is just victim blaming. We used Microsoft tools (gpresult) which caused the WMI database to grow beyond Microsoft's recommendations, which then caused "winmgmt.exe /verifyrepository" to exhibit quadratic performance behavior. Your advice seems to be "either don't use gpresult, or don't use /verifyrepository", so, I guess, Linux?

bitcrazed commented 3 years ago

Hey Bruce. Some form of garbage collection after gpresult executes would likely resolve this issue. And as you point out, it appears that some garbage collection occurs periodically, though perhaps not often enough to collect the data left behind by running gpresult hourly.

We've reported the issue to the WMI team who have the bug on their backlog and will revisit as priorities, resources, incidence frequencies, impact, etc. allow.

ericsampson commented 3 years ago

A sufficiently determined person could probably do a little reverse engineering and then write a utility program that would manually kick off the GC function, as a workaround…