JbMachtMuschel opened 3 months ago
Command I run: Start-DscConfiguration -Path C:\DscCert\20240822_170552\EXOActiveSyncDeviceAccessRule20240822_170552M365TenantConfig -Verbose -Force -Wait # several ThrottleLimits
I'm still having this issue. It's only happening with EXO resources but not the same one consistently. The only consistency seems to be that the WMI Provider Host process hits about 4100 MB and then it crashes. If I run 1.24.403.1 the WMI Provider Host doesn't use nearly as much memory, and it appears to return memory to the system throughout the test. I've tried pwsh 7 as well as three other machines (Win11 and 2022) and it doesn't help.
@jadamones Quick question: When you check the memory consumption in Task Manager, is the WMI Provider Host process running as 32-bit? If yes, that would explain why it fails, since 32-bit applications have a memory limit of 4GB. Unfortunately, I'm not aware of any way to switch the WMI process to 64-bit, which would mitigate that issue.
I checked my system and WmiPrvSE.exe is a 64-bit application. After consuming a bit more than 4 GB of memory, the process changes its status to "suspended" and afterwards the import fails with the mentioned error. Any hints? This test is the beginning of our DSC evaluation and I am importing a low number of policies, so imports with more than 10000 items will most likely fail.
Another interesting point is that after WmiPrvSE.exe crashes, the process starts again, even after the PowerShell command reported the error: The WS-Management service cannot process the request. The WMI service or the WMI provider returned an unknown error: HRESULT 0x80041033
Hey @FabienTschanz I wondered that as well and should've mentioned that I already confirmed that. It is indeed the 64-bit provider.
What changed for the EXO workload between 1.24.403.1 and the next version, where this problem started for me, is documented below. Nothing is jumping out at me, but I'm not sure what went into the Misc changes. Interestingly, @JbMachtMuschel - EXOActiveSyncDeviceAccessRules were introduced. I don't have those rules defined in any of the tenants where this is a problem, and I don't monitor or define the EXOMailboxSettings resource in any of my configurations, so in my case it doesn't appear to be an issue with either of those resources.
- EXOActiveSyncDeviceAccessRule
- EXOMailboxSettings
- MISC
- Telemetry
@jadamones In that view, only the application name is visible, but not whether it's 32-bit or not. That is only visible in the "Processes" view of Task Manager, when you scroll down to "Background Processes" --> WMI Provider Host (32 bit). If you go to the details of a 32-bit process, you'll end up with the same view as you described.
Unfortunately I don't have an Exchange infrastructure in my test lab (and I have no idea how to manage that 😅), so I'm not of much help even if I have your configuration.
Still, I will take a look at what changed in the code, just as you did; maybe we can find something.
Also @JbMachtMuschel I do not experience the loop. The WMI Provider just crashes and does not restart.
Soon I will restart... @jadamones: If you like, simply create some:
1..400 | %{ $OS = "iOS" + $_; Write-Host "Creating $OS"; New-ActiveSyncDeviceAccessRule -QueryString $OS -Characteristic UserAgent -AccessLevel Quarantine -Confirm:$false }
QueryString            : iOS1
Characteristic         : UserAgent
AccessLevel            : Quarantine
Name                   : iOS1 (UserAgent)
AdminDisplayName       :
ExchangeVersion        : 0.10 (14.0.100.0)
DistinguishedName      : CN=iOS1 (UserAgent),CN=Mobile Mailbox Settings,CN=Configuration,CN=bdfgrptest.onmicrosoft.com,CN=ConfigurationUnits,DC=DEUP281A012,DC=PROD,DC=OUTLOOK,DC=COM
Identity               : iOS1 (UserAgent)
ObjectCategory         : DEUP281A012.PROD.OUTLOOK.COM/Configuration/Schema/ms-Exch-Device-Access-Rule
ObjectClass            : {top, msExchDeviceAccessRule}
WhenChanged            : 8/27/2024 5:50:58 PM
WhenCreated            : 8/27/2024 5:50:58 PM
WhenChangedUTC         : 8/27/2024 3:50:58 PM
WhenCreatedUTC         : 8/27/2024 3:50:58 PM
ExchangeObjectId       : a6617681-e938-4296-81bc-32b8d0671cbf
OrganizationalUnitRoot : bdfgrptest.onmicrosoft.com
OrganizationId         : DEUP281A012.PROD.OUTLOOK.COM/Microsoft Exchange Hosted Organizations/bdfgrptest.onmicrosoft.com - DEUP281A012.PROD.OUTLOOK.COM/ConfigurationUnits/bdfgrptest.onmicrosoft.com/Configuration
Id                     : iOS1 (UserAgent)
Guid                   : a6617681-e938-4296-81bc-32b8d0671cbf
OriginatingServer      : FR3P281A12DC003.DEUP281A012.PROD.OUTLOOK.COM
IsValid                : True
ObjectState            : Unchanged
Hey @FabienTschanz thank you for helping out with this and looking into it! The 7th column over in that screenshot shows the system architecture as x64 for the process but regardless, I do not have the (32 bit) tag in the name under the processes view for any of them or the one that's running up the memory during a Test-DscConfiguration. Can also confirm that the file path is System32 and not SysWow64 for that particular process.
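For what it's worth, one quick way to sanity-check this from an elevated PowerShell session is a sketch like the following; a path under SysWOW64 would indicate the 32-bit host:

```powershell
# List the WMI provider host processes and their image paths (run elevated so
# Path is populated). System32\wbem => 64-bit host, SysWOW64\wbem => 32-bit host.
Get-Process -Name 'WmiPrvSE' -ErrorAction SilentlyContinue |
    Select-Object Id, Path
```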
@JbMachtMuschel Thanks, I will create some when I have a chance just to see if I get the loop.
@jadamones Thank you. The loop is one problem, but the memory consumption of WmiPrvSE.exe seems to be the underlying problem. I checked https://learn.microsoft.com/en-us/troubleshoot/windows-server/system-management-components/scenario-guide-troubleshoot-wmiprvse-quota-exceeded-issues and applied the configuration from the section "Client application performs abnormal, inefficient, or large queries", but I do not know whether this is what is happening: "The WmiPrvSE.exe process doesn't release resources as expected."
@jadamones Of course, silly me. Sometimes I'm like a blind chicken. Aaanyways, under the assumption that the WMI process has a memory limit of 4GB, we could try to increase the quota per process and overall like the following in an elevated Windows PowerShell session (according to https://techcommunity.microsoft.com/t5/ask-the-performance-team/wmi-high-memory-usage-by-wmi-service-or-wmiprvse-exe/ba-p/375491):
$quotaConfiguration = Get-WmiObject -Class __ProviderHostQuotaConfiguration -Namespace Root
$quotaConfiguration.MemoryAllHosts = 4 * 4GB
$quotaConfiguration.MemoryPerHost = 3 * 4GB
Set-WmiInstance -InputObject $quotaConfiguration
I don't know if a restart is required, but just to be sure, I would recommend one.
Another approach we can take is to bump up the ExchangeOnlineManagement version to 3.5.1. For this to work, you can update the file C:\Program Files\WindowsPowerShell\Modules\Microsoft365DSC\1.24.731.1\Dependencies\Manifest.psd1 so that the version for it is now 3.5.1 instead of 3.4.0. After that, run Update-M365DSCDependencies to install the updated module (or Install-Module ExchangeOnlineManagement -RequiredVersion 3.5.1 -Force for a direct install).
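For reference, the relevant entry in that manifest looks roughly like this (a sketch only; the exact layout of Manifest.psd1 may differ between Microsoft365DSC releases):

```powershell
# Dependencies\Manifest.psd1 (excerpt, illustrative): bump the pinned version.
@{
    Dependencies = @(
        # ... other module entries ...
        @{
            ModuleName      = 'ExchangeOnlineManagement'
            RequiredVersion = '3.5.1'   # previously 3.4.0
        }
    )
}
```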
@jadamones Thank you... the quotaConfiguration seems to be an approach, but I guess that this will also reach the 16GB limit. I will keep you informed. Currently I am importing the rules and approx. 110 are already imported, but the memory consumption is very high:
The EXO management module has been a known memory hog ever since they moved to using the REST API, and it seems to only get worse with each release; you can search the interwebs and find several people complaining about this. Heck, there's even an official article [0] on how to reduce its memory usage, which by the way doesn't solve the problem, since the module still leaks the huge amounts of memory it allocates and never frees it!!! Additionally, the module is also prone to spitting out random 0x800... errors, so to work around this problem I always loop the deployment up to a maximum of 3 attempts if it fails.
I have this setup in DevOps pipelines which run on discardable containers, so they are memory constrained and always fail if no changes are done, but with the code below, which is a variation of what @FabienTschanz posted before, I can get it going along with the rest of the workloads I use. Please bear in mind that this allows the WMI process to allocate all available memory of the machine (memory is not allocated all at once, only when required), so you may need to do some math to calculate better values for your requirements.
#region Increase Memory Quota for WMI processes
try {
    $ComputerSystem = Get-CimInstance -ClassName "Win32_ComputerSystem"
}
catch {
    throw $_
}
if (![String]::IsNullOrEmpty($ComputerSystem)) {
    $TotalPhysicalMemory = $ComputerSystem.TotalPhysicalMemory
    if (![String]::IsNullOrEmpty($TotalPhysicalMemory)) {
        try {
            $Quota = Get-CimInstance -Namespace "Root" `
                -Class "__ProviderHostQuotaConfiguration"
        }
        catch {
            throw $_
        }
        if (![String]::IsNullOrEmpty($Quota)) {
            if ($Quota.MemoryAllHosts -ne $TotalPhysicalMemory) {
                $Quota.MemoryAllHosts = $TotalPhysicalMemory
                $Quota.MemoryPerHost = $TotalPhysicalMemory
                $Quota.HandlesPerHost = 8192
                $Quota.ThreadsPerHost = 512
                Write-Output "Increasing WMI processes memory quota"
                try {
                    Set-CimInstance -InputObject $Quota
                }
                catch {
                    throw $_
                }
                $WMIProcesses = Get-Process -Name "WMIPrvSE" `
                    -ErrorAction "SilentlyContinue"
                if ($WMIProcesses.Count -ne 0) {
                    Write-Output "Restarting WMI processes"
                    foreach ($WMIProcess in $WMIProcesses) {
                        try {
                            Stop-Process -Id $WMIProcess.Id -Force
                        }
                        catch {
                            throw $_
                        }
                    }
                }
            }
        }
    }
}
#endregion Increase Memory Quota for WMI processes
@ricmestre Thank you for the information. Just now, the process mentioned in the screenshot above released memory as it reached 16GB, and it is still running and importing:
I was able to import the ActiveSyncDeviceAccessRules now :) I did a retry to observe how DSC handles already created rules, and after a while the command threw this error: VERBOSE: [HAMS010288]: LCM: [ End Set ] The SendConfigurationApply function did not succeed.
VERBOSE: Operation 'Invoke CimMethod' complete.
VERBOSE: Time taken for configuration job to complete is 4292.016 seconds
Interesting is that the WMI process "never gives up": even after deleting all ActiveSyncDeviceAccessRules, it starts creating them again. PowerShell is not involved; I terminated the shell. So I have now killed the WMI process.
Thank you all for all the effort on this. Unfortunately, these suggestions haven't resolved my issue :( . I updated the host quota as suggested, and I attempted an upgrade of the Exchange Online module (maybe I'm missing something here). After upgrading the Exchange module to 3.5.1, I get this error: MSFT_EXOOrganizationConfig failed to execute Test-TargetResource functionality with error message: IDX12729: Unable to decode the header '[PII of type 'System.String' is hidden.
FWIW I'm running into this issue using the checkdsccompliancy.ps1 script in a DevOps pipeline.
Here are my WMI quotas (system has 32GB)
@jadamones I have had the same error with the 3.5.1 module and certificate based login. Just now I am using 3.4.0.
Oh SMH... 🤦♂️ I realized that I didn't reboot the machine after updating the quotas. That seems to have done the trick! Interesting that the documentation just says to restart the WMI service, but definitely seems to be working after a reboot now. Thank you all for your input here. Much appreciated!
@ricmestre Do we want to document this as a workaround somewhere? Just for the sake that it is documented and that there is something we can actually do to circumvent the issue?
@FabienTschanz I wouldn't mind if you added something, for example here [0], which looks rather empty; both you and I know that there are other issues out there for which no fix or workaround is documented. But for this particular case I'd really like to see some kind of improvement in the module itself.
@NikCharlebois @ykuijs @andikrueger @desmay @malauter Hi, is this something that any of you can take to the EXO team in order to solve it or at least improve the experience? Running a cmdlet here and there is one thing, another one is trying to import the whole workload to other tenants using M365DSC, or even just trying to deploy a single resource but that contains hundreds or even thousands of children which will only exacerbate this known memory issue.
[0] https://microsoft365dsc.com/user-guide/get-started/troubleshooting/
@ricmestre I will add an entry in the troubleshooting section for this issue and raise a PR.
Shall I open another issue regarding this? Interesting is that the WMI process "never gives up": even after deleting all ActiveSyncDeviceAccessRules, it starts creating them again. PowerShell is not involved; I terminated the shell. So I killed the WMI process. To clarify: the import of the more than 400 rules is done, and the PowerShell command reflects this, but the WmiPrvSE process doesn't care. It seems it never stops without intervention.
That's how the LCM works; check https://learn.microsoft.com/en-us/powershell/dsc/managing-nodes/metaconfig?view=dsc-1.1 and the settings ConfigurationMode and ConfigurationModeFrequencyMins.
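For anyone who wants the LCM to stop re-applying the configuration on its own, a minimal meta-configuration sketch (values are illustrative; ApplyOnly turns off the automatic correction described in the linked article):

```powershell
# Illustrative LCM meta-configuration: apply the configuration once and do not
# re-apply it on the consistency-check interval.
[DSCLocalConfigurationManager()]
configuration LcmApplyOnly
{
    Node 'localhost'
    {
        Settings
        {
            ConfigurationMode              = 'ApplyOnly'  # default is 'ApplyAndMonitor'
            ConfigurationModeFrequencyMins = 15           # minimum allowed value
            RefreshMode                    = 'Push'
        }
    }
}

LcmApplyOnly -OutputPath 'C:\DscLcm'                      # assumed output folder
Set-DscLocalConfigurationManager -Path 'C:\DscLcm' -Verbose
```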
Hi again, I did several tests with different ExchangeOnlineManagement module versions and only 3.4.0 is working - 3.5.0 and 3.5.1 are throwing "Unable to decode the header '[PII of type 'System.String' is hidden". PowerShell 7 makes no difference.
Unfortunately the error is back. It seems the WMI process does NOT release memory, as shown in the image below, and it crashes again. Then I restarted the process; it released memory once, but a while afterwards it crashed again.
@ricmestre Thanks for the info, I was not aware of this.
@FabienTschanz Hi Fabien, I noticed that Exchange retention policies and retention tags are not supported by DSC, or at least I did not find them in my exports. Do you have an idea how to address this?
@JbMachtMuschel For resources not available in DSC, you can open a new issue (new resource proposal).
What's the next course of action here? Do we want to close this issue or try to get the attention of the Exchange team?
Of course it would be a good idea to get the attention of the Exchange team. We have now decided not to use DSC, since for our use case it is not reliable enough. Nevertheless we will run exports - better to have them than to need them ;) We will simply go on with PowerShell exporting/importing. Thanks for your help.
@NikCharlebois Can you provide us with a contact to the Exchange Online team or raise an issue there? We can't seem to get in touch with them, and this issue is a huge pain point which should be addressed in the Exchange PowerShell module itself, not with some workaround from DSC. Thank you.
Completely agree that this issue needs solving at the ExchangeOnline module end. In the meantime, another fix is to try to implement the Connect-ExchangeOnline -CommandName option in this module and MSCloudLoginAssistant per #3956 - although as @nikcharlebois says, that's probably a chunk of work...
I just created the PR https://github.com/microsoft/MSCloudLoginAssistant/pull/181 that introduces the proposed changes with the smarter command import logic. I sure hope that doesn't break anything, although it didn't during my testing...
Guys, I am working with a customer that also has this issue. We are trying some of the suggestions that were listed above.
I am also trying to find a contact at the Exchange PG, so we can have this issue investigated and hopefully resolved. Would be great if you could provide me with some input:
@ykuijs The issue is more visible when the blueprints are much bigger and consume more memory. I've seen it fail when running my integration pipeline, which deploys multiple workloads for Create/Update/Remove and then calls Test-DscConfiguration afterwards. To work around this, I increased the memory quota for the WMI processes and ensured that both the DSC core process and the WMI processes are restarted after each deployment, and then again after each call to Test-DscConfiguration, per workload. This involves a lot of restarts, but I didn't see more memory issues after changing the code to proceed this way.
Unfortunately the EXO module is still very prone to random WMI failures, such as the one reported in the title of this issue, even with small blueprints. In my case I've added a retry system that attempts the deployment up to 3 times until it succeeds, otherwise it aborts. I don't have any better way to work around this, since there's no way to predict how and/or when it will fail.
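A minimal sketch of what such a retry wrapper can look like, assuming a compiled configuration in a hypothetical C:\Dsc\M365TenantConfig folder (this is not the exact pipeline code):

```powershell
# Hypothetical retry wrapper around Start-DscConfiguration: try up to 3 times,
# killing the WMI provider hosts between attempts, then give up.
$configPath  = 'C:\Dsc\M365TenantConfig'   # assumed path to the compiled MOF
$maxAttempts = 3
$succeeded   = $false

for ($attempt = 1; $attempt -le $maxAttempts -and -not $succeeded; $attempt++)
{
    try
    {
        Write-Output "Deployment attempt $attempt of $maxAttempts"
        Start-DscConfiguration -Path $configPath -Wait -Force -Verbose -ErrorAction Stop
        $succeeded = $true
    }
    catch
    {
        Write-Warning "Attempt $attempt failed: $($_.Exception.Message)"
        # Kill the WMI provider hosts so the next attempt starts with fresh processes.
        Get-Process -Name 'WmiPrvSE' -ErrorAction SilentlyContinue |
            Stop-Process -Force -ErrorAction SilentlyContinue
    }
}

if (-not $succeeded)
{
    throw "Deployment failed after $maxAttempts attempts."
}
```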
The problem is definitely with the EXO module itself, these problems don't occur in DSC with any other workload.
@ykuijs It's already well described in the Microsoft365DSC troubleshooting guide, at least for me https://microsoft365dsc.com/user-guide/get-started/troubleshooting/#error-the-wmi-service-or-the-wmi-provider-returned-an-unknown-error-hresult-0x80041033-when-running-exchange-workload
@ricmestre, @Borgquite, thanks for this info!
For sure, fixing the issue in the module is the best way to resolve these issues; that is why I am planning to contact the PG. In the meantime, I have found a contact and will reach out later today. I just wanted to make sure I have a clear understanding of the issue and, ideally, how to reproduce it quickly and easily (the better the chances are that they will spend time on fixing it).
A strange thing is that my customer is experiencing this issue when applying tenant settings to SPO:
[[SPOTenantSettings]TenantSettingsDefaults::[SharePoint]SharePoint_Configuration] Connected to the SharePoint Online
Admin Center at 'https://tenantname-admin.sharepoint.com'/ to run this cmdlet
##[error]The WS-Management service cannot process the request. The WMI service or the WMI provider returned an unknown error:
HRESULT 0x80041033
+ CategoryInfo : ResourceUnavailable: (root/Microsoft/...gurationManager:String) [], CimException
+ FullyQualifiedErrorId : HRESULT 0x80041033
+ PSComputerName : localhost
VERBOSE: Operation 'Invoke CimMethod' complete.
One way to reduce memory consumption (for as long as the module is bad at handling memory itself) would be to only load the required commands using the new functionality from https://github.com/microsoft/MSCloudLoginAssistant/pull/181. But that probably won't work if we're using 30 different EXO resources in the same deployment run, because it'll still load many commands. The functionality is here, but not yet used.
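For illustration, this is roughly what limiting the imported cmdlets looks like when calling the module directly; the cmdlet list and the $appId/$thumbprint variables are made up for the example (MSCloudLoginAssistant would build the list from the resources actually used):

```powershell
# Illustrative only: connect to Exchange Online loading just the cmdlets a
# couple of resources need, instead of the full (memory-hungry) command set.
Connect-ExchangeOnline -CertificateThumbprint $thumbprint `
    -AppId $appId `
    -Organization 'contoso.onmicrosoft.com' `
    -CommandName @(
        'Get-ActiveSyncDeviceAccessRule',
        'New-ActiveSyncDeviceAccessRule',
        'Set-ActiveSyncDeviceAccessRule',
        'Remove-ActiveSyncDeviceAccessRule'
    ) `
    -ShowBanner:$false
```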
@ykuijs That's weird but it looks like someone else also reported something similar on #4183, although no solution came out of it.
Then there's also #4632, which is about EXO again, where it was reported that going back to older versions of DSC solved the issue. Here I suspect it solved the issue because the older DSC versions used older EXO module versions, which didn't consume as much memory and didn't have as many problems as the newer ones.
> A strange thing is that my customer is experiencing this issue when applying tenant settings to SPO:
@ykuijs If the issue is that you're hitting the WMI memory limit, then it could in theory break in any module. If you are, say, running many EXO resources (pushing the memory usage higher and higher) and the SPO resource happens to be the one that pushes you over 4GB, I'd expect that. Logging in to SPO may be the "straw that breaks the camel's back", but it's EXO that is normally responsible for the majority of the memory usage.
For any given configuration, the failure normally seems to take place at the same time - but the underlying issue is that the WMI host runs out of memory - and it's the EXO module that's normally chewed up the lion's share of it.
According to the memory optimization article, the issue is caused by the fact that the module doesn't handle connecting and disconnecting properly. If that is the case, I am wondering why we are constantly connecting to and disconnecting from EXO. Can't we simply reuse the already created connection?
@ykuijs I agree - trying to reduce the need to connect and disconnect is good. However I've observed:
So it seems that there are two difficult problems here which only Microsoft can really resolve:
The situation with Microsoft PowerShell modules is frankly reminiscent of the bad old Windows 3.1 days of 'DLL Hell': lack of coordination between teams, or a refusal to work on Windows PowerShell 5.1 because 'we should all be using PowerShell Core', with a lack of understanding of why some projects (including this one) still have a dependency on Windows PowerShell 5.1 instead of PowerShell Core. In my view, they got too excited about breaking changes, which resulted in too many removed features and too many breaking changes in PowerShell Core, removing any incentive to move - failing to learn the lessons of e.g. Perl 6. It's an absolute mess in my opinion.
Are there any things we should be checking out and exploring within the community, e.g. trying a version with fewer context switches, auto-arranging resources based on their type / workload, reducing dependencies on Windows PowerShell 5.1, etc.? How about a combined issue with all the steps that are worth exploring, tracking them in dedicated issues?
I'd be glad to help with the implementation since I'm very familiar with the internals of Microsoft365DSC and the underlying technology. What do you think? @ykuijs @Borgquite @ricmestre @NikCharlebois
Description of the issue
I am trying to import more than 400 ActiveSyncDeviceAccessRules into another tenant, but after approx. 90 rules this error appears: The WS-Management service cannot process the request. The WMI service or the WMI provider returned an unknown error: HRESULT 0x80041033
Without setting this: Set-WSManInstance -ResourceURI winrm/config -ValueSet @{MaxEnvelopeSizekb = "1048576"}, an import stops almost immediately.
Microsoft 365 DSC Version
1.24.731.1
Which workloads are affected
Exchange Online
The DSC configuration
Verbose logs showing the problem
Environment Information + PowerShell Version