microsoft / Microsoft365DSC

Manages, configures, extracts and monitors Microsoft 365 tenant configurations
https://aka.ms/M365DSC
MIT License

Error The WMI service or the WMI provider returned an unknown error: HRESULT 0x80041033 when importing ActiveSyncDeviceAccessRules #4982

Open JbMachtMuschel opened 3 months ago

JbMachtMuschel commented 3 months ago

Description of the issue

I am trying to import more than 400 ActiveSyncDeviceAccessRules into another tenant, but after approx. 90 rules this error appears: The WS-Management service cannot process the request. The WMI service or the WMI provider returned an unknown error: HRESULT 0x80041033

Without setting this first: Set-WSManInstance -ResourceURI winrm/config -ValueSet @{MaxEnvelopeSizekb = "1048576"}, the import stops almost immediately.
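
For reference, a minimal sketch of checking the current limit before raising it (run in an elevated session; the 1048576 KB value is simply the one I used):

    # Show the current WinRM envelope size limit in KB.
    (Get-WSManInstance -ResourceURI winrm/config).MaxEnvelopeSizekb

    # Raise the limit so large DSC configuration documents can be pushed to the LCM.
    Set-WSManInstance -ResourceURI winrm/config -ValueSet @{ MaxEnvelopeSizekb = '1048576' }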

Microsoft 365 DSC Version

1.24.731.1

Which workloads are affected

Exchange Online

The DSC configuration

I exported EXOClientAccessRule from the prod tenant using the following authentication method:
- Service Principal with Certificate Thumbprint

On the destination tenant I created the MOF file and adapted the ConfigurationData.psd1. I tried several machines.

PS Version:
Name             : ConsoleHost
Version          : 5.1.17763.6189
InstanceId       : aa499f6c-4b52-401a-a18d-78ce4d475acc
UI               : System.Management.Automation.Internal.Host.InternalHostUserInterface
CurrentCulture   : de-DE
CurrentUICulture : en-US
PrivateData      : Microsoft.PowerShell.ConsoleHost+ConsoleColorProxy
DebuggerEnabled  : True
IsRunspacePushed : False
Runspace         : System.Management.Automation.Runspaces.LocalRunspace

Verbose logs showing the problem

VERBOSE: [ServerName,]:                            [[EXOActiveSyncDeviceAccessRule]EXOActiveSyncDeviceAccessRule-iOS 8.3 12F70 (DeviceOS)] Test-TargetResource returned False
VERBOSE: [ServerName,]: LCM:  [ End    Test     ]  [[EXOActiveSyncDeviceAccessRule]EXOActiveSyncDeviceAccessRule-iOS 8.3 12F70 (DeviceOS)]  in 5.5740 seconds.
VERBOSE: [ServerName,]: LCM:  [ Start  Set      ]  [[EXOActiveSyncDeviceAccessRule]EXOActiveSyncDeviceAccessRule-iOS 8.3 12F70 (DeviceOS)]
VERBOSE: [ServerName,]:                            [[EXOActiveSyncDeviceAccessRule]EXOActiveSyncDeviceAccessRule-iOS 8.3 12F70 (DeviceOS)] Setting Active Sync Device Access Rule configuration for iOS 8.3 12F70 (DeviceOS)
VERBOSE: [ServerName,]:                            [[EXOActiveSyncDeviceAccessRule]EXOActiveSyncDeviceAccessRule-iOS 8.3 12F70 (DeviceOS)] Getting Active Sync Device Access Rule configuration for iOS 8.3 12F70 (DeviceOS)
VERBOSE: [ServerName,]:                            [[EXOActiveSyncDeviceAccessRule]EXOActiveSyncDeviceAccessRule-iOS 8.3 12F70 (DeviceOS)] Trying to retrieve instance by Identity
VERBOSE: [ServerName,]:                            [[EXOActiveSyncDeviceAccessRule]EXOActiveSyncDeviceAccessRule-iOS 8.3 12F70 (DeviceOS)] Active Sync Device Access Rule iOS 8.3 12F70 (DeviceOS) does not exist.
VERBOSE: [ServerName,]:                            [[EXOActiveSyncDeviceAccessRule]EXOActiveSyncDeviceAccessRule-iOS 8.3 12F70 (DeviceOS)] Active Sync Device Access Rule 'iOS 8.3 12F70 (DeviceOS)' does not exist but it
should. Create and configure it.
VERBOSE: [ServerName,]: LCM:  [ End    Set      ]  [[EXOActiveSyncDeviceAccessRule]EXOActiveSyncDeviceAccessRule-iOS 8.3 12F70 (DeviceOS)]  in 9.8730 seconds.
VERBOSE: [ServerName,]: LCM:  [ End    Resource ]  [[EXOActiveSyncDeviceAccessRule]EXOActiveSyncDeviceAccessRule-iOS 8.3 12F70 (DeviceOS)]
VERBOSE: [ServerName,]: LCM:  [ Start  Resource ]  [[EXOActiveSyncDeviceAccessRule]EXOActiveSyncDeviceAccessRule-iOS 8.4 12H143 (DeviceOS)]
VERBOSE: [ServerName,]: LCM:  [ Start  Test     ]  [[EXOActiveSyncDeviceAccessRule]EXOActiveSyncDeviceAccessRule-iOS 8.4 12H143 (DeviceOS)]
VERBOSE: [ServerName,]:                            [[EXOActiveSyncDeviceAccessRule]EXOActiveSyncDeviceAccessRule-iOS 8.4 12H143 (DeviceOS)] Testing Active Sync Device Access Rule configuration for iOS 8.4 12H143 (DeviceOS)
VERBOSE: [ServerName,]:                            [[EXOActiveSyncDeviceAccessRule]EXOActiveSyncDeviceAccessRule-iOS 8.4 12H143 (DeviceOS)] Getting Active Sync Device Access Rule configuration for iOS 8.4 12H143 (DeviceOS)
The WS-Management service cannot process the request. The WMI service or the WMI provider returned an unknown error: HRESULT 0x80041033
    + CategoryInfo          : ResourceUnavailable: (root/Microsoft/...gurationManager:String) [], CimException
    + FullyQualifiedErrorId : HRESULT 0x80041033
    + PSComputerName        : localhost

Environment Information + PowerShell Version

OsName               :
OsOperatingSystemSKU :
OsArchitecture       :
WindowsVersion       : 1809
WindowsBuildLabEx    : 17763.1.amd64fre.rs5_release.180914-1434
OsLanguage           :
OsMuiLanguages       :
JbMachtMuschel commented 3 months ago

Command I run: Start-DscConfiguration -Path C:\DscCert\20240822_170552\EXOActiveSyncDeviceAccessRule20240822_170552M365TenantConfig -Verbose -Force -Wait # tried several ThrottleLimits

jadamones commented 3 months ago

I'm still having this issue. It's only happening with EXO resources but not the same one consistently. The only consistency seems to be that the WMI Provider Host process hits about 4100 MB and then it crashes. If I run 1.24.403.1 the WMI Provider Host doesn't use nearly as much memory, and it appears to return memory to the system throughout the test. I've tried pwsh 7 as well as three other machines (Win11 and 2022) and it doesn't help.

FabienTschanz commented 3 months ago

@jadamones Quick question: When you check the memory consumption in Task Manager, is the WMI Provider Host process running in 32-bit? If yes, that would explain why it fails, since 32-bit applications have a memory limit of 4 GB. Unfortunately, I'm not aware of any way to switch the WMI process to 64-bit, which would mitigate that issue.

JbMachtMuschel commented 2 months ago

I checked my system and WmiPrvSE.exe is a 64-bit application. After consuming a bit more than 4 GB of memory, the process changes its status to "suspended" and afterwards the import fails with the mentioned error. Any hints? This test is the beginning of our DSC evaluation and I am importing a low number of policies, so imports with more than 10000 items will most likely fail.

JbMachtMuschel commented 2 months ago

An interesting point is that after WmiPrvSE.exe crashes, the process starts again, even after the PowerShell command reported the error: The WS-Management service cannot process the request. The WMI service or the WMI provider returned an unknown error: HRESULT 0x80041033

jadamones commented 2 months ago

Hey @FabienTschanz I wondered that as well and should've mentioned that I already confirmed that. It is indeed the 64-bit provider.

(screenshot)
jadamones commented 2 months ago

What changed between 1.24.403.1 and the next version, where this problem started for me, is documented below for the EXO workload. Nothing is jumping out at me, but I'm not sure what went into the Misc changes. Interestingly, @JbMachtMuschel - EXOActiveSyncDeviceAccessRules were introduced. I don't have those rules defined in any of the tenants where this is a problem, and I don't monitor or define the EXOMailboxSettings resource in any of my configurations, so in my case it doesn't appear to be an issue with either of those resources.

EXOActiveSyncDeviceAccessRule

EXOMailboxSettings

MISC

Telemetry

FabienTschanz commented 2 months ago

@jadamones In that view, only the application name is visible, but not whether it's 32-bit. That is only visible in the "Processes" view of the Task Manager, when you scroll down to "Background Processes" --> WMI Provider Host (32 bit). If you go to details for a 32-bit process, you'll end up with the same view as you wrote. (screenshot)

Unfortunately I don't have an Exchange infrastructure in my test lab (and I have no idea how to manage that 😅), so I'm not of much help even if I have your configuration.

Still, I will take a look at what changed in the code, just as you did; maybe we can find something.

jadamones commented 2 months ago

Also @JbMachtMuschel I do not experience the loop. The WMI Provider just crashes and does not restart.

JbMachtMuschel commented 2 months ago

(screenshot)

Soon I will restart. @jadamones: If you like, simply create some rules:

1..400 | % { $OS = "iOS" + $_; Write-Host "Creating $OS"; New-ActiveSyncDeviceAccessRule -QueryString $OS -Characteristic UserAgent -AccessLevel Quarantine -Confirm:$false }

QueryString            : iOS1
Characteristic         : UserAgent
AccessLevel            : Quarantine
Name                   : iOS1 (UserAgent)
AdminDisplayName       :
ExchangeVersion        : 0.10 (14.0.100.0)
DistinguishedName      : CN=iOS1 (UserAgent),CN=Mobile Mailbox Settings,CN=Configuration,CN=bdfgrptest.onmicrosoft.com,CN=ConfigurationUnits,DC=DEUP281A012,DC=PROD,DC=OUTLOOK,DC=COM
Identity               : iOS1 (UserAgent)
ObjectCategory         : DEUP281A012.PROD.OUTLOOK.COM/Configuration/Schema/ms-Exch-Device-Access-Rule
ObjectClass            : {top, msExchDeviceAccessRule}
WhenChanged            : 8/27/2024 5:50:58 PM
WhenCreated            : 8/27/2024 5:50:58 PM
WhenChangedUTC         : 8/27/2024 3:50:58 PM
WhenCreatedUTC         : 8/27/2024 3:50:58 PM
ExchangeObjectId       : a6617681-e938-4296-81bc-32b8d0671cbf
OrganizationalUnitRoot : bdfgrptest.onmicrosoft.com
OrganizationId         : DEUP281A012.PROD.OUTLOOK.COM/Microsoft Exchange Hosted Organizations/bdfgrptest.onmicrosoft.com - DEUP281A012.PROD.OUTLOOK.COM/ConfigurationUnits/bdfgrptest.onmicrosoft.com/Configuration
Id                     : iOS1 (UserAgent)
Guid                   : a6617681-e938-4296-81bc-32b8d0671cbf
OriginatingServer      : FR3P281A12DC003.DEUP281A012.PROD.OUTLOOK.COM
IsValid                : True
ObjectState            : Unchanged

jadamones commented 2 months ago

Hey @FabienTschanz thank you for helping out with this and looking into it! The 7th column in that screenshot shows the architecture as x64 for the process, but regardless, I do not see the (32 bit) tag in the name under the Processes view for any of them, including the one that's running up the memory during a Test-DscConfiguration. I can also confirm that the file path is System32 and not SysWow64 for that particular process.

(screenshot)
jadamones commented 2 months ago

@JbMachtMuschel Thanks, I will create some when I have a chance just to see if I get the loop.

JbMachtMuschel commented 2 months ago

@jadamones Thank you. The loop is one problem, but the memory consumption of WmiPrvSE.exe seems to be the root cause. I checked https://learn.microsoft.com/en-us/troubleshoot/windows-server/system-management-components/scenario-guide-troubleshoot-wmiprvse-quota-exceeded-issues and configured the steps from the article's section "Client application performs abnormal, inefficient, or large queries". But I do not know whether this is what is happening: "The WmiPrvse.exe process doesn't release resources as expected."

JbMachtMuschel commented 2 months ago

(screenshot)

FabienTschanz commented 2 months ago

@jadamones Of course, silly me. Sometimes I'm like a blind chicken. Aaanyways, under the assumption that the WMI process has a memory limit of 4GB, we could try to increase the quota per process and overall like the following in an elevated Windows PowerShell session (according to https://techcommunity.microsoft.com/t5/ask-the-performance-team/wmi-high-memory-usage-by-wmi-service-or-wmiprvse-exe/ba-p/375491):

$quotaConfiguration = Get-WmiObject -Class __ProviderHostQuotaConfiguration -Namespace Root
$quotaConfiguration.MemoryAllHosts = 4 * 4GB
$quotaConfiguration.MemoryPerHost = 3 * 4GB
Set-WmiInstance -InputObject $quotaConfiguration

I don't know if a restart is required, but just to be sure, I would recommend one.

Another approach we can take is to bump the ExchangeOnlineManagement version to 3.5.1. For this to work, you can update the file C:\Program Files\WindowsPowerShell\Modules\Microsoft365DSC\1.24.731.1\Dependencies\Manifest.psd1 so that its version is 3.5.1 instead of 3.4.0. After that, run Update-M365DSCDependencies to install the updated module (or Install-Module ExchangeOnlineManagement -RequiredVersion 3.5.1 -Force for a direct install).
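
In case it helps, a minimal sketch of those steps (the manifest path is the one from above for version 1.24.731.1; adjust it to your installed version):

    # 1. In C:\Program Files\WindowsPowerShell\Modules\Microsoft365DSC\1.24.731.1\Dependencies\Manifest.psd1,
    #    change the required ExchangeOnlineManagement version from 3.4.0 to 3.5.1.
    # 2. Then refresh the Microsoft365DSC dependencies so the new version gets installed:
    Update-M365DSCDependencies

    # ...or install just that one module directly:
    Install-Module ExchangeOnlineManagement -RequiredVersion 3.5.1 -Force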

JbMachtMuschel commented 2 months ago

@jadamones Thank you... the quotaConfiguration seems to be an approach, but I guess that this will reach the 16 GB limit. I will keep you informed. Currently I am importing the rules and approx. 110 are already imported, but the memory consumption is very high: (screenshot)

ricmestre commented 2 months ago

The EXO management module has been a known memory hog ever since it moved to using the REST API, and it seems to get worse with each release; you can search the web and find several people complaining about this. Heck, there's even an official article [0] on how to reduce its memory usage, which by the way doesn't solve the problem, since the module also leaks the huge amounts of memory it allocates and never frees it!!! Additionally, the module is prone to spitting out random 0x800... errors, so to work around this I always loop the deployment with a maximum of 3 attempts if it fails (a minimal sketch of such a retry loop is at the end of this comment).

I have this setup in DevOps pipelines which run on discardable containers, so they are memory constrained and always fail if no changes are made. But with the code below, which is a variation of what @FabienTschanz posted before, I can get it going along with the rest of all the workloads I use. Please bear in mind that this allows all available memory of the machine to be allocated to the WMI processes (memory is not allocated all at once, only when required), so you may need to do some math to calculate better values for your requirements.

    #region Increase Memory Quota for WMI processes
    try {
        $ComputerSystem = Get-CimInstance -ClassName "Win32_ComputerSystem"
    }
    catch {
        throw $_
    }
    if (![String]::IsNullOrEmpty($ComputerSystem)) {
        $TotalPhysicalMemory = $ComputerSystem.TotalPhysicalMemory
        if (![String]::IsNullOrEmpty($TotalPhysicalMemory)) {
            try {
                $Quota = Get-CimInstance -Namespace "Root" `
                    -Class "__ProviderHostQuotaConfiguration"
            }
            catch {
                throw $_
            }

            if (![String]::IsNullOrEmpty($Quota)) {
                if ($Quota.MemoryAllHosts -ne $TotalPhysicalMemory) {
                    $Quota.MemoryAllHosts = $TotalPhysicalMemory
                    $Quota.MemoryPerHost = $TotalPhysicalMemory
                    $Quota.HandlesPerHost = 8192
                    $Quota.ThreadsPerHost = 512

                    Write-Output "Increasing WMI processes memory quota"
                    try {
                        Set-CimInstance -InputObject $Quota
                    }
                    catch {
                        throw $_
                    }

                    $WMIProcesses = Get-Process -Name "WMIPrvSE" `
                        -ErrorAction "SilentlyContinue"
                    if ($WMIProcesses.Count -ne 0) {
                        Write-Output "Restarting WMI processes"
                        foreach ($WMIProcess in $WMIProcesses) {
                            try {
                                Stop-Process -Id $WMIProcess.Id -Force
                            }
                            catch {
                                throw $_
                            }
                        }
                    }
                }
            }
        }
    }
    #endregion Increase Memory Quota for WMI processes

[0] https://techcommunity.microsoft.com/t5/exchange-team-blog/reducing-memory-consumption-of-the-exchange-online-powershell-v3/ba-p/3970086
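
For reference, a minimal sketch of the retry loop mentioned above (not the exact pipeline code; the MOF folder path is a placeholder):

    $configPath  = 'C:\Dsc\M365TenantConfig'   # placeholder: folder containing the compiled MOF
    $maxAttempts = 3
    $attempt     = 0
    $succeeded   = $false

    while (-not $succeeded -and $attempt -lt $maxAttempts) {
        $attempt++
        try {
            Write-Output "Deployment attempt $attempt of $maxAttempts"
            Start-DscConfiguration -Path $configPath -Wait -Force -Verbose -ErrorAction Stop
            $succeeded = $true
        }
        catch {
            Write-Warning "Attempt $attempt failed: $($_.Exception.Message)"
            # Give the WMI provider host a moment to recover before retrying.
            Start-Sleep -Seconds 30
        }
    }

    if (-not $succeeded) {
        throw "Deployment still failing after $maxAttempts attempts, aborting."
    }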

JbMachtMuschel commented 2 months ago

@ricmestre Thank you for the information. Just now the process mentioned in the screenshot above released memory as it reached 16 GB, and it is still running and importing: (screenshot)

JbMachtMuschel commented 2 months ago

I was able to import the ActiveSyncDeviceAccessRules now :) I did a retry to observe how DSC handles already created rules, and after a while the command threw this error: VERBOSE: [HAMS010288]: LCM: [ End Set ] The SendConfigurationApply function did not succeed.

VERBOSE: Operation 'Invoke CimMethod' complete.
VERBOSE: Time taken for configuration job to complete is 4292.016 seconds

Interesting: the WMI process "never gives up". Even after deleting all ActiveSyncDeviceAccessRules, it starts creating them again - PowerShell is not involved, I terminated the shell. So I have now killed the WMI process.

jadamones commented 2 months ago

Thank you all for all the effort on this. Unfortunately, these steps haven't resolved my issue :(. I updated the host quota as suggested, and I attempted an upgrade of the Exchange Online module (maybe I'm missing something here). After upgrading the Exchange module to 3.5.1, I get this error: MSFT_EXOOrganizationConfig failed to execute Test-TargetResource functionality with error message: IDX12729: Unable to decode the header '[PII of type 'System.String' is hidden. FWIW, I'm running into this issue using the checkdsccompliancy.ps1 script in a DevOps pipeline.

Here are my WMI quotas (system has 32 GB):

(screenshot)
JbMachtMuschel commented 2 months ago

@jadamones I have had the same error with the 3.5.1 module and certificate-based login. For now I am using 3.4.0.

jadamones commented 2 months ago

Oh SMH... 🤦‍♂️ I realized that I didn't reboot the machine after updating the quotas. That seems to have done the trick! Interesting that the documentation only says to restart the WMI service, but it definitely seems to be working after a reboot now. Thank you all for your input here. Much appreciated!

FabienTschanz commented 2 months ago

@ricmestre Do we want to document this as a workaround somewhere? Just for the sake that it is documented and that there is something we can actually do to circumvent the issue?

ricmestre commented 2 months ago

@FabienTschanz I wouldn't mind if you added something, for example here [0], which looks rather empty; both you and I know there are other issues out there for which no fix or workaround is documented. But for this particular case I'd really like to see some kind of improvement in the module itself.

@NikCharlebois @ykuijs @andikrueger @desmay @malauter Hi, is this something any of you can take to the EXO team in order to solve it, or at least improve the experience? Running a cmdlet here and there is one thing; trying to import a whole workload into other tenants using M365DSC, or even deploying a single resource that contains hundreds or thousands of children, is another, and it only exacerbates this known memory issue.

[0] https://microsoft365dsc.com/user-guide/get-started/troubleshooting/

FabienTschanz commented 2 months ago

@ricmestre I will add an entry in the troubleshooting section for this issue and raise a PR.

JbMachtMuschel commented 2 months ago

Shall I open another issue regarding this? Interesting: the WMI process "never gives up". Even after deleting all ActiveSyncDeviceAccessRules, it starts creating them again - PowerShell is not involved, I terminated the shell. So I killed the WMI process. To clarify: the import of the more than 400 rules is done, and the PowerShell command reflects this, but the WmiPrvSE process doesn't care. It seems it never stops without intervention.

ricmestre commented 2 months ago

That's how the LCM works. Check https://learn.microsoft.com/en-us/powershell/dsc/managing-nodes/metaconfig?view=dsc-1.1 and the ConfigurationMode and ConfigurationModeFrequencyMins settings.
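
For example, a minimal sketch of an LCM meta-configuration that applies the configuration only once (ApplyOnly) instead of re-applying it on the consistency interval - the output path is a placeholder:

    [DSCLocalConfigurationManager()]
    configuration LcmApplyOnly
    {
        Node localhost
        {
            Settings
            {
                # Apply the configuration once and do not re-apply it during the
                # periodic consistency check (interval: ConfigurationModeFrequencyMins).
                ConfigurationMode = 'ApplyOnly'
            }
        }
    }

    LcmApplyOnly -OutputPath 'C:\DscLcm'                        # compiles localhost.meta.mof
    Set-DscLocalConfigurationManager -Path 'C:\DscLcm' -Verbose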

JbMachtMuschel commented 2 months ago

Hi again, I did several tests with different ExchangeOnlineManagement module versions and only 3.4.0 is working - 3.5.0 and 3.5.1 throw "Unable to decode the header '[PII of type 'System.String' is hidden". PowerShell 7 makes no difference.

Unfortunately the error is back. It seems that the WMI process does NOT release memory like in the image below; it crashes again. Then I restarted the process and it released memory once, but a while afterwards it crashed again.

(screenshot)

@ricmestre Thanks for the info, I was not aware of this.

JbMachtMuschel commented 2 months ago

@FabienTschanz Hi Fabien, I noticed that Exchange retention policies and retention tags are not supported by DSC, or at least I did not find them in my exports. Do you have an idea how to address this?

FabienTschanz commented 2 months ago

@JbMachtMuschel For resources not available in DSC, you can open a new issue (new resource proposal).

FabienTschanz commented 2 months ago

What's the next course of action here? Do we want to close this issue or try to raise attention with the Exchange team?

JbMachtMuschel commented 2 months ago

Of course it would be a good idea to raise attention with the Exchange team. We have decided not to use DSC for now, since for our use case it is not reliable enough. Nevertheless, we will keep running exports - better to have them and not need them ;) We will simply continue with PowerShell exporting/importing. Thanks for your help.

FabienTschanz commented 2 months ago

@NikCharlebois Can you provide us with a contact at the Exchange Online team or raise an issue there? We can't seem to get in touch with them, and this issue is a huge pain point which should be addressed in the Exchange PowerShell module itself, not with some workaround in DSC. Thank you.

Borgquite commented 1 month ago

Completely agree that this issue needs solving at the ExchangeOnline module end. In the meantime, another fix is to try to implement the Connect-ExchangeOnline -CommandName option in this module and MSCloudLoginAssistant per #3956 - although, as @NikCharlebois says, that's probably a chunk of work...
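
For illustration, a rough sketch of what -CommandName does when connecting manually (app ID, thumbprint, organization and cmdlet list are placeholders) - only the listed cmdlets get imported into the session, which keeps the module's memory footprint down:

    # Import only the cmdlets actually needed, instead of the full EXO cmdlet surface.
    $neededCmdlets = @(
        'Get-ActiveSyncDeviceAccessRule'
        'New-ActiveSyncDeviceAccessRule'
        'Set-ActiveSyncDeviceAccessRule'
        'Remove-ActiveSyncDeviceAccessRule'
    )

    Connect-ExchangeOnline `
        -AppId '00000000-0000-0000-0000-000000000000' `
        -CertificateThumbprint 'ABCDEF0123456789ABCDEF0123456789ABCDEF01' `
        -Organization 'contoso.onmicrosoft.com' `
        -CommandName $neededCmdlets `
        -ShowBanner:$false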

FabienTschanz commented 1 month ago

I just created the PR https://github.com/microsoft/MSCloudLoginAssistant/pull/181 that introduces the proposed changes with the smarter command import logic. I sure hope it doesn't break anything, although it didn't during my testing...

ykuijs commented 5 days ago

Guys, I am working with a customer that also has this issue. We are trying some of the suggestions that were listed above.

I am also trying to find a contact at the Exchange PG, so we can have this issue investigated and hopefully resolved. Would be great if you could provide me with some input:

ricmestre commented 5 days ago

@ykuijs The issue is more visible when the blueprints are much bigger and consume more memory. I've seen it fail when running my integration pipeline, which deploys multiple workloads for Create/Update/Remove and then calls Test-DscConfiguration afterwards. For this problem I increased the memory available to the WMI processes and ensured that both the DSC core process and the WMI processes are restarted after each deployment, and again after each call to Test-DscConfiguration for each workload. That basically involves a lot of restarts, but I didn't see more memory issues after changing the code to proceed this way.

Unfortunately the EXO module is still very prone to random WMI failures, such as the one reported in the title of this issue, even with small blueprints. In my case I've added a retry system that attempts the deployment up to 3 times until it succeeds, otherwise it aborts. I don't have any better way to work around this, since there's no way to predict how and/or when it will fail.

The problem is definitely with the EXO module itself; these problems don't occur in DSC with any other workload.
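
A minimal sketch of the restart step between runs described above (adjust the error handling to your pipeline; DSC spawns a fresh WmiPrvSE.exe on the next operation):

    # Kill the current WMI provider hosts so memory accumulated by the EXO module is released.
    Get-Process -Name 'WmiPrvSE' -ErrorAction SilentlyContinue | Stop-Process -Force

    # Optionally restart the WMI service as well before the next
    # Start-DscConfiguration / Test-DscConfiguration call.
    Restart-Service -Name 'Winmgmt' -Force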

Borgquite commented 5 days ago

@ykuijs It's already well described in the Microsoft365DSC troubleshooting guide, at least for me https://microsoft365dsc.com/user-guide/get-started/troubleshooting/#error-the-wmi-service-or-the-wmi-provider-returned-an-unknown-error-hresult-0x80041033-when-running-exchange-workload

ykuijs commented 5 days ago

@ricmestre, @Borgquite, thanks for this info!

For sure, fixing the issue in the module is the best way to resolve this; that is why I am planning to contact the PG. In the meantime, I have found a contact and will reach out later today. I just wanted to make sure I have a clear understanding of the issue and, ideally, how to reproduce it quickly and easily (the better the chances that they will spend time on fixing it).

ykuijs commented 5 days ago

A strange thing is that my customer is experiencing this issue when applying tenant settings to SPO:

[[SPOTenantSettings]TenantSettingsDefaults::[SharePoint]SharePoint_Configuration] Connected to the SharePoint Online 
Admin Center at 'https://tenantname-admin.sharepoint.com'/ to run this cmdlet
##[error]The WS-Management service cannot process the request. The WMI service or the WMI provider returned an unknown error: 
HRESULT 0x80041033 
    + CategoryInfo          : ResourceUnavailable: (root/Microsoft/...gurationManager:String) [], CimException
    + FullyQualifiedErrorId : HRESULT 0x80041033
    + PSComputerName        : localhost
VERBOSE: Operation 'Invoke CimMethod' complete.
FabienTschanz commented 5 days ago

One way to reduce memory consumption (for as long as the module is bad at handling memory itself) would be to only load the required commands using the new functionality from https://github.com/microsoft/MSCloudLoginAssistant/pull/181. But that probably won't work if we're using 30 different EXO resources in the same deployment run, because it'll still load many commands. The functionality is there, but not yet used.

ricmestre commented 5 days ago

@ykuijs That's weird but it looks like someone else also reported something similar on #4183, although no solution came out of it.

Then there's also #4632, again about EXO, where it was reported that going back to older versions of DSC solved the issue. I suspect that solved it because the older DSC versions used older EXO module versions, which didn't consume as much memory and didn't have as many problems as the newer ones.

Borgquite commented 5 days ago

A strange thing is that my customer is experiencing this issue when applying tenant settings to SPO:

@ykuijs If the issue is that you're hitting the WMI memory limit, then it could in theory break in any module. If you are, say, running many EXO resources (pushing the memory usage higher and higher) and the SPO resource happens to be the one that pushes you over 4 GB, I'd expect exactly that. Logging in to SPO may be the 'straw that breaks the camel's back' - but it's EXO that is normally responsible for the majority of the memory usage.

For any given configuration, the failure normally seems to take place at the same time - but the underlying issue is that the WMI host runs out of memory, and it's the EXO module that has normally chewed up the lion's share of it.

ykuijs commented 2 days ago

According to the memory optimization article, the issue is caused by the fact that the module doesn't handle connecting and disconnecting properly. If that is the case, I am wondering why we are constantly connecting to and disconnecting from EXO. Can't we simply reuse the already created connection?

Borgquite commented 2 days ago

@ykuijs I agree - trying to reduce the need to connect and disconnect is good. However I've observed:

  1. Microsoft365DSC appears to disconnect/reconnect whenever it changes workloads (so for example, when running a Graph resource followed by an Exchange Online resource, it seems to trigger this) - I don't know exactly why, but I guess it could be because Microsoft's PowerShell modules originating from different teams within the organisation historically aren't very good at running alongside each other (e.g. assembly conflicts that constantly afflict the different modules?)
  2. It's possible this can be somewhat mitigated by carefully arranging configurations, e.g. to run all Graph resources followed by all Exchange Online resources - when creating custom IaC configurations this puts a lot of burden on the configuration author, but it isn't impossible.
  3. However even then it's still necessary for some resources (like EXOManagementRoleAssignment) to switch between Graph and EXO during individual resource execution - in this case, because you need to pick up Administrative Unit data, and the Administrative Unit cmdlets in Exchange Online are less functional than Graph (see issue #3064) - and other modules may be similarly affected.

So it seems that there are two difficult problems here which only Microsoft can really resolve:

The situation with Microsoft PowerShell modules is frankly reminiscent of the bad old Windows 3.1 days of 'DLL Hell': a lack of coordination between teams, or a refusal to work on Windows PowerShell 5.1 because 'we should all be using PowerShell Core', combined with a lack of understanding of why some projects (including this one) still depend on Windows PowerShell 5.1 instead of PowerShell Core. In my view, they got too excited about breaking changes, which resulted in too many removed features and too many breaking changes in PowerShell Core, removing any incentive to move - failing to learn the lessons of e.g. Perl 6. It's an absolute mess in my opinion.

FabienTschanz commented 2 days ago

Are there any things we should be checking out and exploring within the community, e.g. trying a version with fewer context switches, auto-arranging resources based on their type/workload, reducing dependencies on Windows PowerShell 5.1, etc.? How about a combined issue listing all the steps that are worth exploring, with each one tracked in a dedicated issue?

I'd be glad to help with the implementation since I'm very familiar with the internals of Microsoft365DSC and the underlying technology. What do you think? @ykuijs @Borgquite @ricmestre @NikCharlebois