microsoft / PowerShell-DSC-for-Linux

PowerShell Desired State Configuration - for Linux
Other
339 stars 132 forks source link

dsc v1.2.4 does not run to completion when a SetScript (from nxScript) returns non-zero exit code #885

Open eehret opened 2 years ago

eehret commented 2 years ago

In dsc versions prior to 1.2.4 we were making use of non-zero exit codes in nxScript to signal situations where : a) we try to apply a desired configuration and something goes wrong, or, b) it isn't feasible to implement an automatic remediation for some issue and are only using TestScript to detect resources that are non compliant while in ApplyAndAutoCorrect mode. In this case an 'exit 1' would signal that we're not actually applying remediation and just want to continue reporting the resource as non-compliant.

This worked fine up to v1.2.3 , after that something seems to have changed and this strategy no longer works.

What follows is the DSC detailed log output showing how two non-compliant nxScript resources get treated by each of v1.2.3 and v.1.2.4. I am only including the portions of the log pertaining to the nxScript resources and the resulting overall status at the end. I start with v1.2.3, run the configuration, capture the output, then uninstall v.1.2.3 and install v1.2.4 and repeat the same test and capture the same output.

==== Functionality in v1.2.3 (what I consider "desired behaviour"):

~$ sudo /opt/microsoft/dsc/Scripts/python3/PerformRequiredConfigurationChecks.py instance of OMI_Error { OwningEntity=OMI:CIMOM MessageID=OMI:MI_Result:1 Message=Failed to apply the configuration. These resources produced errors: [nxScript]check_unowned_files, [nxScript]check_ungrouped_files MessageArguments={} PerceivedSeverity=7 ProbableCause=0 ProbableCauseDescription=Unknown CIMStatusCode=1 OMI_Code=1 OMI_Category=0 OMI_Type=MI OMI_ErrorMessage=A general error occurred, not covered by a more specific error code. }

Log output from /var/opt/omi/log/dscdetailed.log :

2022/09/29 01:03:05: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Displaying messages from the engine: WMI channel 1 ResourceID: [nxScript]check_unowned_files Message : [Controller]: LCM: [ Start Resource ] [[nxScript]check_unowned_files] 2022/09/29 01:03:05: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Getting a registration instance for MSFT_nxScriptResource 2022/09/29 01:03:05: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Moving the resource [nxScript]check_unowned_files of class MSFT_nxScriptResource to desired state. 2022/09/29 01:03:05: INFO: CAEngine.c(1297): EventId=3 Priority=INFO Executing OMI Provider. 2022/09/29 01:03:05: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Method Exec_WMIv2Provider started with parameters Class name: MSFT_nxScriptResource Resource ID: [nxScript]check_unowned_files Flags: 134217728 Execution Mode: 192 DSC resource Namespace: null 2022/09/29 01:03:05: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Executing operations for WMIv2 DSC resource MSFT_nxScriptResource with resource name [nxScript]check_unowned_files 2022/09/29 01:03:05: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Invoking session and getting result for namespace root/Microsoft/DesiredStateConfiguration, classname MSFT_nxScriptResource for method TestTargetResource 2022/09/29 01:03:05: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Displaying messages from the engine: WMI channel 1 ResourceID: [nxScript]check_unowned_files Message : [Controller]: LCM: [ Start Test ] [[nxScript]check_unowned_files] 2022/09/29 01:03:18: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Displaying messages from the engine: WMI channel 1 ResourceID: [nxScript]check_unowned_files Message : [Controller]: LCM: [ End Test ] [[nxScript]check_unowned_files] in 12.7134 seconds. 2022/09/29 01:03:18: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Displaying messages from the engine: WMI channel 1 ResourceID: [nxScript]check_unowned_files Message : [Controller]: LCM: [ Start Set ] [[nxScript]check_unowned_files] 2022/09/29 01:03:18: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Invoking session and getting result for namespace root/Microsoft/DesiredStateConfiguration, classname MSFT_nxScriptResource for method SetTargetResource 2022/09/29 01:03:18: INFO: CAEngine.c(763): EventId=3 Priority=INFO MoveToDesiredState = 1, *resultStatus = 0 2022/09/29 01:03:18: INFO: CAEngine.c(769): EventId=3 Priority=INFO SetResourcesInOrder failed in MoveToDesiredState 2022/09/29 01:03:18: ERROR: null(0): EventId=1 Priority=ERROR Job 4930B63A-B717-4645-84B3-F55A317666AF : This event indicates that failure happens when LCM is processing the configuration. ErrorId is 1. ErrorDetail is The SendConfigurationApply function did not succeed.. ResourceId is [nxScript]check_unowned_files and SourceInfo is ::4689::9::nxScript. ErrorMessage is A general error occurred, not covered by a more specific error code.. The related ResourceID is [nxScript]check_unowned_files.. 2022/09/29 01:03:18: INFO: CAEngine.c(812): EventId=3 Priority=INFO SetResourcesInOrder finalr = 1 2022/09/29 01:03:18: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Displaying messages from the engine: WMI channel 1 ResourceID: [nxScript]check_ungrouped_files Message : [Controller]: LCM: [ Start Resource ] [[nxScript]check_ungrouped_files] 2022/09/29 01:03:18: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Getting a registration instance for MSFT_nxScriptResource 2022/09/29 01:03:18: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Moving the resource [nxScript]check_ungrouped_files of class MSFT_nxScriptResource to desired state. 2022/09/29 01:03:18: INFO: CAEngine.c(1297): EventId=3 Priority=INFO Executing OMI Provider. 2022/09/29 01:03:18: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Method Exec_WMIv2Provider started with parameters Class name: MSFT_nxScriptResource Resource ID: [nxScript]check_ungrouped_files Flags: 134217728 Execution Mode: 192 DSC resource Namespace: null 2022/09/29 01:03:18: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Executing operations for WMIv2 DSC resource MSFT_nxScriptResource with resource name [nxScript]check_ungrouped_files 2022/09/29 01:03:18: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Invoking session and getting result for namespace root/Microsoft/DesiredStateConfiguration, classname MSFT_nxScriptResource for method TestTargetResource 2022/09/29 01:03:18: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Displaying messages from the engine: WMI channel 1 ResourceID: [nxScript]check_ungrouped_files Message : [Controller]: LCM: [ Start Test ] [[nxScript]check_ungrouped_files] 2022/09/29 01:03:28: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Displaying messages from the engine: WMI channel 1 ResourceID: [nxScript]check_ungrouped_files Message : [Controller]: LCM: [ End Test ] [[nxScript]check_ungrouped_files] in 10.3786 seconds. 2022/09/29 01:03:28: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Displaying messages from the engine: WMI channel 1 ResourceID: [nxScript]check_ungrouped_files Message : [Controller]: LCM: [ Start Set ] [[nxScript]check_ungrouped_files] 2022/09/29 01:03:28: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Invoking session and getting result for namespace root/Microsoft/DesiredStateConfiguration, classname MSFT_nxScriptResource for method SetTargetResource 2022/09/29 01:03:28: INFO: CAEngine.c(763): EventId=3 Priority=INFO MoveToDesiredState = 1, *resultStatus = 0 2022/09/29 01:03:28: INFO: CAEngine.c(769): EventId=3 Priority=INFO SetResourcesInOrder failed in MoveToDesiredState 2022/09/29 01:03:28: ERROR: null(0): EventId=1 Priority=ERROR Job 4930B63A-B717-4645-84B3-F55A317666AF : This event indicates that failure happens when LCM is processing the configuration. ErrorId is 1. ErrorDetail is The SendConfigurationApply function did not succeed.. ResourceId is [nxScript]check_ungrouped_files and SourceInfo is ::4724::9::nxScript. ErrorMessage is A general error occurred, not covered by a more specific error code.. The related ResourceID is [nxScript]check_ungrouped_files.. 2022/09/29 01:03:28: INFO: CAEngine.c(812): EventId=3 Priority=INFO SetResourcesInOrder finalr = 1 2022/09/29 01:03:29: ERROR: null(0): EventId=1 Priority=ERROR Job 4930B63A-B717-4645-84B3-F55A317666AF : DSC Engine Error : Error Message Failed to apply the configuration. These resources produced errors: [nxScript]check_unowned_files, [nxScript]check_ungrouped_files Error Code : 1 2022/09/29 01:03:29: WARNING: null(0): EventId=2 Priority=WARNING Job 4930B63A-B717-4645-84B3-F55A317666AF : Displaying messages from built-in DSC resources: WMI channel 1 ResourceID: Message : [Controller]: [] Consistency check completed. 2022/09/29 01:03:29: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Method CallConsistencyEngine ended successfully 2022/09/29 01:03:29: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Setting Metaconfiguration instance. 2022/09/29 01:03:29: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Copying configuration file from MetaConfig.mof to MetaConfig.backup.mof. 2022/09/29 01:03:30: INFO: null(0): EventId=3 Priority=INFO Job 4930B63A-B717-4645-84B3-F55A317666AF : Method Invoke_PerformRequiredConfigurationChecks_Internal ended successfully 2022/09/29 01:03:30: WARNING: null(0): EventId=2 Priority=WARNING Job 4930B63A-B717-4645-84B3-F55A317666AF : PerformRequiredConfigurationChecks DSC operation completed in 81.5427 seconds. ================== Now for the functionality in v1.2.4 (what I consider "broken behaviour"): ~$ sudo /opt/microsoft/dsc/Scripts/python3/PerformRequiredConfigurationChecks.py instance of OMI_Error { OwningEntity=OMI:CIMOM MessageID=OMI:MI_Result:1 Message=A general error occurred, not covered by a more specific error code. MessageArguments={} PerceivedSeverity=7 ProbableCause=0 ProbableCauseDescription=Unknown CIMStatusCode=1 OMI_Code=1 OMI_Category=0 OMI_Type=MI OMI_ErrorMessage=A general error occurred, not covered by a more specific error code. } Output from /var/opt/omi/log/dscdetailed.log : 2022/09/29 01:24:34: INFO: null(0): EventId=3 Priority=INFO Job C407488E-C219-40FD-962E-16C67316156C : Getting a registration instance for MSFT_nxScriptResource 2022/09/29 01:24:34: INFO: null(0): EventId=3 Priority=INFO Job C407488E-C219-40FD-962E-16C67316156C : Moving the resource [nxScript]check_unowned_files of class MSFT_nxScriptResource to desired state. 2022/09/29 01:24:34: INFO: CAEngine.c(1325): EventId=3 Priority=INFO Executing OMI Provider. 2022/09/29 01:24:34: INFO: null(0): EventId=3 Priority=INFO Job C407488E-C219-40FD-962E-16C67316156C : Method Exec_WMIv2Provider started with parameters Class name: MSFT_nxScriptResource Resource ID: [nxScript]check_unowned_files Flags: 134217728 Execution Mode: 192 DSC resource Namespace: null 2022/09/29 01:24:34: INFO: null(0): EventId=3 Priority=INFO Job C407488E-C219-40FD-962E-16C67316156C : Executing operations for WMIv2 DSC resource MSFT_nxScriptResource with resource name [nxScript]check_unowned_files 2022/09/29 01:24:34: INFO: null(0): EventId=3 Priority=INFO Job C407488E-C219-40FD-962E-16C67316156C : Invoking session and getting result for namespace root/Microsoft/DesiredStateConfiguration, classname MSFT_nxScriptResource for method TestTargetResource 2022/09/29 01:24:47: INFO: null(0): EventId=3 Priority=INFO Job C407488E-C219-40FD-962E-16C67316156C : Invoking session and getting result for namespace root/Microsoft/DesiredStateConfiguration, classname MSFT_nxScriptResource for method SetTargetResource .... the process just stalls there and never completes. It proceeds to cause our cron process to accumulate a list of zombie processes for which this happens , each time DSC tries to run and fails to complete. ================== In the case of the desired behaviour, at the end of the configuration run that may have many such TestScript and SetScripts , if any number of the SetScript return non-zero exit code then all of those resources get reported at the end after the whole configuration runs and we can see the full list of anything that's non-compliant. Perfect! In the case of the broken behaviour, not only do we not get the full list but we don't even get any report back about the single resource that failed. In the consuming client (in this case the Azure DSC functionality of an Azure Automation Account, the computer in question simply gets stuck with a permanent status of "In progress" for the DSC run and it never reports its results. Also notable: We are only using the non-zero exit code of SetScript in order to work around an apparent flaw because when DSC is set to ApplyAndAutoCorrect , it will falsely report compliance after Test + Set complete even if Set did not successfully correct the non-compliance (this was an issue up to 1.2.3 but have not yet verified it still does that in 1.2.4). By explicitly returning a non-zero exit code in this case we were able to get the results we wanted and avoid the false compliance. When dsc is set to ApplyAndMonitor this doesn't appear to be an issue; if resources drift out of compliance or were never compliant in the first place they correctly show non-compliance in the dsc output. I'll try to confirm the above for v1.2.4 to see if we still have to return non-zero exit code from SetScript when mode is ApplyAndAutoCorrect. UPDATE: I just confirmed that with v1.2.4 , when mode is set to ApplyAndAutoCorrect , DSC still erroneously shows compliant state even when an nxScript resource is non-compliant. What I would like to see is one of the following. Either: 1) Revert functionality to what it was with dsc 1.2.3, in that if I return a non-zero exit code from SetScript, the remainder of the configuration still runs and the errors get properly aggregated and reported back up, or, 2) Add an extra Test pass after SetScript runs, so the sequence would become : TestScript, SetScript, TestScript . Because currently when mode is ApplyAndAutoCorrect, dsc seems to assume that all will be well after SetScript is called without really verifying that afterward.