lithnet / googleapps-managementagent

Google Workspace Management Agent for MIM 2016
MIT License
12 stars 4 forks

no-start-ma failures on most delta imports #76

Closed IAmStevenJohnson closed 2 years ago

IAmStevenJohnson commented 2 years ago

We get a no-start-ma failure on most delta imports (roughly two thirds of them). The failure also clears out the delta import file, so the only way to get those confirming imports is to run a full import. I've tried backing up the file before the delta import so that I can retry it, but when I retry the delta import repeatedly with several different failed files, it fails every time on them. So I suspect the problem is in the data in the file. I haven't been able to work out what differs between a delta import that succeeds and one that fails. The first event log entry below shows the exception "The value-change value was blank", so I looked for attributes that are being deleted, and I do find them in the XML file. Is there anything specific in the delta file I should look for that might cause this? Or any other thoughts on why this happens?
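Because a failed delta import clears the delta file, the backup step described above is worth automating so every run leaves a copy to inspect or retry. A minimal Python sketch, assuming you already know where the MA writes its delta file (the location is site-specific and not given in this thread) and that you run this just before the delta import run profile; the script name and arguments are placeholders:

```python
import shutil
import sys
from datetime import datetime
from pathlib import Path


def backup_delta_file(delta_file: Path, backup_dir: Path) -> Path:
    """Copy the delta file to a timestamped backup and return the backup path.

    If the delta import then fails and the file is cleared, the backup can be
    copied back into place to retry the import or inspected offline.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Microseconds keep names unique even if two backups run in the same second.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    target = backup_dir / f"{delta_file.stem}-{stamp}{delta_file.suffix}"
    shutil.copy2(delta_file, target)  # copy2 also preserves file timestamps
    return target


if __name__ == "__main__":
    # Usage (placeholder paths): python backup_delta.py <delta-file> <backup-dir>
    print(backup_delta_file(Path(sys.argv[1]), Path(sys.argv[2])))
```

Keeping the original file name as a prefix makes it easy to match each backup to the run that consumed it.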

Here are three entries from Event Viewer:

First event, ID 6401:

```
The extensible extension returned an unsupported error. The stack trace is:

System.ArgumentException: The value-change value was blank
   at Lithnet.MetadirectoryServices.CSEntryChangeDeserializer.XmlReadValueChangeNode(XElement element, AttributeType attributeType)
   at Lithnet.MetadirectoryServices.CSEntryChangeDeserializer.XmlReadValueChangesNode(XElement element, AttributeType attributeType)
   at Lithnet.MetadirectoryServices.CSEntryChangeDeserializer.XmlReadAttributeChangeNode(XElement element, CSEntryChange csentry)
   at Lithnet.MetadirectoryServices.CSEntryChangeDeserializer.XmlReadAttributeChangesNode(XElement element, CSEntryChange csentry)
   at Lithnet.MetadirectoryServices.CSEntryChangeDeserializer.Deserialize(XElement element, CSEntryChange csentry)
   at Lithnet.MetadirectoryServices.CSEntryChangeDeserializer.Deserialize(XElement element)
   at Lithnet.MetadirectoryServices.CSEntryChangeDeserializer.Deserialize(String file)
   at Lithnet.MetadirectoryServices.CSEntryChangeQueue.LoadQueue(String filename)
   at Lithnet.GoogleApps.MA.ManagementAgent.OpenImportConnection(KeyedCollection`2 configParameters, Schema types, OpenImportConnectionRunStep importRunStep) in D:\dev\git\lithnet\googleapps-managementagent\src\Lithnet.GoogleApps.MA\ManagementAgent.cs:line 283

Forefront Identity Manager 4.6.607.0
```

Second event, ID 6401:

```
The management agent controller encountered an unexpected error.

ERR_: MMS(14932): ..\libutils.cpp(10210): Failed to start run because of undiagnosed MA error

Forefront Identity Manager 4.6.607.0
```

Third event, ID 6005:

```
The management agent "GSuite" failed on run profile "GSuite Delta Import" because of an unspecified management agent error.

Additional Information
%3
```
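One way to look for what the first event complains about is to scan a saved copy of a failing delta file for value-change entries whose value is empty or whitespace-only. The Python sketch below does that. Its element names (`attribute-change`, `name`, `value-change`, `modification-type`, `value`) are an assumption inferred from the deserializer method names in the stack trace, not taken from Lithnet documentation, so check them against a real file before trusting an empty result. The script name is a placeholder.

```python
import sys
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def child(element: ET.Element, name: str) -> Optional[ET.Element]:
    """Return the first direct child with the given local name, or None."""
    return next((c for c in element if local_name(c.tag) == name), None)


def find_blank_value_changes(root: ET.Element) -> List[Tuple[str, str]]:
    """List (attribute name, modification type) for every blank value-change.

    ASSUMPTION: element names are guessed from the stack trace's method names.
    """
    problems = []
    for attr_change in root.iter():
        if local_name(attr_change.tag) != "attribute-change":
            continue
        name_el = child(attr_change, "name")
        attr_name = (name_el.text or "").strip() if name_el is not None else "?"
        for vc in attr_change.iter():
            if local_name(vc.tag) != "value-change":
                continue
            # The value may be a <value> child or the element's own text.
            value_el = child(vc, "value")
            value = value_el.text if value_el is not None else vc.text
            if value is None or not value.strip():
                mod_el = child(vc, "modification-type")
                mod = (mod_el.text or "").strip() if mod_el is not None else "?"
                problems.append((attr_name, mod))
    return problems


if __name__ == "__main__":
    # Usage: python find_blank_values.py <copy-of-delta-file.xml>
    tree = ET.parse(sys.argv[1])
    for attr, mod in find_blank_value_changes(tree.getroot()):
        print(f"{attr}: {mod} value-change has a blank value")
```

Running it against a backed-up file that fails and one that succeeds would show whether blank values appear only in the failing files.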

ryannewington commented 2 years ago

This sounds like a problem with the export XML file. The MA shouldn't be writing blank value-change elements. Could you send a copy of a bad XML file to support@lithnet.io, please, so we can see what is going on?

IAmStevenJohnson commented 2 years ago

@ryannewington Thanks for offering to look at this, and sorry for the delay in getting it to you. I've just emailed it to you.

Some more background: this started happening when we migrated from FIM to MIM a month ago. We only have this problem in our production environment; it doesn't happen in our non-production environment.

More background, in case it's relevant. I don't know if this is the same issue, but since the migration we've also had intermittent problems deleting values in Google, mostly for attributes we've added to the Google schema. I've always used the Delete method in code to delete attribute values and never had a problem until we migrated to MIM. Now the delete keeps failing and being retried (after a full import shows the value is still there). I've also tried setting the value to string.Empty or even a single space, and those fail too, at least intermittently. I haven't narrowed down exactly when or why. This issue seems worse (or at least more consistent) in our non-production environment, where the same 900 accounts currently fail on every run. As I said, this is just more background; I can open a separate issue for it.

Thanks for any thoughts you have on this issue.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs.