microsoftgraph / group-membership-management

Group Membership Management (GMM) is a service that dynamically manages the membership of AAD Groups. Groups managed by GMM can have their membership defined using existing AAD Groups and/or custom membership sources.

Error after long time of sync #33

Closed dborchers-gc closed 1 year ago

dborchers-gc commented 1 year ago

After some weeks with no changes to a group, the job seems to switch to "Error".

[Screenshot: 2022-11-24 09_14_46-Window]

The detailed Message is this:

"Unexpected exception. System.Net.Http.HttpRequestException: Response status code does not indicate success: 500 (Internal Server Error).
   at Microsoft.Azure.WebJobs.Extensions.DurableTask.DurableOrchestrationContext.CallDurableTaskFunctionAsync[TResult](String functionName, FunctionType functionType, Boolean oneWay, String instanceId, String operation, RetryOptions retryOptions, Object input, Nullable`1 scheduledTimeUtc) in D:\a_work\1\s\src\WebJobs.Extensions.DurableTask\ContextImplementations\DurableOrchestrationContext.cs:line 742
   at Microsoft.Azure.WebJobs.Extensions.DurableTask.DurableOrchestrationContext.ScheduleDurableHttpActivityAsync(DurableHttpRequest req) in D:\a_work\1\s\src\WebJobs.Extensions.DurableTask\ContextImplementations\DurableOrchestrationContext.cs:line 308
   at Microsoft.Azure.WebJobs.Extensions.DurableTask.DurableOrchestrationContext.Microsoft.Azure.WebJobs.Extensions.DurableTask.IDurableOrchestrationContext.CallHttpAsync(DurableHttpRequest req) in D:\a_work\1\s\src\WebJobs.Extensions.DurableTask\ContextImplementations\DurableOrchestrationContext.cs:line 271
   at Hosts.MembershipAggregator.OrchestratorFunction.RunOrchestratorAsync(IDurableOrchestrationContext context) in D:\a\1\2.0.2208.5\Service\GroupMembershipManagement\Hosts\MembershipAggregator\Function\Orchestrator\OrchestratorFunction.cs:line 101"

Do you have an idea what this could be?

alrios-ms commented 1 year ago

@dborchers-gc It looks like the HTTP call from MembershipAggregator to GraphUpdater failed for some reason. Unfortunately, the exception message doesn't provide enough information. Would you mind checking your Application Insights logs around the time this error occurred? Hopefully there will be more information there.

dborchers-gc commented 1 year ago

Sure, I can. Could you tell me what I should look for, or which filters to use?

And do you have an idea how I can get it working again?


alrios-ms commented 1 year ago

You could try this query:

exceptions
| where timestamp >= todatetime('2022-11-29T17:22:21.1516534Z') and timestamp <= todatetime('2022-11-29T17:22:21.1516534Z')
| where problemId contains "exception"
| order by timestamp

Update both dates so that they span the time range in which the exception occurred.
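If it helps, the time window can also be generated rather than edited by hand. The following is only a sketch: the helper name `build_exceptions_query` and the 15-minute default window are assumptions made for illustration, not part of GMM. It produces the same query text as above with the two dates filled in around a given error time.

```python
from datetime import datetime, timedelta, timezone


def build_exceptions_query(error_time: datetime, window_minutes: int = 15) -> str:
    """Return the exceptions query for a window centred on error_time.

    Hypothetical helper: the name and the default window size are
    illustrative assumptions, not something GMM ships.
    """
    if error_time.tzinfo is None:
        raise ValueError("error_time must be timezone-aware")

    # Application Insights timestamps are in UTC, so normalise first.
    utc_time = error_time.astimezone(timezone.utc)
    delta = timedelta(minutes=window_minutes)
    start = (utc_time - delta).strftime("%Y-%m-%dT%H:%M:%SZ")
    end = (utc_time + delta).strftime("%Y-%m-%dT%H:%M:%SZ")

    return "\n".join(
        [
            "exceptions",
            f"| where timestamp >= todatetime('{start}') "
            f"and timestamp <= todatetime('{end}')",
            '| where problemId contains "exception"',
            "| order by timestamp",
        ]
    )


if __name__ == "__main__":
    # Example: a window around the time used in the query above.
    print(build_exceptions_query(datetime(2022, 11, 29, 17, 22, 21, tzinfo=timezone.utc)))
```

The printed text can be pasted into the Logs blade of the Application Insights resource.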

To update the job so it gets processed again:

  1. In the Azure Portal, locate and open your jobs storage account (naming convention: jobs<environment><random-string>)
  2. Click on Tables
  3. Click on the syncJobs table
  4. Locate the record you want to edit
  5. Right click -> Edit -> set the Status to Idle (the value is case sensitive)
  6. Click on the Update button
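The same edit could in principle be scripted instead of done through the portal. This is a rough sketch under stated assumptions, not a GMM tool: the function names are hypothetical, and `table_client` is assumed to be something like a `TableClient` from the `azure-data-tables` package pointed at the syncJobs table. Only the `Status` column and the `Idle` value come from the steps above; `PartitionKey` and `RowKey` are the standard Azure Table keys of the record you located.

```python
from typing import Any, Dict


def build_idle_patch(partition_key: str, row_key: str) -> Dict[str, str]:
    """Build the minimal entity needed to flip one job back to Idle.

    "Idle" is case sensitive, as noted in the steps above.
    """
    return {"PartitionKey": partition_key, "RowKey": row_key, "Status": "Idle"}


def reset_job_to_idle(table_client: Any, partition_key: str, row_key: str) -> Dict[str, str]:
    """Send the patch through any client exposing update_entity(entity).

    Assumption: with azure-data-tables, TableClient.update_entity merges
    by default, so columns not present in the patch are left untouched.
    Verify this against the SDK version you have installed before use.
    """
    patch = build_idle_patch(partition_key, row_key)
    table_client.update_entity(patch)
    return patch


if __name__ == "__main__":
    # Dry run with a stand-in client that only prints what would be sent.
    class PrintingClient:
        def update_entity(self, entity: Dict[str, str]) -> None:
            print("would update:", entity)

    reset_job_to_idle(PrintingClient(), "<partition-key>", "<row-key>")
```

With the real SDK, the client would typically be created with `TableClient.from_connection_string(conn_str, table_name="syncJobs")`; trying the change on a single record first is the safer route.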

Let me know if you have further questions.

dborchers-gc commented 1 year ago

Maybe this helps? It's from the exceptions logs.

[Two screenshots of the exception log entries]

These are the main entries in that time range.

alrios-ms commented 1 year ago

Would you mind running this query? It should tell us whether there was an issue with the number of connections used by the functions.

exceptions
| where outerMessage contains "Host thresholds exceeded: Connections"
| order by timestamp desc

Roughly how many jobs are currently set up to run in your GMM instance?

These latest logs are from SecurityGroup, while the previous logs are from MembershipAggregator. Given that each has its own Function App, this looks like it might have been a transient network issue affecting multiple Function Apps: they failed to communicate with each other and were also unable to reach the Graph API.