splunk / splunk-add-on-microsoft-azure

Splunk Add-on for Microsoft Azure
Apache License 2.0

HTTPError: 400 Client Error for signIns #16

Open ghost opened 2 years ago

ghost commented 2 years ago

I have recently started receiving a 400 error on my signIns input for https://graph.microsoft.us/v1.0/auditLogs/signIns (Microsoft Graph for US Government L4). Are there any newly known issues with reaching the signIns endpoint on graph.microsoft.us?

There are no issues pulling the directoryAudit input from https://graph.microsoft.us/v1.0/auditLogs/directoryAudits. We have taken all the basic troubleshooting steps, including completely removing and re-adding the input, with no luck.

JasonConger commented 2 years ago

What version of Azure AD do you have? The sign-in input requires at least an Azure AD P1 license.

https://github.com/splunk/splunk-add-on-microsoft-azure/wiki/Install-the-Splunk-Add-on-for-Microsoft-Azure#prerequisites

SgtMoose commented 1 year ago

@adehn-gh We have the same issue when querying for SignIns using the v1.0 endpoint. We are also in a GCC High tenant with G5 licensing. We have an open support ticket with MS about problems querying SignIn data. They told us to switch to the Beta endpoint because the v1.0 endpoint was not current. That worked for a short time, but now the Beta endpoint fails with constant read timeouts and "max retries exceeded" errors. Hopefully you have better luck with the Beta endpoint.

Let me know what you end up figuring out, because working with MS Support is painful: they keep blaming either us or Splunk and make us prove every time that it's not us or Splunk.

JasonConger commented 1 year ago

Are you seeing 429 errors? If so, this indicates a throttling issue with the Microsoft REST API. As of the date of this comment, the Microsoft REST API limits you to 5 requests per 10 seconds (reference). If you have a high-volume environment, you will hit that 429 throttle pretty easily because the REST API will page the results in batches of 1,000 records. Each requested page counts toward your throttling limit.
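Because every page counts toward the limit, a collector has to treat paging and throttling together. The following is a minimal stdlib sketch, not the add-on's code: `iter_records` and `get_json` are hypothetical names, it follows Graph's `@odata.nextLink` paging, and on a 429 it waits for the `Retry-After` value (assumed to be in seconds) before retrying the same page.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Iterator, Optional


def get_json(url: str, token: str) -> dict:
    """GET a Graph URL and parse the JSON body; non-2xx raises urllib.error.HTTPError."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def iter_records(first_url: str,
                 fetch: Callable[[str], dict],
                 sleep: Callable[[float], None] = time.sleep,
                 max_throttle_retries: int = 5) -> Iterator[dict]:
    """Yield records page by page by following @odata.nextLink.

    Every page is a separate request and counts toward the throttle, so on
    HTTP 429 we wait for Retry-After (assumed seconds, default 10) and retry
    the same page instead of failing the whole collection run.
    """
    url: Optional[str] = first_url
    while url:
        for attempt in range(max_throttle_retries + 1):
            try:
                page = fetch(url)
                break
            except urllib.error.HTTPError as err:
                if err.code != 429 or attempt == max_throttle_retries:
                    raise  # not throttling (e.g. a 400), or out of retries
                retry_after = err.headers.get("Retry-After") if err.headers else None
                sleep(float(retry_after) if retry_after else 10.0)
        yield from page.get("value", [])
        url = page.get("@odata.nextLink")
```

Usage would look like `for rec in iter_records(first_url, lambda u: get_json(u, token)): ...`. Injecting `fetch` and `sleep` keeps the paging logic testable without network access.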

Another, more scalable, way to get this data is to send the sign-ins to an event hub. Then, use the Splunk Add-on for Microsoft Cloud Services to retrieve the data.

SgtMoose commented 1 year ago

I wish we were seeing 429 errors, but we are not. With the v1.0 endpoint we get HTTP Error 400, and when we switch to the Beta endpoint we get MaxRetries timeout errors. Copies of the messages are below. MS Support advised us to use the Beta endpoint because all new work is being done there.

v1.0 API endpoint:

```
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://graph.microsoft.us/v1.0/auditLogs/signIns?$orderby=createdDateTime&$filter=createdDateTime+ge+2022-08-29T12:12:43Z+and+createdDateTime+le+2022-08-30T00:12:43.000000Z
```

Beta API endpoint:

```
2022-09-15 21:04:20,855 ERROR pid=34302 tid=MainThread file=base_modinput.py:log_error:316 | Get error when collecting events.
socket.timeout: The read operation timed out
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='graph.microsoft.us', port=443): Read timed out. (read timeout=5)
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='graph.microsoft.us', port=443): Max retries exceeded with url: /beta/auditLogs/signIns?$orderby=createdDateTime&$filter=createdDateTime+ge+2022-08-29T12:12:43Z+and+createdDateTime+le+2022-08-30T00:12:43.000000Z (Caused by ReadTimeoutError("HTTPSConnectionPool(host='graph.microsoft.us', port=443): Read timed out. (read timeout=5)"))
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='graph.microsoft.us', port=443): Max retries exceeded with url: /beta/auditLogs/signIns?$orderby=createdDateTime&$filter=createdDateTime+ge+2022-08-29T12:12:43Z+and+createdDateTime+le+2022-08-30T00:12:43.000000Z (Caused by ReadTimeoutError("HTTPSConnectionPool(host='graph.microsoft.us', port=443): Read timed out. (read timeout=5)"))
```

The above are snippets from Splunk. I can provide the full entries if needed.

JasonConger commented 1 year ago

The timeout issue will be addressed in the next release, but a workaround was posted here => https://github.com/splunk/splunk-add-on-microsoft-azure/issues/9#issuecomment-1198444040
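The log above shows `read timeout=5`. The following is a generic sketch of the usual mitigation: a longer read timeout plus exponential backoff on transport failures only. It is not the workaround linked above, which isn't reproduced here, and `fetch_with_retries` is a hypothetical stdlib helper rather than add-on code. HTTP errors such as the 400 are deliberately not retried, since the server has already answered.

```python
import socket
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_with_retries(request,
                       read_timeout: float = 60.0,
                       attempts: int = 4,
                       backoff: float = 2.0,
                       opener: Callable = urllib.request.urlopen,
                       sleep: Callable[[float], None] = time.sleep) -> bytes:
    """GET a URL string or urllib.request.Request, retrying only timeouts and connection errors."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            with opener(request, timeout=read_timeout) as resp:
                return resp.read()
        except urllib.error.HTTPError:
            # The server answered (e.g. 400 Bad Request); retrying won't change that.
            raise
        except (socket.timeout, TimeoutError, urllib.error.URLError):
            if attempt == attempts:
                raise
            sleep(backoff ** attempt)  # waits 2 s, 4 s, 8 s, ... between attempts
```

The `opener` and `sleep` parameters exist only so the retry policy can be exercised without a network; in real use the defaults apply.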

For the 400 error, it may be the date range requested. Try pasting the bad request URL in the Microsoft Graph Explorer here => https://developer.microsoft.com/graph/graph-explorer
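If Graph Explorer can't reach your cloud, the same URL can be replayed from a short script, and the JSON error body Graph returns with a 400 usually says what it rejected. The following is a minimal stdlib sketch assuming an app registration with a client secret and the US Government (GCC High) hosts `login.microsoftonline.us` and `graph.microsoft.us`. `token_request` and `replay` are hypothetical names, and the tenant, client ID, and secret are placeholders you supply.

```python
import urllib.error
import urllib.parse
import urllib.request

LOGIN_HOST = "https://login.microsoftonline.us"  # US Gov token endpoint host
GRAPH_HOST = "https://graph.microsoft.us"        # US Gov Microsoft Graph host


def token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build an OAuth 2.0 client-credentials token request (POST, form-encoded)."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": f"{GRAPH_HOST}/.default",
    }).encode()
    return urllib.request.Request(f"{LOGIN_HOST}/{tenant_id}/oauth2/v2.0/token", data=body)


def replay(url: str, access_token: str) -> tuple:
    """GET the exact URL from the Splunk log; return (status, body) even for 4xx responses."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"})
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        # For a 400, the body carries Graph's error code and message.
        return err.code, err.read().decode()
```

In use, open `token_request(...)` with `urllib.request.urlopen`, read `access_token` from the JSON response, then call `replay(bad_url, token)` with the URL copied from the log.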

SgtMoose commented 1 year ago

@JasonConger I spoke with our IT team, and unfortunately, since we are an MS GCC High cloud customer, we can't use Graph Explorer because it is not supported in the government cloud.

That being said, I keyed off your comment about the requested date range, investigated a little further, and am finding some anomalies I can't explain. Last week, while testing, I tried to limit the queries to go back no further than roughly 12 hours by setting the Start Date to 2022-09-22. However, in the logs from our on-premises Heavy Forwarder and from Splunk Cloud, the date ranges in the queries appeared to go back further than configured. The URLs from the logs:

Adhoc SH - https://graph.microsoft.us/beta/auditLogs/signIns?$orderby=createdDateTime&$filter=createdDateTime+ge+2022-08-29T12:12:43Z+and+createdDateTime+le+2022-08-30T00:12:43.000000Z

FRD HF - https://graph.microsoft.us/beta/auditLogs/signIns?$orderby=createdDateTime&$filter=createdDateTime+ge+2022-08-16T02:45:29Z+and+createdDateTime+le+2022-08-16T14:45:29.000000Z
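The two URLs above can be decoded to check which window each run actually requested. The following is a small stdlib sketch. It assumes the configured Start Date of 2022-09-22 means midnight UTC, and `filter_bounds` and `parse_graph_ts` are illustrative helpers, not add-on code.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit

URLS = {
    "Adhoc SH": "https://graph.microsoft.us/beta/auditLogs/signIns?$orderby=createdDateTime&$filter=createdDateTime+ge+2022-08-29T12:12:43Z+and+createdDateTime+le+2022-08-30T00:12:43.000000Z",
    "FRD HF": "https://graph.microsoft.us/beta/auditLogs/signIns?$orderby=createdDateTime&$filter=createdDateTime+ge+2022-08-16T02:45:29Z+and+createdDateTime+le+2022-08-16T14:45:29.000000Z",
}
# Assumption: the configured Start Date 2022-09-22 means midnight UTC.
CONFIGURED_START = datetime(2022, 9, 22, tzinfo=timezone.utc)


def parse_graph_ts(value: str) -> datetime:
    """Parse the two timestamp shapes seen in the logs (with and without microseconds)."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {value!r}")


def filter_bounds(url: str) -> dict:
    """Map each operator in $filter (ge/gt/le) to its timestamp."""
    # parse_qs turns '+' into spaces: "createdDateTime ge <ts> and createdDateTime le <ts>"
    tokens = parse_qs(urlsplit(url).query)["$filter"][0].split()
    return {tokens[i]: parse_graph_ts(tokens[i + 1]) for i in range(1, len(tokens), 4)}


for name, url in URLS.items():
    bounds = filter_bounds(url)
    lower = bounds.get("ge", bounds.get("gt"))
    print(f"{name}: window={bounds['le'] - lower}, "
          f"starts {(CONFIGURED_START - lower).days} days before the configured Start Date")
```

This prints a 12:00:00 window for both inputs, starting 23 days (Adhoc SH) and 36 days (FRD HF) before 2022-09-22. Each request covers only 12 hours, but those hours lie weeks earlier than the configured start.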

I suspect this could also be causing the timeout issues, but I can't explain why the configured query start date wasn't honored.

SgtMoose commented 1 year ago

Here is some additional information from our testing. There seems to be an issue with the "Start Date" field specifically in the "Sign-Ins" input. If you try to use the "Start Date" field to constrain how far back to go, you get the following error: b'{"messages":[{"type":"ERROR","text":"can\'t compare offset-naive and offset-aware datetimes"}]}' (also in screenshot)
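That message is the text of Python's `TypeError` when a datetime without a time zone (naive) is ordered against one with a time zone (aware). The following illustration doesn't claim to reproduce the add-on's code. It assumes the Start Date is parsed without a zone and compared with an aware "now", and `as_utc` is a hypothetical helper that treats naive values as UTC.

```python
from datetime import datetime, timedelta, timezone

# A date typed into a form and parsed without a zone is naive (tzinfo is None) ...
start_date = datetime.strptime("2022-09-22", "%Y-%m-%d")
# ... while "now" taken with an explicit zone is aware.
now = datetime.now(timezone.utc)

try:
    _ = start_date < now
    message = None
except TypeError as err:
    message = str(err)  # CPython: "can't compare offset-naive and offset-aware datetimes"
print(message)


def as_utc(dt: datetime) -> datetime:
    """Normalise to aware UTC; naive values are assumed to already be UTC."""
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


# Once both sides are aware, the comparison works.
print(as_utc(start_date) < now)
```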

You can enter just the date, but when the queries run, they do not honor the "Start Date"; they try to pull far more data than can be retrieved and time out. Example queries from a Cloud SH and an HF are in my prior post. Additionally, if you leave the "Start Date" empty, the input says it will query back only 24 hours by default, but we got the same result: it tried to pull a month or more of data.

I hope all of this info helps!

[Screenshot: Inputs - Google Chrome 2022-09-28 17 18 37]

JasonConger commented 1 year ago

Thanks for the additional detail. The "...offset aware..." thing is definitely a bug. The start date piece may or may not be a bug (investigating).

Here's why:

Note: the checkpoint key (the piece used to look up the checkpoint timestamp value) is based on the input name. So, if you delete/recreate an input with the same name, the old checkpoint value will continue to be used.
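The note can be made concrete with a sketch of how a start time might be resolved when the checkpoint is keyed by input name. The precedence (saved checkpoint, then Start Date, then the 24-hour default) is an assumption for illustration; the start-date behavior is still being investigated above. `resolve_query_start`, the `checkpoints` dict standing in for the add-on's checkpoint store, and the checkpoint timestamp format are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def resolve_query_start(input_name: str,
                        checkpoints: Dict[str, str],
                        start_date: Optional[datetime],
                        now: datetime,
                        default_lookback: timedelta = timedelta(hours=24)) -> datetime:
    """Hypothetical precedence: saved checkpoint > configured Start Date > now - 24 h."""
    # Keyed by input name: a deleted and recreated input with the same name
    # finds the old value here, and the new Start Date is never consulted.
    saved = checkpoints.get(input_name)
    if saved is not None:
        return datetime.strptime(saved, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    if start_date is not None:
        return start_date
    return now - default_lookback
```

Under this assumption, recreating the input under a new name is what yields a fresh checkpoint and lets the Start Date take effect.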

SgtMoose commented 1 year ago

Thank you for the additional information, @JasonConger. We will create a new checkpoint and see whether that changes the start date behavior we found.

austinwelborn123 commented 1 year ago

Good afternoon. I'm having the same issue as @SgtMoose, and it looks the same all around: Azure Government cloud, tried both beta and v1.0, created a new input (which should rule out the checkpoint issue), and even experimented with changing the filter string.

Any help you could provide would be greatly appreciated.

austinwelborn123 commented 1 year ago

I found the issue!

In the MS_AAD_signins.py script, line 174 uses gt where it should use ge. I don't know if it matters, but I'm also using the beta version.

```python
url = graph_base_url + "/%s/auditLogs/signIns?$orderby=createdDateTime&$filter=createdDateTime+gt+%s+and+createdDateTime+le+%s" % (endpoint, query_date, time_throttle)
```
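Both gt and ge are valid OData comparison operators, so the following sketch doesn't show why the change cleared the error. It only shows what the one-word change does. `build_signins_url` is a hypothetical helper that mirrors the quoted line with the lower-bound operator as a parameter, and the sample values come from the FRD HF log URL earlier in this thread.

```python
def build_signins_url(graph_base_url: str, endpoint: str, query_date: str,
                      time_throttle: str, lower_op: str = "ge") -> str:
    """Same URL shape as the quoted line, with the lower-bound operator made explicit."""
    path = ("/%s/auditLogs/signIns?$orderby=createdDateTime"
            "&$filter=createdDateTime+%s+%s+and+createdDateTime+le+%s"
            % (endpoint, lower_op, query_date, time_throttle))
    return graph_base_url + path


# ISO-8601 strings in one fixed format sort in the same order as the instants they name.
checkpoint = "2022-08-16T02:45:29Z"
events = ["2022-08-16T02:45:29Z", "2022-08-16T02:45:30Z"]
print([e for e in events if e > checkpoint])   # gt: skips an event stamped exactly at the checkpoint
print([e for e in events if e >= checkpoint])  # ge: includes it (and may re-read the previous run's last event)
```

The trade-off is at the boundary. gt can silently drop an event that shares the checkpoint's timestamp. ge avoids that but can ingest one duplicate per run.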