Closed by pemontto 3 years ago
OK, just as I was posting this I discovered that the upstream PR (Azure-Sentinel #1418) introduced non-breaking whitespace. A very non-obvious problem, especially given how little the API response says.
Probably not something AzSentinel should have to cater for. I have opened a PR at the source: https://github.com/Azure/Azure-Sentinel/pull/1448. Ideally the API would return a more descriptive error.
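For anyone hitting the same vague 500, here is a minimal sketch of how the offending characters could be located before importing. It assumes the settings layout from the test case in this issue (a `Scheduled` list whose entries carry `displayName` and `query`); the function names `find_odd_whitespace` and `scan_rules` are hypothetical helpers, not part of AzSentinel.

```python
import json
import unicodedata


def find_odd_whitespace(text):
    """Return (offset, codepoint, name) for each space-like character
    that is not a plain ASCII space."""
    hits = []
    for offset, ch in enumerate(text):
        # Unicode category "Zs" is the space-separator class; it includes
        # the ASCII space and U+00A0 NO-BREAK SPACE, among others.
        if ch != " " and unicodedata.category(ch) == "Zs":
            hits.append((offset, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return hits


def scan_rules(settings):
    """Map each rule's displayName to the odd whitespace found in its query.
    Assumes the {"Scheduled": [{"displayName": ..., "query": ...}]} layout."""
    report = {}
    for rule in settings.get("Scheduled", []):
        hits = find_odd_whitespace(rule.get("query", ""))
        if hits:
            report[rule.get("displayName", "<unnamed>")] = hits
    return report


if __name__ == "__main__":
    # Tiny inline sample with a deliberate U+00A0 before "BaseLogs".
    sample = json.loads(
        '{"Scheduled": [{"displayName": "demo", '
        '"query": "T\\n| join (\\n\\u00a0BaseLogs\\n)"}]}'
    )
    for name, hits in scan_rules(sample).items():
        print(name, hits)
```

To check a real settings file, load it with `json.load` and pass the result to `scan_rules`; an empty report means no non-ASCII space separators were found in any query.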
Environment
Steps to reproduce
I recently updated some time series anomaly analytics rules based on this recent PR: https://github.com/Azure/Azure-Sentinel/pull/1418.
After this, the rules fail to import with a vague 500 error.
```powershell
PS /Users/test/git/test-Sentinel> $result = Import-AzSentinelAlertRule -SubscriptionId $SubscriptionId -WorkspaceName $Workspace -SettingsFile "./Customers/$CustomerName/azsentinel-rules.json" -Verbose -Debug
VERBOSE: Found 1 rules
VERBOSE: Getting Worspace from Subscription XXXX
VERBOSE: GET https://management.azure.com/subscriptions/XXXX/providers/Microsoft.OperationalInsights/workspaces?api-version=2015-11-01-preview with 0-byte payload
VERBOSE: received 2873-byte response of content type application/json
VERBOSE: Workspace is: /subscriptions/XXXX/resourcegroups/sentinel/providers/microsoft.operationalinsights/workspaces/XXXX
VERBOSE: properties : @{source=Azure; customerId=1a41bda2-f2d5-4915-a8c9-45eb186b9911; provisioningState=Succeeded; sku=; retentionInDays=30; features=; workspaceCapping=; publicNetworkAccessForIngestion=Enabled; publicNetworkAccessForQuery=Enabled; createdDate=Wed, 19 Aug 2020 04:21:36 GMT; modifiedDate=Mon, 19 Oct 2020 07:01:28 GMT}
id : /subscriptions/XXXX/resourcegroups/sentinel/providers/microsoft.operationalinsights/workspaces/XXXX
name : XXXX
type : Microsoft.OperationalInsights/workspaces
location : australiaeast
tags :
VERBOSE: Found Workspace XXXX in XXXX
VERBOSE: Using URI: https://management.azure.com/subscriptions/XXXX/resourcegroups/sentinel/providers/microsoft.operationalinsights/workspaces/XXXX/providers/Microsoft.SecurityInsights/alertRules?api-version=2020-01-01
VERBOSE: GET https://management.azure.com/subscriptions/XXXX/resourcegroups/sentinel/providers/microsoft.operationalinsights/workspaces/XXXX/providers/Microsoft.SecurityInsights/alertRules?api-version=2020-01-01 with 0-byte payload
VERBOSE: received 204543-byte response of content type application/json
VERBOSE: Content encoding: utf-8
VERBOSE: Found 60 Alert rules
VERBOSE: GET https://management.azure.com/subscriptions/XXXX/resourcegroups/sentinel/providers/microsoft.operationalinsights/workspaces/XXXX/providers/Microsoft.SecurityInsights/alertRules/XXXX/actions?api-version=2019-01-01-preview with 0-byte payload
VERBOSE: received 12-byte response of content type application/json
VERBOSE: Content encoding: utf-8
VERBOSE: Started with rule: Time series anomaly for data size transferred to public internet
VERBOSE: Get rule 'Identifies anomalous data transfer to public networks. The query leverages built-in KQL anomaly detection algorithms that detects large deviations from a baseline pattern. A sudden increase in data transferred to unknown public networks is an indication of data exfiltration attempts and should be investigated. The higher the score, the further it is from the baseline value. The output is aggregated to provide summary view of unique source IP to destination IP address and port bytes sent traffic observed in the flagged anomaly hour. The source IP addresses which were sending less than bytessentperhourthreshold have been exluded whose value can be adjusted as needed . You may have to run queries for individual source IP addresses from SourceIPlist to determine if anything looks suspicious'
VERBOSE: Rule Time series anomaly for data size transferred to public internet exists in Azure Sentinel
DEBUG: "Sort-Object" - "kind" cannot be found in "InputObject".
VERBOSE: PUT https://management.azure.com/subscriptions/XXXX/resourcegroups/sentinel/providers/microsoft.operationalinsights/workspaces/XXXX/providers/Microsoft.SecurityInsights/alertRules/XXXX?api-version=2019-01-01-preview with 8566-byte payload
VERBOSE: received 74-byte response of content type application/json
VERBOSE: {"error":{"code":"InternalServerError","message":"Internal server error"}}
Import-AzSentinelAlertRule: Unable to invoke webrequest for rule Time series anomaly for data size transferred to public internet with error message: Response status code does not indicate success: 500 (Internal Server Error).
```
This is the minimal test case:
```json
{
  "Scheduled": [
    {
      "displayName": "Time series anomaly for data size transferred to public internet",
      "description": "'Identifies anomalous data transfer to public networks. The query leverages built-in KQL anomaly detection algorithms that detects large deviations from a baseline pattern.\nA sudden increase in data transferred to unknown public networks is an indication of data exfiltration attempts and should be investigated.\nThe higher the score, the further it is from the baseline value.\nThe output is aggregated to provide summary view of unique source IP to destination IP address and port bytes sent traffic observed in the flagged anomaly hour.\nThe source IP addresses which were sending less than bytessentperhourthreshold have been exluded whose value can be adjusted as needed .\nYou may have to run queries for individual source IP addresses from SourceIPlist to determine if anything looks suspicious'\n",
      "severity": "Low",
      "queryFrequency": "P1D",
      "queryPeriod": "P14D",
      "triggerOperator": "gt",
      "triggerThreshold": 0,
      "query": "let starttime = 14d;\nlet endtime = 1d;\nlet timeframe = 1h;\nlet scorethreshold = 5;\nlet bytessentperhourthreshold = 10;\nlet PrivateIPregex = @'^127\\.|^10\\.|^172\\.1[6-9]\\.|^172\\.2[0-9]\\.|^172\\.3[0-1]\\.|^192\\.168\\.';\nlet TimeSeriesData = (union isfuzzy=true\n(\nVMConnection\n| where TimeGenerated between (startofday(ago(starttime))..startofday(ago(endtime)))\n| where isnotempty(DestinationIP) and isnotempty(SourceIP)\n| extend DestinationIpType = iff(DestinationIp matches regex PrivateIPregex,\"private\" ,\"public\" )\n| where DestinationIpType == \"public\" | extend DeviceVendor = \"VMConnection\"\n| project TimeGenerated, BytesSent, DeviceVendor\n| make-series TotalBytesSent=sum(BytesSent) on TimeGenerated from startofday(ago(starttime)) to startofday(ago(endtime)) step timeframe by DeviceVendor\n),\n(\nCommonSecurityLog\n| where TimeGenerated between (startofday(ago(starttime))..startofday(ago(endtime)))\n| where isnotempty(DestinationIP) and isnotempty(SourceIP)\n| extend DestinationIpType = iff(DestinationIP matches regex PrivateIPregex,\"private\" ,\"public\" )\n| where DestinationIpType == \"public\"\n| project TimeGenerated, SentBytes, DeviceVendor\n| make-series TotalBytesSent=sum(SentBytes) on TimeGenerated from startofday(ago(starttime)) to startofday(ago(endtime)) step timeframe by DeviceVendor\n)\n);\n//Filter anomolies against TimeSeriesData\nlet TimeSeriesAlerts = TimeSeriesData\n| extend (anomalies, score, baseline) = series_decompose_anomalies(TotalBytesSent, scorethreshold, -1, 'linefit')\n| mv-expand TotalBytesSent to typeof(double), TimeGenerated to typeof(datetime), anomalies to typeof(double),score to typeof(double), baseline to typeof(long)\n| where anomalies > 0 | extend AnomalyHour = TimeGenerated\n| extend TotalBytesSentinMBperHour = round(((TotalBytesSent / 1024)/1024),2), baselinebytessentperHour = round(((baseline / 1024)/1024),2), score = round(score,2)\n| project DeviceVendor, AnomalyHour, TimeGenerated, TotalBytesSentinMBperHour, baselinebytessentperHour, anomalies, score;\nlet AnomalyHours = TimeSeriesAlerts | where TimeGenerated > ago(2d) | project TimeGenerated;\n//Union of all BaseLogs aggregated per hour\nlet BaseLogs = (union isfuzzy=true\n(\nCommonSecurityLog\n| where isnotempty(DestinationIP) and isnotempty(SourceIP)\n| where TimeGenerated between (startofday(ago(starttime))..startofday(ago(endtime)))\n| extend DateHour = bin(TimeGenerated, 1h) // create a new column and round to hour\n| where DateHour in ((AnomalyHours)) //filter the dataset to only selected anomaly hours\n| extend DestinationIpType = iff(DestinationIP matches regex PrivateIPregex,\"private\" ,\"public\" )\n| where DestinationIpType == \"public\"\n| extend SentBytesinMB = ((SentBytes / 1024)/1024), ReceivedBytesinMB = ((ReceivedBytes / 1024)/1024)\n| summarize HourlyCount = count(), TimeGeneratedMax=arg_max(TimeGenerated, *), DestinationIPList=make_set(DestinationIP, 100), DestinationPortList = make_set(DestinationPort,100), TotalSentBytesinMB = sum(SentBytesinMB), TotalReceivedBytesinMB = sum(ReceivedBytesinMB) by SourceIP, DeviceVendor, TimeGeneratedHour=bin(TimeGenerated,1h)\n| where TotalSentBytesinMB > bytessentperhourthreshold\n| sort by TimeGeneratedHour asc, TotalSentBytesinMB desc\n| extend Rank=row_number(1, prev(TimeGeneratedHour) != TimeGeneratedHour) // Ranking the dataset per Hourly Partition\n| where Rank < 10 // Selecting Top 10 records with Highest BytesSent in each Hour\n| project DeviceVendor, TimeGeneratedHour, TimeGeneratedMax, SourceIP, DestinationIPList, DestinationPortList, TotalSentBytesinMB, TotalReceivedBytesinMB, Rank\n),\n(\nVMConnection\n| where isnotempty(DestinationIp) and isnotempty(SourceIp)\n| where TimeGenerated between (startofday(ago(starttime))..startofday(ago(endtime)))\n| extend DateHour = bin(TimeGenerated, 1h) // create a new column and round to hour\n| where DateHour in ((AnomalyHours)) //filter the dataset to only selected anomaly hours\n| extend SourceIP = SourceIp, DestinationIP = DestinationIp\n| extend DestinationIpType = iff(DestinationIp matches regex PrivateIPregex,\"private\" ,\"public\" )\n| where DestinationIpType == \"public\" | extend DeviceVendor = \"VMConnection\"\n| extend SentBytesinMB = ((BytesSent / 1024)/1024), ReceivedBytesinMB = ((BytesReceived / 1024)/1024)\n| summarize HourlyCount = count(),TimeGeneratedMax=arg_max(TimeGenerated, *), DestinationIPList=make_set(DestinationIP, 100), DestinationPortList = make_set(DestinationPort, 100), TotalSentBytesinMB = sum(SentBytesinMB),TotalReceivedBytesinMB = sum(ReceivedBytesinMB) by SourceIP, DeviceVendor, TimeGeneratedHour=bin(TimeGenerated,1h)\n| where TotalSentBytesinMB > bytessentperhourthreshold\n| sort by TimeGeneratedHour asc, TotalSentBytesinMB desc\n| extend Rank=row_number(1, prev(TimeGeneratedHour) != TimeGeneratedHour) // Ranking the dataset per Hourly Partition\n| where Rank < 10 // Selecting Top 10 records with Highest BytesSent in each Hour\n| project DeviceVendor, TimeGeneratedHour, TimeGeneratedMax, SourceIP, DestinationIPList, DestinationPortList, TotalSentBytesinMB, TotalReceivedBytesinMB, Rank\n)\n);\n// Join against base logs to retrive records associated with the hour of anomoly\nTimeSeriesAlerts\n| where TimeGenerated > ago(2d)\n| join (\n BaseLogs | extend AnomalyHour = TimeGeneratedHour\n) on DeviceVendor, AnomalyHour | sort by score desc\n| project DeviceVendor, AnomalyHour,TimeGeneratedMax, SourceIP, DestinationIPList, DestinationPortList, TotalSentBytesinMB, TotalReceivedBytesinMB, TotalBytesSentinMBperHour, baselinebytessentperHour, score, anomalies\n| summarize EventCount = count(), StartTimeUtc= min(TimeGeneratedMax), EndTimeUtc= max(TimeGeneratedMax), SourceIPMax= arg_max(SourceIP,*), TotalBytesSentinMB = sum(TotalSentBytesinMB), TotalBytesReceivedinMB = sum(TotalReceivedBytesinMB), SourceIPList = make_set(SourceIP, 100), DestinationIPList = make_set(DestinationIPList, 100) by AnomalyHour,TotalBytesSentinMBperHour, baselinebytessentperHour, score, anomalies\n| project DeviceVendor, AnomalyHour, StartTimeUtc, EndTimeUtc, SourceIPMax, SourceIPList, DestinationIPList, DestinationPortList, TotalBytesSentinMB, TotalBytesReceivedinMB, TotalBytesSentinMBperHour, baselinebytessentperHour, score, anomalies, EventCount\n| extend timestamp =EndTimeUtc, IPCustomEntity = SourceIPMax\n",
      "enabled": true
    }
  ]
}
```
This same JSON will work if I make the query something simple like
CommonSecurityLog | take 1
so it's definitely something in the query; however, if I manually create a scheduled alert rule with the exact same query, the validation succeeds and the rule is created OK.
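Since the cause turned out to be non-breaking whitespace in the query, one possible local workaround (a sketch only, not something AzSentinel does) is to normalize the query text before importing. The helper names `normalize_spaces` and `clean_settings` are hypothetical, and the code assumes the same `Scheduled` / `query` layout as the test case above. Note that it also rewrites any such characters inside KQL string literals, which may not always be what you want.

```python
import copy
import unicodedata


def normalize_spaces(text):
    """Replace every Unicode space separator (category Zs), such as
    U+00A0 NO-BREAK SPACE, with a plain ASCII space."""
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in text)


def clean_settings(settings):
    """Return a deep copy of the settings with each Scheduled rule's
    query normalized; the input dict is left untouched."""
    cleaned = copy.deepcopy(settings)
    for rule in cleaned.get("Scheduled", []):
        if "query" in rule:
            rule["query"] = normalize_spaces(rule["query"])
    return cleaned


if __name__ == "__main__":
    rules = {"Scheduled": [{"displayName": "demo", "query": "T\n| join (\n\u00a0BaseLogs\n)"}]}
    # repr() makes the replaced character visible in the output.
    print(repr(clean_settings(rules)["Scheduled"][0]["query"]))
```

The cleaned dict can then be written back out with `json.dump` and passed to the import as usual; fixing the rule at its source, as in the upstream PR, remains the cleaner option.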