simaotwx opened this issue 1 year ago
Hi @rcskosir, thanks for submitting this issue. The output_stream can be one of the built-in streams (names starting with Microsoft-) or a custom table (names ending with _CL) in the Log Analytics workspace; you can find more details in this doc.
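To make that concrete, a data_flow can target either kind of stream. A minimal sketch (the destination and stream names here are illustrative, not from any particular setup):

```hcl
# Built-in output stream: the name starts with "Microsoft-"
data_flow {
  streams       = ["Microsoft-Syslog"]
  destinations  = ["example-destination-log"]
  output_stream = "Microsoft-Syslog"
}

# Custom output stream: "Custom-" prefix plus a table name ending in "_CL";
# the table must already exist in the Log Analytics workspace.
data_flow {
  streams       = ["Custom-MyTableRawData"]
  destinations  = ["example-destination-log"]
  output_stream = "Custom-MyTable_CL"
}
```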
@teowa Thank you. That's a good starting point. It could be useful to add this information to the provider documentation.
@simaotwx did you get it to run? I got the same error and did something like:
resource "azurerm_monitor_data_collection_rule" "dcr" {
  name                        = "example-dcr"
  resource_group_name         = module.rg.resource_group_name
  location                    = module.rg.resource_group_location
  data_collection_endpoint_id = azurerm_monitor_data_collection_endpoint.dce.id

  destinations {
    log_analytics {
      workspace_resource_id = azurerm_log_analytics_workspace.auditlogla.id
      name                  = "example-destination-log"
    }
  }

  data_flow {
    streams       = ["Custom-MyTableRawData"]
    destinations  = ["example-destination-log"]
    output_stream = "Custom-MyTable_CL"
    transform_kql = "source | project TimeGenerated = Time, Computer, Message = AdditionalContext"
  }

  stream_declaration {
    stream_name = "Custom-MyTableRawData"
    column {
      name = "Time"
      type = "datetime"
    }
    column {
      name = "Computer"
      type = "string"
    }
    column {
      name = "AdditionalContext"
      type = "string"
    }
  }

  depends_on = [
    azurerm_log_analytics_workspace.auditlogla
  ]
}
Error:
Service returned an error. Status=400 Code="InvalidPayload" Message="Data collection rule is invalid" Details=[{"code":"InvalidOutputTable","message":"Table for output stream 'Custom-MyTable_CL' is not available for destination 'example-destination-log'.","target":"properties.dataFlows[0]"}]
@teowa can you confirm whether this implementation looks the way it should?
@dansmitt nope, I had to change output_stream to Microsoft-Syslog to get it to apply, but that is not what I actually intended to do (and I haven't verified whether it works). It seems like the stream specified in stream_declaration is not created before the data flow, so the Azure API cannot find the table and rejects the data flow. This might actually be a bug in the provider. Maybe a separate resource for the table/stream declaration would be a good idea.
@simaotwx I thought the same. Good to know you ran into the same problem.
This is what I currently have. It applies, but is not what I wanted:
resource "azurerm_monitor_data_collection_rule" "log_collection_rule" {
  name                        = "${local.deployment_name}-log-collection-dcr"
  location                    = local.location
  resource_group_name         = local.rg_name
  data_collection_endpoint_id = azurerm_monitor_data_collection_endpoint.log_collection.id

  destinations {
    log_analytics {
      workspace_resource_id = azurerm_log_analytics_workspace.log_analytics_workspace.id
      name                  = "wordpress-logs"
    }
  }

  data_flow {
    streams       = ["Custom-RawMonologLogs"]
    destinations  = ["wordpress-logs"]
    output_stream = "Microsoft-Syslog"
    transform_kql = "source | project TimeGenerated = Time, Level, Logger, Context, AdditionalContext, Message = Message"
  }

  data_sources {
    log_file {
      name          = "wordpress-logfiles"
      format        = "text"
      streams       = ["Custom-RawMonologLogs"]
      file_patterns = ["/var/local/opt/${local.deployment_name}/wordpress/volumes/logs/*.log"]
      settings {
        text {
          record_start_timestamp_format = "ISO 8601"
        }
      }
    }
  }

  stream_declaration {
    stream_name = "Custom-RawMonologLogs"
    column {
      name = "Time"
      type = "datetime"
    }
    column {
      name = "Level"
      type = "string"
    }
    column {
      name = "Logger"
      type = "string"
    }
    column {
      name = "Context"
      type = "string"
    }
    column {
      name = "Message"
      type = "string"
    }
    column {
      name = "AdditionalContext"
      type = "string"
    }
  }

  description = "Collection of logs from WordPress"
  tags        = local.default_tags
}
What I also noticed is that record_start_timestamp_format only accepts a few predefined formats. In my case the timestamp is ISO 8601 with +00:00 as the timezone and surrounded by brackets [], so this obviously won't work. I am also not sure what the columns are doing exactly. It would be nice to be able to specify the format the way Rust does it, for example [{timestamp}] {level} {logger} {context} {message} {additional_context}, or maybe as a regex.
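Since the predefined formats don't cover a bracketed ISO 8601 timestamp, one workaround is to normalize the lines before the agent reads them. A hypothetical sketch of parsing such a line with a regex (the field layout follows the example format above; it is not an agent feature):

```python
import re

# Hypothetical line shape: [ISO-8601 timestamp] level logger context message
LINE_RE = re.compile(
    r"^\[(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\+00:00)\]\s+"
    r"(?P<level>\S+)\s+(?P<logger>\S+)\s+(?P<context>\S+)\s+(?P<message>.*)$"
)

def parse_line(line: str):
    """Split one bracketed-timestamp log line into named fields."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None

record = parse_line("[2023-06-01T12:34:56+00:00] INFO wp.core http Booted plugin")
```

A pre-processing step like this could rewrite the logs into whatever shape the agent's predefined formats do accept.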
@simaotwx I created a bug for this. Let's see what happens.
The custom table in log analytics workspace must be created before DCR creation, please see https://github.com/hashicorp/terraform-provider-azurerm/issues/21897#issuecomment-1559014381 for detail.
Another thing that is not documented is what format the logs need to have to be fed to a log_file of format text. There is conflicting information, which makes it unclear.
Example:
log_file {
  name          = "example-datasource-logfile"
  format        = "text"
  streams       = ["Custom-MyTableRawData"]
  file_patterns = ["C:\\JavaLogs\\*.log"]
  settings {
    text {
      record_start_timestamp_format = "ISO 8601"
    }
  }
}
It's unclear to me what the text format means and why only text is supported.
AFAIK, the Log Ingestion API needs the logs to be formatted as JSON, and the transform_kql parameter in data_flow seems to confirm this since it processes structured data. On the other hand, there is the timestamp format setting, which is very confusing because I'm not sure how it is parsed. Does the timestamp need to be prepended to each log line, or how is this to be understood?
I tried setting all of this up but my JSON logs are not appearing in log analytics (syslog is appearing, so it's not a connection issue). There is no indication of errors and I'm not sure how to proceed with troubleshooting other than trial-and-error. I might just not know how all of this works, partly because of scattered documentation and partly because I just started working with log analytics very recently.
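For troubleshooting, it can help to separate the DCR configuration from the ingestion call itself: the Logs Ingestion API takes a JSON array of records whose fields must match the stream_declaration columns. A sketch in Python (endpoint, immutable ID, and stream name are placeholders; the api-version value is my assumption based on current Azure docs):

```python
import json

# Placeholder values -- substitute your own DCE endpoint, the DCR's
# immutable ID (visible in the portal), and the declared stream name.
DCE_ENDPOINT = "https://example-dce.westeurope-1.ingest.monitor.azure.com"
DCR_IMMUTABLE_ID = "dcr-00000000000000000000000000000000"
STREAM_NAME = "Custom-MyTableRawData"

def ingestion_url(endpoint: str, rule_id: str, stream: str) -> str:
    """Build the Logs Ingestion API URL for one DCR stream."""
    return (f"{endpoint}/dataCollectionRules/{rule_id}"
            f"/streams/{stream}?api-version=2023-01-01")

# The request body is a JSON array; field names must match the
# stream_declaration columns, otherwise the transform sees nulls.
records = [{"Time": "2023-06-01T12:00:00Z",
            "Computer": "web01",
            "AdditionalContext": "hello"}]
body = json.dumps(records)
```

The actual POST additionally needs a bearer token (for the https://monitor.azure.com scope) and Content-Type: application/json; if records arrive but fields are empty, a column-name mismatch between the payload and the stream declaration is a likely cause.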
To use the Log Ingestion API with a Log Analytics workspace, what worked for me was to create a custom table during deployment with a name ending in _CL, and in the Data Collection Rule deployment to set the output stream to that very same name ending in _CL, but with Custom- added to the start of the name.
These naming requirements and possibilities for custom log ingestion should also be publicly documented.
E.g.
resource logAnalyticsWorkspace 'Microsoft.OperationalInsights/workspaces@2021-06-01' = {
  name: name
  location: location
  properties: {
    sku: {
      name: sku
    }
    retentionInDays: retentionInDays
  }
}

resource LAWCustomLogTable 'Microsoft.OperationalInsights/workspaces/tables@2022-10-01' = {
  // The name should end with '_CL'
  name: 'MyTable_CL'
  parent: logAnalyticsWorkspace
  properties: {
    schema: {
      // The name of the schema should be the same as the table resource above
      name: 'MyTable_CL'
      columns: [
        {
          description: 'TimeGeneratedDescription'
          name: 'TimeGenerated'
          type: 'datetime'
        }
      ]
    }
  }
}

resource DataCollectionEndpoint 'Microsoft.Insights/dataCollectionEndpoints@2022-06-01' = {
  name: 'DataCollectionEndpoint'
  location: location
  properties: {
    configurationAccess: {}
    description: 'Data Collection Endpoint instance'
    logsIngestion: {}
    metricsIngestion: {}
    networkAcls: {
      publicNetworkAccess: 'Disabled'
    }
  }
}

resource DataCollectionRule 'Microsoft.Insights/dataCollectionRules@2022-06-01' = {
  name: 'DataCollectionRule'
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    dataCollectionEndpointId: DataCollectionEndpoint.id
    description: 'Data Collection Rule instance'
    destinations: {
      logAnalytics: [
        {
          name: workspaceName
          workspaceResourceId: workspaceResourceId
        }
      ]
    }
    dataFlows: [
      {
        destinations: [
          workspaceName
        ]
        // Reference the Custom- stream declared below
        streams: ['Custom-Stream']
        // The output stream name must respect both the DCR naming requirement of a
        // 'Custom-' prefix and, for custom tables, the '_CL' suffix required by the
        // Log Analytics workspace table name
        outputStream: 'Custom-MyTable_CL'
        transformKql: 'source'
      }
    ]
    streamDeclarations: {
      // Name should start with 'Custom-'
      'Custom-Stream': {
        columns: [
          {
            name: 'TimeGenerated'
            type: 'datetime'
          }
        ]
      }
    }
  }
}
Hi, as I found, the naming convention is very strict. If you're parsing e.g. a text log, stream_declaration.stream_name MUST be named in the following way: Custom-Text-tablename, where tablename MUST end with _CL and be present in the LAW. You can name it Custom-Text-InData and Terraform will even create it, but in that case, when looking at the portal, you will find it says something like 'no sources registered for this DCR' in the 'Data sources' section.
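That convention can be captured in a few lines. A hypothetical helper that checks the naming rules described above (the rules are as observed in this thread, not from official docs):

```python
def valid_text_stream_name(stream_name: str) -> bool:
    """Check the observed naming rule for text-log streams:
    'Custom-Text-<table>' where <table> ends in '_CL' and is non-empty."""
    prefix = "Custom-Text-"
    if not stream_name.startswith(prefix):
        return False
    table = stream_name[len(prefix):]
    return table.endswith("_CL") and len(table) > len("_CL")
```

Here "Custom-Text-my_CL" passes, while "Custom-Text-InData" does not, even though Terraform accepts it; that mismatch corresponds to the 'no sources registered for this DCR' symptom in the portal.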
The full template, which looks like it works (not yet tested, but at least it is successfully configured and linked to VMs), is the following:
resource "azapi_resource" "data_collection_logs_table" {
  name      = "my_CL"
  parent_id = var.log_analytics_workspace_id
  type      = "Microsoft.OperationalInsights/workspaces/tables@2022-10-01"
  body = jsonencode(
    {
      "properties" : {
        "schema" : {
          "name" : "my_CL",
          "columns" : [
            {
              "name" : "TimeGenerated",
              "type" : "datetime",
              "description" : "The time at which the data was generated"
            },
            {
              "name" : "RawData",
              "type" : "string",
              "description" : "The log entry"
            }
          ]
        }
      }
    }
  )
}

resource "azurerm_monitor_data_collection_rule" "dcr" {
  name                        = "doka_test_01"
  resource_group_name         = var.rg_name
  location                    = var.location
  kind                        = "Linux"
  data_collection_endpoint_id = var.data_collection_endpoint_id
  #description = "data collection rule example"

  identity {
    type = "SystemAssigned"
  }

  tags = {
    created_by = "doka@funlab.cc"
  }

  data_sources {
    log_file {
      name          = "my-log"
      format        = "text"
      streams       = ["Custom-Text-${azapi_resource.data_collection_logs_table.name}"]
      file_patterns = ["/var/log/my.log"]
      settings {
        text {
          record_start_timestamp_format = "ISO 8601"
        }
      }
    }
  }

  destinations {
    log_analytics {
      workspace_resource_id = var.log_analytics_workspace_id
      name                  = "law01"
    }
  }

  data_flow {
    streams       = ["Custom-Text-${azapi_resource.data_collection_logs_table.name}"]
    destinations  = ["law01"]
    output_stream = "Custom-${azapi_resource.data_collection_logs_table.name}"
    transform_kql = "source"
  }

  stream_declaration {
    ### !!! IMPORTANT !!!
    ### Every part here is essential. You simply cannot name it any other way :-)
    stream_name = "Custom-Text-${azapi_resource.data_collection_logs_table.name}"
    column {
      name = "TimeGenerated"
      type = "datetime"
    }
    column {
      name = "RawData"
      type = "string"
    }
  }

  depends_on = [
    azapi_resource.data_collection_logs_table
  ]
}
I will add an update here later on whether it actually gathers data.
Hi, I'm facing the same "Invalid Payload" error.
I'm not even sure which value is causing it. Also, I need to ingest a JSON log file. Did anybody try with custom JSON logs? I could not find any doc/information related to it. Any help/support will be appreciated.
@gpkm1469 have a look at #21897. Does that fit for you?
Hello @dansmitt, no, actually. My code is here:
resource "azapi_resource" "data_collection_logs_table" {
  name                      = "DCR_Table_TC_Example_CL"
  parent_id                 = azurerm_log_analytics_workspace.example.id
  type                      = "Microsoft.OperationalInsights/workspaces/tables@2022-10-01"
  schema_validation_enabled = false
  body = jsonencode(
    {
      "properties" : {
        "schema" : {
          "name" : "DCR_Table_TC_Example_CL",
          "columns" : [
            {
              "name" : "TimeGenerated",
              "type" : "datetime",
              "description" : "The time at which the data was generated"
            },
            {
              "name" : "RawData",
              "type" : "string",
              "description" : "From the logs file"
            },
            {
              "name" : "FilePath",
              "type" : "string",
              "description" : "File path"
            }
          ]
        },
        "retentionInDays" : 30,
        "totalRetentionInDays" : 30
      }
    }
  )
}

resource "azurerm_monitor_data_collection_rule" "example-dcr-terraform" {
  name                        = "example-dcr-terraform"
  resource_group_name         = module.example_rg.name
  location                    = module.example_rg.location
  data_collection_endpoint_id = azurerm_monitor_data_collection_endpoint.example-dce-terraform.id
  kind                        = "Linux"

  destinations {
    log_analytics {
      name                  = "example-destination-log"
      workspace_resource_id = azurerm_log_analytics_workspace.example.id
    }
  }

  data_sources {
    log_file {
      name          = "example-logfile"
      format        = "text"
      streams       = ["Custom-${azapi_resource.data_collection_logs_table.name}"]
      file_patterns = ["/var/log/vault_audit.log"] // This file contains logs in JSON format
      settings {
        text {
          record_start_timestamp_format = "ISO 8601"
        }
      }
    }
  }

  data_flow {
    streams       = ["Custom-Text-${azapi_resource.data_collection_logs_table.name}"]
    destinations  = ["example-destination-log"]
    output_stream = "Custom-${azapi_resource.data_collection_logs_table.name}"
    transform_kql = "source | project TimeGenerated = time, RawData = request"
  }

  stream_declaration {
    stream_name = "Custom-Text-${azapi_resource.data_collection_logs_table.name}"
    column {
      name = "TimeGenerated"
      type = "datetime"
    }
    column {
      name = "RawData"
      type = "string"
    }
    column {
      name = "FilePath"
      type = "string"
    }
  }

  depends_on = [
    azapi_resource.data_collection_logs_table
  ]
}
On applying the above code, I'm getting "invalid payload" for the DCR. My question about the correct syntax/code for custom JSON logs also still stands.
@gpkm1469 could you try something simple like this?
resource "azurerm_monitor_data_collection_endpoint" "dce" {
  name                = "example-dce"
  resource_group_name = "example_rg"
  location            = module.rg.resource_group_location

  lifecycle {
    create_before_destroy = true
  }
}

resource "azapi_resource" "auditlogla_table" {
  name      = "AuditLog_CL"
  parent_id = azurerm_log_analytics_workspace.auditlogla.id
  type      = "Microsoft.OperationalInsights/workspaces/tables@2022-10-01"
  body = jsonencode(
    {
      "properties" : {
        "schema" : {
          "name" : "AuditLog_CL",
          "columns" : [
            {
              "name" : "appId",
              "type" : "string"
            },
            {
              "name" : "correlationId",
              "type" : "string"
            }
          ]
        }
      }
    }
  )
}

resource "azurerm_monitor_data_collection_rule" "dcr" {
  name                        = "example-dcr"
  resource_group_name         = "example_rg"
  location                    = module.rg.resource_group_location
  data_collection_endpoint_id = azurerm_monitor_data_collection_endpoint.dce.id

  destinations {
    log_analytics {
      workspace_resource_id = azurerm_log_analytics_workspace.auditlogla.id
      name                  = "destination-log"
    }
  }

  data_flow {
    streams       = ["Custom-AuditLog_CL"]
    destinations  = ["destination-log"]
    output_stream = "Custom-AuditLog_CL"
    transform_kql = "source | extend TimeGenerated = todatetime(timeStamp)\n\n"
  }

  stream_declaration {
    stream_name = "Custom-AuditLog_CL"
    column {
      name = "appId"
      type = "string"
    }
    column {
      name = "correlationId"
      type = "string"
    }
  }

  depends_on = [
    azurerm_log_analytics_workspace.auditlogla,
    azapi_resource.auditlogla_table
  ]
}
I remember that there were some undocumented, strange naming conventions, but I'm not sure anymore. I'd try to start as simple as possible to find the gaps. It was very painful to find out which way worked.
@dansmitt Thanks! Let me try it. Also, I have a question: in the code that you shared, we are not passing the log file path anywhere. How will it fetch the logs then?
@gpkm1469 I'm passing the logs through the Data Collection Endpoint. I mean, it's just a starting point I'd try; then I'd modify it step by step. The point is that it's not well documented, and from what I understood in other discussions, the API cannot be fully covered by Terraform so far.
Is there an existing issue for this?
Community Note
Description
The resource azurerm_monitor_data_collection_rule documents for output_stream:
output_stream - (Optional) The output stream of the transform. Only required if the data flow changes data to a different stream.
It is not clear what can be specified for output_stream. From the documentation of streams I can derive:
streams - (Required) Specifies a list of streams. Possible values include but are not limited to Microsoft-Event, Microsoft-InsightsMetrics, Microsoft-Perf, Microsoft-Syslog, and Microsoft-WindowsEvent.
because in the example, Microsoft-Syslog is used as the output stream. Since the list given in the documentation is not complete, where can I find a complete list of possible output_streams? I know that there are also custom streams, so we should keep those aside.
For reference, I'm trying to do the following:
But I receive the error
Status=400 Code="InvalidPayload" Message="Data collection rule is invalid" Details=[{"code":"InvalidOutputTable","message":"Table for output stream 'Custom-RawMonologLogs' is not available for destination 'wordpress-logs'.","target":"properties.dataFlows[0]"}]
This error leads me to believe that I need to specify an output_stream, but I don't know what should be specified in this case. The goal is for this data to be visible in Log Analytics. So it would be great to have a list of possible values for output_stream.
New or Affected Resource(s)/Data Source(s)
azurerm_monitor_data_collection_rule
Potential Terraform Configuration
No response
References
No response