hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0

azurerm_monitor_data_collection_rule: Add information about output_stream to docs #21880

Open simaotwx opened 1 year ago

simaotwx commented 1 year ago

Description

The documentation for the azurerm_monitor_data_collection_rule resource says the following about output_stream:

output_stream - (Optional) The output stream of the transform. Only required if the data flow changes data to a different stream.

It is not clear what can be specified for output_stream. From the documentation of streams I can only derive that it seems to take the same values as streams: streams - (Required) Specifies a list of streams. Possible values include but are not limited to Microsoft-Event, Microsoft-InsightsMetrics, Microsoft-Perf, Microsoft-Syslog, and Microsoft-WindowsEvent.

That assumption is based on the example, where Microsoft-Syslog is used as the output stream:

  data_flow {
    streams       = ["Custom-MyTableRawData"]
    destinations  = ["example-destination-log"]
    output_stream = "Microsoft-Syslog"
    transform_kql = "source | project TimeGenerated = Time, Computer, Message = AdditionalContext"
  }

Since the list given in the documentation is not complete, where can I find a complete list of possible output_stream values? I know that there are also custom streams, so let's set those aside.

For reference, I'm trying to do the following:

resource "azurerm_monitor_data_collection_rule" "log_collection_rule" {
  name                        = "${local.deployment_name}-log-collection-dcr"
  location                    = local.location
  resource_group_name         = local.rg_name
  data_collection_endpoint_id = azurerm_monitor_data_collection_endpoint.log_collection.id

  destinations {
    log_analytics {
      workspace_resource_id = azurerm_log_analytics_workspace.log_analytics_workspace.id
      name                  = "wordpress-logs"
    }
  }

  data_flow {
    streams      = ["Custom-RawMonologLogs"]
    destinations = ["wordpress-logs"]
  }

  data_sources {
    log_file {
      name          = "wordpress-logfiles"
      format        = "text"
      streams       = ["Custom-RawMonologLogs"]
      file_patterns = ["/var/local/opt/${local.deployment_name}/wordpress/volumes/logs/*.log"]
      settings {
        text {
          record_start_timestamp_format = "ISO 8601"
        }
      }
    }
  }

  stream_declaration {
    stream_name = "Custom-RawMonologLogs"
    column {
      name = "Time"
      type = "datetime"
    }
    column {
      name = "Level"
      type = "string"
    }
    column {
      name = "Logger"
      type = "string"
    }
    column {
      name = "Context"
      type = "string"
    }
    column {
      name = "Message"
      type = "string"
    }
    column {
      name = "AdditionalContext"
      type = "string"
    }
  }

  description = "Collection of logs from WordPress"
  tags        = local.default_tags
}

But I receive the error Status=400 Code="InvalidPayload" Message="Data collection rule is invalid" Details=[{"code":"InvalidOutputTable","message":"Table for output stream 'Custom-RawMonologLogs' is not available for destination 'wordpress-logs'.","target":"properties.dataFlows[0]"}]

This error leads me to believe that I need to specify an output_stream, but I don't know what value to use in this case. The goal is for these logs to be visible in Log Analytics, so it would be great to have a list of possible values for output_stream.

New or Affected Resource(s)/Data Source(s)

azurerm_monitor_data_collection_rule

Potential Terraform Configuration

No response

References

No response

teowa commented 1 year ago

Hi @rcskosir, thanks for submitting this issue. The output_stream can be a built-in stream (starting with Microsoft-) or a custom table (ending with _CL) in the Log Analytics workspace; you can find more details in this doc.
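
Putting that together, a minimal data_flow sketch could look roughly like the following (this assumes a custom table named MyTable_CL already exists in the target Log Analytics workspace and that the workspace destination is named "example-destination-log"; adjust the names to your setup):

  data_flow {
    # Input: the custom stream declared in stream_declaration.
    streams       = ["Custom-MyTableRawData"]
    destinations  = ["example-destination-log"]
    # Output: either a built-in table such as "Microsoft-Syslog",
    # or a custom table referenced as "Custom-" + "<table name>_CL".
    output_stream = "Custom-MyTable_CL"
    transform_kql = "source"
  }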

simaotwx commented 1 year ago

@teowa Thank you. That's a good starting point. It could be useful to add this information to the provider documentation.

dansmitt commented 1 year ago

@simaotwx did you get it to run? I got the same error and did something like:

resource "azurerm_monitor_data_collection_rule" "dcr" {
  name                        = "example-dcr"
  resource_group_name         = module.rg.resource_group_name
  location                    = module.rg.resource_group_location
  data_collection_endpoint_id = azurerm_monitor_data_collection_endpoint.dce.id

  destinations {
    log_analytics {
      workspace_resource_id = azurerm_log_analytics_workspace.auditlogla.id
      name                  = "example-destination-log"
    }
  }

  data_flow {
    streams       = ["Custom-MyTableRawData"]
    destinations  = ["example-destination-log"]
    output_stream = "Custom-MyTable_CL"
    transform_kql = "source | project TimeGenerated = Time, Computer, Message = AdditionalContext"
  }

  stream_declaration {
    stream_name = "Custom-MyTableRawData"
    column {
      name = "Time"
      type = "datetime"
    }
    column {
      name = "Computer"
      type = "string"
    }
    column {
      name = "AdditionalContext"
      type = "string"
    }
  }

  depends_on = [
    azurerm_log_analytics_workspace.auditlogla
  ]
}

Error:

Service returned an error. Status=400 Code="InvalidPayload" Message="Data collection rule is invalid" Details=[{"code":"InvalidOutputTable","message":"Table for output stream 'Custom-MyTable_CL' is not available for destination 'example-destination-log'.","target":"properties.dataFlows[0]"}]

@teowa can you confirm that this implementation looks the way it should?

simaotwx commented 1 year ago

@dansmitt nope, I had to change output_stream to Microsoft-Syslog to get it to apply, but that is not what I actually intended to do (and I haven't verified whether it works). It seems like the stream specified in stream_declaration is not created before the data flow, so the Azure API cannot find the table and rejects the data flow. This might actually be a bug in the provider. Maybe a separate resource for the table/stream declaration would be a good idea.

dansmitt commented 1 year ago

@simaotwx I thought the same. Good to know you ran into the same problem.

simaotwx commented 1 year ago

This is what I currently have. It applies, but is not what I wanted:

resource "azurerm_monitor_data_collection_rule" "log_collection_rule" {
  name                        = "${local.deployment_name}-log-collection-dcr"
  location                    = local.location
  resource_group_name         = local.rg_name
  data_collection_endpoint_id = azurerm_monitor_data_collection_endpoint.log_collection.id

  destinations {
    log_analytics {
      workspace_resource_id = azurerm_log_analytics_workspace.log_analytics_workspace.id
      name                  = "wordpress-logs"
    }
  }

  data_flow {
    streams       = ["Custom-RawMonologLogs"]
    destinations  = ["wordpress-logs"]
    output_stream = "Microsoft-Syslog"
    transform_kql = "source | project TimeGenerated = Time, Level, Logger, Context, AdditionalContext, Message = Message"
  }

  data_sources {
    log_file {
      name          = "wordpress-logfiles"
      format        = "text"
      streams       = ["Custom-RawMonologLogs"]
      file_patterns = ["/var/local/opt/${local.deployment_name}/wordpress/volumes/logs/*.log"]
      settings {
        text {
          record_start_timestamp_format = "ISO 8601"
        }
      }
    }
  }

  stream_declaration {
    stream_name = "Custom-RawMonologLogs"
    column {
      name = "Time"
      type = "datetime"
    }
    column {
      name = "Level"
      type = "string"
    }
    column {
      name = "Logger"
      type = "string"
    }
    column {
      name = "Context"
      type = "string"
    }
    column {
      name = "Message"
      type = "string"
    }
    column {
      name = "AdditionalContext"
      type = "string"
    }
  }

  description = "Collection of logs from WordPress"
  tags        = local.default_tags
}

simaotwx commented 1 year ago

What I also noticed is that record_start_timestamp_format only accepts a few predefined formats, but in my case the timestamp is ISO 8601 with +00:00 as the timezone and surrounded by brackets [], so this obviously won't work. I am also not sure what the columns are doing exactly. It would be nice to be able to specify the format the way Rust does it, for example [{timestamp}] {level} {logger} {context} {message} {additional_context}, or maybe as a regex.

dansmitt commented 1 year ago

@simaotwx I created a bug report for this. Let's see what happens.

teowa commented 1 year ago

The custom table in the Log Analytics workspace must be created before the DCR is created; please see https://github.com/hashicorp/terraform-provider-azurerm/issues/21897#issuecomment-1559014381 for details.
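
For reference, the table can be pre-created from Terraform with the azapi provider before the DCR, roughly like this (a sketch only; azurerm_log_analytics_workspace.example and the table schema are assumptions, and later comments in this thread show complete working configurations):

resource "azapi_resource" "custom_table" {
  name      = "MyTable_CL"
  parent_id = azurerm_log_analytics_workspace.example.id
  type      = "Microsoft.OperationalInsights/workspaces/tables@2022-10-01"
  body = jsonencode({
    properties = {
      schema = {
        name = "MyTable_CL"
        columns = [
          { name = "TimeGenerated", type = "datetime" },
          { name = "RawData", type = "string" }
        ]
      }
    }
  })
}

# The DCR then references the table in output_stream ("Custom-MyTable_CL")
# and declares an explicit dependency, e.g. depends_on = [azapi_resource.custom_table].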

simaotwx commented 1 year ago

Another thing that is not documented is what format the logs need to have in order to feed them to a log_file data source with format text. There is conflicting information, which makes it unclear.

Example:

 log_file {
      name          = "example-datasource-logfile"
      format        = "text"
      streams       = ["Custom-MyTableRawData"]
      file_patterns = ["C:\\JavaLogs\\*.log"]
      settings {
        text {
          record_start_timestamp_format = "ISO 8601"
        }
      }
    }

It's unclear to me what the text format means and why only text is supported. AFAIK, the Logs Ingestion API needs the log to be formatted as JSON, and the transform_kql parameter in data_flow seems to confirm this since it processes structured data. On the other hand, there is the timestamp format setting, which is very confusing because I'm not sure how it is parsed. Does the timestamp need to be prepended to each log line, or how is this to be understood?

I tried setting all of this up, but my JSON logs are not appearing in Log Analytics (syslog is appearing, so it's not a connection issue). There is no indication of errors, and I'm not sure how to proceed with troubleshooting other than trial and error. I might just not know how all of this works, partly because of scattered documentation and partly because I only started working with Log Analytics very recently.

benhaspalace commented 9 months ago

To use the Logs Ingestion API with a Log Analytics workspace, what worked for me is to create a custom table during deployment with a name ending in _CL, and in the Data Collection Rule deployment to set the output stream to that very same _CL name with Custom- added to the start.

These naming requirements and possibilities for custom log ingestion should also be publicly documented.

E.g.

resource logAnalyticsWorkspace 'Microsoft.OperationalInsights/workspaces@2021-06-01' = {
  name: name
  location: location
  properties: {
    sku: {
      name: sku
    }
    retentionInDays: retentionInDays
  }
}

resource LAWCustomLogTable 'Microsoft.OperationalInsights/workspaces/tables@2022-10-01' = {
  // The name should end with '_CL'
  name: 'MyTable_CL'
  parent: logAnalyticsWorkspace
  properties: {
    schema: {
      // The name of the schema should be the same as the table resource above
      name: 'MyTable_CL'
      columns: [
        {
          description: 'TimeGeneratedDescription'
          name: 'TimeGenerated'
          type: 'datetime'
        }
      ]
    }
  }
}

resource DataCollectionEndpoint 'Microsoft.Insights/dataCollectionEndpoints@2022-06-01' = {
  name: 'DataCollectionEndpoint'
  location: location
  properties: {
    configurationAccess: {}
    description: 'Data Collection Endpoint instance'
    logsIngestion: {}
    metricsIngestion: {}
    networkAcls: {
      publicNetworkAccess: 'Disabled'
    }
  }
}

resource DataCollectionRule 'Microsoft.Insights/dataCollectionRules@2022-06-01' = {
  name: 'DataCollectionRule'
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    dataCollectionEndpointId: DataCollectionEndpoint.id
    description: 'Data Collection Rule instance'
    destinations: {
      logAnalytics: [
        {
          name: workspaceName
          workspaceResourceId: workspaceResourceId
        }
      ]
    }
    dataFlows: [
      {
        destinations: [
          workspaceName
        ]
        // Reference Custom- stream below
        streams: ['Custom-Stream']
        // The output stream name must respect both the DCR naming requirement of the Custom- prefix
        // and the Log Analytics table name, which for custom tables has a _CL suffix requirement
        outputStream: 'Custom-MyTable_CL'
        transformKql: 'source'
      }
    ]
    streamDeclarations: {
      // Name should start with 'Custom-'
      'Custom-Stream': {
        columns: [
          {
            name: 'TimeGenerated'
            type: 'datetime'
          }
        ]
      }
    }
  }
}

doka380 commented 5 months ago

Hi, as I found out, the naming convention is very strict: if you're parsing e.g. a text log, stream_declaration.stream_name MUST be named in the following way: Custom-Text-tablename, where tablename MUST end in _CL and be present in the LAW. You can name it Custom-Text-InData and TF will even create it, but in that case, when looking at the portal, you will find it says something like 'no sources registered for this DCR' in the 'Data sources' section.

The full template, which looks like it works (not yet tested, but at least it deploys successfully and links to VMs), is the following:

resource "azapi_resource" "data_collection_logs_table" {
  name      = "my_CL"
  parent_id = var.log_analytics_workspace_id
  type      = "Microsoft.OperationalInsights/workspaces/tables@2022-10-01"
  body = jsonencode(
    {
      "properties" : {
        "schema" : {
          "name" : "my_CL",
          "columns" : [
            {
              "name" : "TimeGenerated",
              "type" : "datetime",
              "description" : "The time at which the data was generated"
            },
            {
              "name" : "RawData",
              "type" : "string",
              "description" : "The log entry"
            }
          ]
        }
      }
    }
  )
}

resource "azurerm_monitor_data_collection_rule" "dcr" {
  name                        = "doka_test_01"
  resource_group_name         = var.rg_name
  location                    = var.location
  kind                        = "Linux"
  data_collection_endpoint_id = var.data_collection_endpoint_id
  #description                 = "data collection rule example"

  identity {
    type = "SystemAssigned"
  }

  tags = {
    created_by = "doka@funlab.cc"
  }

  data_sources {
    log_file {
      name          = "my-log"
      format        = "text"
      streams        = ["Custom-Text-${azapi_resource.data_collection_logs_table.name}"]
      file_patterns = ["/var/log/my.log"]
      settings {
        text {
          record_start_timestamp_format = "ISO 8601"
        }
      }
    }
  }

  destinations {
    log_analytics {
      workspace_resource_id = var.log_analytics_workspace_id
      name                  = "law01"
    }
  }

  data_flow {
    streams        = ["Custom-Text-${azapi_resource.data_collection_logs_table.name}"]
    destinations  = ["law01"]
    output_stream = "Custom-${azapi_resource.data_collection_logs_table.name}"
    transform_kql = "source"
  }

  stream_declaration {
    ### !!! IMPORTANT !!!
    ### Every part here is essential. You simply cannot name it in another way :-)
    stream_name = "Custom-Text-${azapi_resource.data_collection_logs_table.name}"
    column {
      name = "TimeGenerated"
      type = "datetime"
    }
    column {
      name = "RawData"
      type = "string"
    }
  }

  depends_on = [
    azapi_resource.data_collection_logs_table
  ]
}

I will add an update here later on whether it actually gathers data.

gpkm1469 commented 4 months ago

Hi, I'm facing the same "InvalidPayload" error.


I'm not even sure which value is causing this error. Also, I need to pass a JSON log file. Did anybody try this with custom JSON logs? I could not find any docs/information related to it. Any help/support would be appreciated.

dansmitt commented 4 months ago

@gpkm1469 have a look at #21897; does that fit your case?

gpkm1469 commented 4 months ago

Hello @dansmitt, no, actually it doesn't. My code is here:

resource "azapi_resource" "data_collection_logs_table" {
  name      = "DCR_Table_TC_Example_CL"
  parent_id = azurerm_log_analytics_workspace.example.id
  type      = "Microsoft.OperationalInsights/workspaces/tables@2022-10-01"
  schema_validation_enabled = false
  body = jsonencode(
    {
      "properties" : {
        "schema" : {
          "name" : "DCR_Table_TC_Example_CL",
          "columns" : [
            {
              "name" : "TimeGenerated",
              "type" : "datetime",
              "description" : "The time at which the data was generated"
            },
            {
              "name" : "RawData",
              "type" : "string",
              "description" : "From the logs file"
            },
            {
              "name" : "FilePath",
              "type" : "string",
              "description" : "File path"
            }
          ]
        },
        "retentionInDays" : 30,
        "totalRetentionInDays" : 30
      }
    }
  )
}

resource "azurerm_monitor_data_collection_rule" "example-dcr-terraform" {
  name                        = "example-dcr-terraform"
  resource_group_name         = module.example_rg.name
  location                    = module.example_rg.location
  data_collection_endpoint_id = azurerm_monitor_data_collection_endpoint.example-dce-terraform.id
  kind = "Linux"

  destinations {
    log_analytics {
      name                  = "example-destination-log"
      workspace_resource_id = azurerm_log_analytics_workspace.example.id
    }
  }

  data_sources {
    log_file {
      name          = "example-logfile"
      format        = "text"
      streams       = ["Custom-${azapi_resource.data_collection_logs_table.name}"]
      file_patterns = ["/var/log/vault_audit.log"] //This file contains logs in json format
      settings {
        text {
          record_start_timestamp_format = "ISO 8601"
        }
      }
    }
  }

  data_flow {
    streams       = ["Custom-Text-${azapi_resource.data_collection_logs_table.name}"]
    destinations  = ["example-destination-log"]
    output_stream = "Custom-${azapi_resource.data_collection_logs_table.name}"
    transform_kql = "source | project TimeGenerated = time, RawData = request"
  }

  stream_declaration {
    stream_name = "Custom-Text-${azapi_resource.data_collection_logs_table.name}"

    column {
      name = "TimeGenerated"
      type = "datetime"
    }
    column {
      name = "RawData"
      type = "string"
    }
    column {
      name = "FilePath"
      type = "string"
    }
  }

  depends_on = [
    azapi_resource.data_collection_logs_table
  ]

}

On applying the above code, I'm getting "InvalidPayload" for the DCR. My question about the correct syntax/code for custom JSON logs also still stands.

dansmitt commented 4 months ago

@gpkm1469 could you try something simple like this?

resource "azurerm_monitor_data_collection_endpoint" "dce" {
  name                        = "example-dce"
  resource_group_name         = "example_rg"
  location                    = module.rg.resource_group_location

  lifecycle {
    create_before_destroy = true
  }
}

resource "azapi_resource" "auditlogla_table" {
  name      = "AuditLog_CL"
  parent_id = azurerm_log_analytics_workspace.auditlogla.id
  type      = "Microsoft.OperationalInsights/workspaces/tables@2022-10-01"
  body = jsonencode(
    {
      "properties" : {
        "schema" : {
          "name" : "AuditLog_CL",
          "columns" : [
              {
                  "name": "appId",
                  "type": "string"
              },
              {
                  "name": "correlationId",
                  "type": "string"
              }
          ]
        }
      }
    }
  )
}

resource "azurerm_monitor_data_collection_rule" "dcr" {
  name                        = "example-dcr"
  resource_group_name         = "example_rg"
  location                    = module.rg.resource_group_location
  data_collection_endpoint_id = azurerm_monitor_data_collection_endpoint.dce.id

  destinations {
    log_analytics {
      workspace_resource_id = azurerm_log_analytics_workspace.auditlogla.id
      name                  = "destination-log"
    }
  }

  data_flow {
    streams       = ["Custom-AuditLog_CL"]
    destinations  = ["destination-log"]
    output_stream = "Custom-AuditLog_CL"
    transform_kql = "source | extend TimeGenerated = todatetime(timeStamp)\n\n"
  }

  stream_declaration {
    stream_name = "Custom-AuditLog_CL"
    column {
      name = "appId"
      type = "string"
    }
    column {
      name = "correlationId"
      type = "string"
    }
  }

  depends_on = [
    azurerm_log_analytics_workspace.auditlogla,
    azapi_resource.auditlogla_table
  ]
}

I remember that there were some strange undocumented naming conventions, but I'm not sure anymore. I'd try to start as simple as possible to find the gaps. It was very painful to figure out which way worked.

gpkm1469 commented 4 months ago

@dansmitt Thanks! Let me try it. Also, I have a question: in the code that you shared, we are not passing the log file path anywhere. How will it fetch the logs then?

dansmitt commented 4 months ago

@gpkm1469 I'm passing the logs through the Data Collection Endpoint. I mean, it's just a starting point I'd give a try, and then modify step by step. The point is that it's not well documented, and from what I understood in other discussions, the ingestion API itself cannot be driven by Terraform so far.
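
As a rough sketch of what I mean (the attribute names logs_ingestion_endpoint and immutable_id are my assumption of the azurerm exports, and the URL below is the general Logs Ingestion API shape, not something confirmed in this thread): Terraform only exposes the endpoint and rule, and the actual log records are sent by a client outside of Terraform.

# Expose the values a log-shipping client needs.
output "logs_ingestion_endpoint" {
  value = azurerm_monitor_data_collection_endpoint.dce.logs_ingestion_endpoint
}

output "dcr_immutable_id" {
  value = azurerm_monitor_data_collection_rule.dcr.immutable_id
}

# A client (e.g. the Azure Monitor Ingestion SDK) then POSTs JSON records to roughly:
#   ${logs_ingestion_endpoint}/dataCollectionRules/${dcr_immutable_id}/streams/Custom-AuditLog_CL?api-version=2023-01-01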