open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0

Add security events to SIEM targets #1460

Open gberche-orange opened 3 days ago

gberche-orange commented 3 days ago

Area(s)

area:log

Is your change request related to a problem? Please describe.

Security information and event management (SIEM) systems commonly use CEF (Common Event Format) or LEEF (Log Event Extended Format) to format security-related events.

The CEF specifications are available at https://raffy.ch/blog/wp-content/uploads/2007/06/CEF.pdf

Some introduction material at https://www.splunk.com/en_us/blog/learn/common-event-format-cef.html

Why is CEF important?

CEF is important because it provides a standardized format for logging security-related events, which makes it easier to integrate logs from different sources into a single system. This can be particularly useful for security information and event management (SIEM) solutions, which are designed to collect and analyze logs from multiple sources to detect and respond to security threats.

The LEEF specifications are available at https://www.ibm.com/docs/en/dsm?topic=leef-overview

Examples of events leveraging CEF or LEEF to target SIEM:

You can use the CEF and LEEF formats to export general events to SIEM systems. The table below shows SIEM systems and their corresponding export formats.

| SIEM system | Export format |
|---|---|
| QRadar | LEEF |
| ArcSight | CEF |
| Splunk | CEF |

LEEF (Log Event Extended Format): A customized event format for IBM Security QRadar SIEM. QRadar can integrate, identify, and process LEEF events. LEEF events must use UTF-8 character encoding. You can find detailed information on the LEEF protocol in the IBM Knowledge Center.

CEF (Common Event Format): An open log management standard that improves the interoperability of security-related information from different security and network devices and applications. CEF enables you to use a common event log format so that data can easily be integrated and aggregated for analysis by an enterprise management system. CEF events must use UTF-8 character encoding.

CEF fields used by cloudfoundry security event logs

> Cloud Controller logs security events in the [Common Event Format](https://kc.mcafee.com/resources/sites/MCAFEE/content/live/CORP_KNOWLEDGEBASE/78000/KB78712/en_US/CEF_White_Paper_20100722.pdf) (CEF). CEF specifies the following format for log entries:
>
> ```
> CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|Extension
> ```
>
> Entries in the Cloud Controller log use the following format:
>
> ```
> CEF:CEF_VERSION|cloud_foundry|cloud_controller_ng|CC_API_VERSION|
> SIGNATURE_ID|NAME|SEVERITY|rt=TIMESTAMP suser=USERNAME suid=USER_GUID
> cs1Label=userAuthenticationMechanism cs1=AUTH_MECHANISM
> cs2Label=vcapRequestId cs2=VCAP_REQUEST_ID request=REQUEST
> requestMethod=REQUEST_METHOD cs3Label=result cs3=RESULT
> cs4Label=httpStatusCode cs4=HTTP_STATUS_CODE src=SOURCE_ADDRESS
> dst=DESTINATION_ADDRESS cs5Label=xForwardedFor cs5=X_FORWARDED_FOR_HEADER
> ```
>
> See the following list for a description of the properties shown in the Cloud Controller log format:
>
> * `CEF_VERSION`: The version of CEF used in the logs.
> * `CC_API_VERSION`: The current Cloud Controller API version.
> * `SIGNATURE_ID`: The method and path of the request. For example, `GET /v2/app:GUID`.
> * `NAME`: The same as `SIGNATURE_ID`.
> * `SEVERITY`: An integer that reflects the importance of the event.
> * `TIMESTAMP`: The number of milliseconds since the Unix epoch.
> * `USERNAME`: The name of the user who originated the request.
> * `USER_GUID`: The GUID of the user who originated the request.
> * `AUTH_MECHANISM`: The user authentication mechanism. This can be `oauth-access-token`, `basic-auth`, or `no-auth`.
> * `VCAP_REQUEST_ID`: The VCAP request ID of the request.
> * `REQUEST`: The request path and parameters. For example, `/v2/info?MY-PARAM=VALUE`.
> * `REQUEST_METHOD`: The method of the request. For example, `GET`.
> * `RESULT`: The meaning of the HTTP status code of the response. For example, `success`.
> * `HTTP_STATUS_CODE`: The HTTP status code of the response. For example, `200`.
> * `SOURCE_ADDRESS`: The IP address of the client who originated the request.
> * `DESTINATION_ADDRESS`: The IP address of the Cloud Controller VM.
> * `X_FORWARDED_FOR_HEADER`: The contents of the X-Forwarded-For header of the request. This is empty if the header is not present.
>
> ### Examples of log entries
>
> The following list provides several example requests with the corresponding Cloud Controller log entries.
>
> * An anonymous `GET` request:
>
> ```
> CEF:0|cloud_foundry|cloud_controller_ng|2.54.0|GET /v2/info|GET
> /v2/info|0|rt=1460690037402 suser= suid= request=/v2/info
> requestMethod=GET src=127.0.0.1 dst=<%=vars.example_ip_1%>
> cs1Label=userAuthenticationMechanism cs1=no-auth cs2Label=vcapRequestId
> cs2=c4bac383-7cc9-4d9f-b1c0-1iq8c0baa000 cs3Label=result cs3=success
> cs4Label=httpStatusCode cs4=200 cs5Label=xForwardedFor
> cs5=<%=vars.example_ip_2%>
> ```
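To illustrate how the header and extension described above could be pulled apart before any attribute mapping, here is a minimal Python sketch. The function names `parse_cef` and `resolve_labels` are hypothetical, and the parser assumes that extension keys are plain word characters preceded by whitespace. It does not unescape `\|` or `\=` sequences, so treat it as a starting point rather than a complete CEF implementation.

```python
import re

HEADER_FIELDS = [
    "version", "device_vendor", "device_product", "device_version",
    "signature_id", "name", "severity",
]

# Assumption: an extension key is a run of word characters at the start of
# the extension or after whitespace, immediately followed by '='.
_KEY = re.compile(r"(?:^|\s)(\w+)=")


def parse_cef(line):
    """Split one CEF record into (header dict, extension dict)."""
    start = line.find("CEF:")
    if start < 0:
        raise ValueError("no CEF prefix found")
    # Split on unescaped pipes; the eighth part is the whole extension.
    parts = re.split(r"(?<!\\)\|", line[start + len("CEF:"):], maxsplit=7)
    if len(parts) < 7:
        raise ValueError("incomplete CEF header")
    header = dict(zip(HEADER_FIELDS, parts[:7]))
    ext_text = parts[7] if len(parts) > 7 else ""

    extension = {}
    matches = list(_KEY.finditer(ext_text))
    for i, match in enumerate(matches):
        # A value runs until the next key (or the end of the record).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(ext_text)
        extension[match.group(1)] = ext_text[match.end():end].strip()
    return header, extension


def resolve_labels(extension):
    """Replace csN keys by the name given in the matching csNLabel key."""
    resolved = {}
    for key, value in extension.items():
        if key.endswith("Label"):
            continue
        label = extension.get(key + "Label")
        resolved[label if label else key] = value
    return resolved


# Abbreviated version of the anonymous GET example quoted above.
record = (
    "CEF:0|cloud_foundry|cloud_controller_ng|2.54.0|GET /v2/info|GET /v2/info|0|"
    "rt=1460690037402 suser= suid= request=/v2/info requestMethod=GET "
    "src=127.0.0.1 cs1Label=userAuthenticationMechanism cs1=no-auth "
    "cs4Label=httpStatusCode cs4=200"
)
header, extension = parse_cef(record)
fields = resolve_labels(extension)
```

After `resolve_labels`, the custom string slots appear under their label names (`userAuthenticationMechanism`, `httpStatusCode`), which is the form in which they would most naturally be compared against existing attribute registries.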

/CC @KarstenSchnitter

Security related CEF fields are not currently described in https://github.com/open-telemetry/semantic-conventions/blob/96f8bda9bab363cb01e2441820bc83a5dad15801/docs/attributes-registry/log.md nor in https://github.com/open-telemetry/semantic-conventions/blob/96f8bda9bab363cb01e2441820bc83a5dad15801/docs/attributes-registry/cloudfoundry.md

Describe the solution you'd like

New security event registry fields selecting relevant security fields from CEF and LEEF

Possibly cloudfoundry-specific extensions to CEF

Describe alternatives you've considered

No response

Additional context

No response

KarstenSchnitter commented 3 days ago

Thank you for providing this issue. I think this topic requires a thorough discussion.

I have access to the CEF logs of an entire Cloud Foundry installation and can provide additional examples from other components as well. The example you chose is essentially an access log. There are existing attributes for this, e.g., in http.md. Note that vcapRequestId and xForwardedFor are essentially the HTTP headers x-vcap-request-id and x-forwarded-for.
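As a rough illustration of reusing existing attributes for the Cloud Controller example, the following Python sketch maps label-resolved CEF extension fields to HTTP-related attributes. The function name `cloud_controller_to_otel` is hypothetical. The attribute names follow the HTTP, URL, client, and server registries as I understand them, and the mapping itself (in particular `src` to `client.address` and `dst` to `server.address`) is an assumption for discussion, not an agreed convention.

```python
def cloud_controller_to_otel(fields):
    """Map label-resolved Cloud Controller CEF fields to candidate attributes.

    `fields` is a plain dict in which the csNLabel/csN pairs have already
    been folded into named keys such as 'vcapRequestId'.
    """
    attrs = {}
    if fields.get("requestMethod"):
        attrs["http.request.method"] = fields["requestMethod"]
    if fields.get("httpStatusCode"):
        attrs["http.response.status_code"] = int(fields["httpStatusCode"])
    if fields.get("request"):
        # REQUEST carries path and parameters, e.g. /v2/info?MY-PARAM=VALUE.
        path, _, query = fields["request"].partition("?")
        attrs["url.path"] = path
        if query:
            attrs["url.query"] = query
    # Header attributes hold a list of values.
    if fields.get("vcapRequestId"):
        attrs["http.request.header.x-vcap-request-id"] = [fields["vcapRequestId"]]
    if fields.get("xForwardedFor"):
        attrs["http.request.header.x-forwarded-for"] = [fields["xForwardedFor"]]
    # Assumption: src/dst correspond to the client and server addresses.
    if fields.get("src"):
        attrs["client.address"] = fields["src"]
    if fields.get("dst"):
        attrs["server.address"] = fields["dst"]
    return attrs


# Values taken from the anonymous GET example in the issue description.
example = {
    "request": "/v2/info",
    "requestMethod": "GET",
    "src": "127.0.0.1",
    "vcapRequestId": "c4bac383-7cc9-4d9f-b1c0-1iq8c0baa000",
    "httpStatusCode": "200",
    "userAuthenticationMechanism": "no-auth",
}
attributes = cloud_controller_to_otel(example)
```

Fields without an obvious existing counterpart, such as `userAuthenticationMechanism` and `result`, are deliberately left unmapped here; those are the candidates for new security-specific attributes.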

Let me provide a different example to show the diversity of these logs. This one is from the vcap.agent running on all BOSH VMs that make up a Cloud Foundry installation. It is a DNS sync event for the BOSH deployment manager:

2024/10/09 20:06:49 CEF:0|CloudFoundry|BOSH|1|agent_api|sync_dns_with_signed_url|1|duser=director.32a3e9df-257d-4590-a6a7-2b5686d3cb2d.5ec737a2-28fb-4335-87f9-c30c314681b8.518a6cd0-9437-46cf-b5c8-d78143404c9f src=bosh.hcp-staging.internal spt=4222 shost=5ec737a2-28fb-4335-87f9-c30c314681b8

This event looks very different from your example. It has a different vendor tag (CloudFoundry vs. cloud_foundry) and very different extensions. Comparing just those two events, one might think that the third entry (cloud_controller_ng and BOSH, respectively) is a service.name and the next field a service.version. But I would argue that cloud_controller_ng and BOSH are more likely a service.namespace. This becomes clearer from the syslog headers that come with this message:

<7>1 2024-10-09T20:06:49.016608+00:00 10.0.232.2 vcap.agent 5679 - [instance@47450 director="bosh-hcp-staging" deployment="application-logging" group="kafka" az="z3" id="f68fe289-0841-4ac7-b43d-29d645b55801"] 2024/10/09 20:06:49 CEF:0|CloudFoundry|BOSH|1|agent_api|sync_dns_with_signed_url|1|duser=director.32a3e9df-257d-4590-a6a7-2b5686d3cb2d.5ec737a2-28fb-4335-87f9-c30c314681b8.518a6cd0-9437-46cf-b5c8-d78143404c9f src=bosh.hcp-staging.internal spt=4222 shost=5ec737a2-28fb-4335-87f9-c30c314681b8

Following the cloudfoundry resource attributes, we would map vcap.agent to the attribute cloudfoundry.system.id and f68fe289-0841-4ac7-b43d-29d645b55801 to cloudfoundry.system.instance.id. We can use those fields as service.name and service.instance.id. But we already know that the service actually is bosh-dns. This is an example where we might want to diverge and use this as service.name instead.
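To make this mapping concrete, here is a minimal Python sketch that parses the RFC 5424 syslog header shown above and derives the two cloudfoundry resource attributes. The function names `parse_syslog` and `to_resource_attributes` are hypothetical, and the regular expressions are a simplification: they merge the parameters of all structured-data elements into one dictionary and do not validate timestamps.

```python
import re

# Simplified RFC 5424 layout:
# <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
SYSLOG = re.compile(
    r"<(?P<pri>\d{1,3})>(?P<version>\d+) (?P<timestamp>\S+) (?P<hostname>\S+) "
    r"(?P<app_name>\S+) (?P<procid>\S+) (?P<msgid>\S+) "
    r'(?P<sd>-|(?:\[(?:[^\]"]|"(?:[^"\\]|\\.)*")*\])+)'
    r"(?: (?P<msg>.*))?$",
    re.DOTALL,
)
# name="value" pairs inside a structured-data element.
SD_PARAM = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_syslog(line):
    """Return the header fields, structured-data parameters, and message."""
    match = SYSLOG.match(line)
    if match is None:
        raise ValueError("not an RFC 5424 message")
    parsed = match.groupdict()
    pri = int(parsed.pop("pri"))
    parsed["facility"], parsed["severity"] = divmod(pri, 8)
    parsed["sd_params"] = dict(SD_PARAM.findall(parsed.pop("sd")))
    return parsed


def to_resource_attributes(parsed):
    """Map APP-NAME and the instance id as described in the comment above."""
    attrs = {"cloudfoundry.system.id": parsed["app_name"]}
    instance_id = parsed["sd_params"].get("id")
    if instance_id:
        attrs["cloudfoundry.system.instance.id"] = instance_id
    return attrs


line = (
    '<7>1 2024-10-09T20:06:49.016608+00:00 10.0.232.2 vcap.agent 5679 - '
    '[instance@47450 director="bosh-hcp-staging" deployment="application-logging" '
    'group="kafka" az="z3" id="f68fe289-0841-4ac7-b43d-29d645b55801"] '
    '2024/10/09 20:06:49 CEF:0|CloudFoundry|BOSH|1|agent_api|'
    'sync_dns_with_signed_url|1|src=bosh.hcp-staging.internal spt=4222'
)
parsed = parse_syslog(line)
resource = to_resource_attributes(parsed)
```

The remaining message in `parsed["msg"]` still carries the CEF record, which can be handled independently of the resource attributes. Whether service.name should reuse cloudfoundry.system.id or a more specific name is left open in this sketch.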

I think the example above already makes clear that there should first be a general decision on whether SIEM formats should always use their own attribute sets or utilize existing attributes. From a brief look at the specifications, it seems to me that a mixture of both might be useful. For OpenTelemetry pipelines, it is necessary to understand that these SIEM events do not exist on their own but have additional context, as shown by the syslog header above. That context will be mapped to the appropriate attributes independent of the event content.