ocsf / ocsf-schema

OCSF Schema
Apache License 2.0

Clarify what the domain attribute entails #1102

Open pagbabian-splunk opened 4 months ago

pagbabian-splunk commented 4 months ago

The URL object now includes a domain attribute in addition to the already present subdomain attribute. The examples for domain and subdomain are correct, but not sufficient to determine how a multi-part / multi-level domain is segmented. My assumption is that for an N-level domain name, the left-most segment is the subdomain, the right-most segment is the TLD (top-level domain), and the domain attribute would include the TLD but not the subdomain.
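To make the ambiguity concrete, here is a minimal sketch of one literal reading of the assumption above. The function name `split_leftmost` and the handling of names with one or two labels are illustrative assumptions, not part of the OCSF schema.

```python
def split_leftmost(hostname: str) -> dict:
    """Literal reading: left-most label is the subdomain, right-most is the TLD,
    and domain is everything after the subdomain (so it includes the TLD)."""
    labels = hostname.rstrip(".").lower().split(".")
    if len(labels) <= 2:
        # Assumption: with one or two labels there is no subdomain to split off.
        return {"subdomain": None, "domain": ".".join(labels), "tld": labels[-1]}
    return {
        "subdomain": labels[0],
        "domain": ".".join(labels[1:]),
        "tld": labels[-1],
    }


print(split_leftmost("sub.example.com"))
print(split_leftmost("sub2.sub1.example.com"))
```

Under this reading, a three-label name splits cleanly, but for sub2.sub1.example.com the domain comes out as sub1.example.com, which is exactly the kind of case the current examples do not settle.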

Note that the suffix is not spelled out, and can be as simple as the TLD, but can also be multi-part, as within LDAP directories. We haven't attempted to separate the suffix.

In short, the current descriptions for subdomain and domain need to be generalized.

--- Post Network Call 6/5/24: domain likely should be a "See Specific Usage" attribute rather than have an absolute definition. It can be captured as an internal domain name, an AD Domain, an external DNS domain, and, lacking a specific attribute, an FQDN. Its use in URL is how the discussion started, and its usage there is as a segment of the URL, as we have hostname, subdomain, path, and scheme (protocol). Likely we would want a TLD (per another issue) and possibly a suffix (which can capture the right-most segments of a URL, or can be an LDAP suffix, which can be similar but different). For example, co.uk can be a suffix, or in an LDAP directory sub1.example.co.uk can be a suffix used for partitioning the directory.

pagbabian-splunk commented 4 months ago

Discussion in Slack captured below:

Dave McCormack (Cisco) 2:16 PM

I missed the introduction of the subdomain attribute in 1.2. I've done some more reading and, unfortunately, I think that the descriptions we've used for both subdomain and domain are incorrect per RFC. RFC8499 ("DNS Terminology") from 2019 might be our best reference here. It supersedes some older RFCs and quotes definitions from 1987's RFC1034 which, interestingly, is still essentially normative after 37 years.

First let's consider the domain attribute by looking at how the term domain name is defined by the RFC: "An ordered list of one or more labels." That's all! This means that www and www.nytimes and www.nytimes.com are all domain names. The RFC then continues with a note which includes this statement: "A domain name whose last label identifies a root of the graph is fully qualified." This means that www.nytimes.com and nytimes.com and com are all fully-qualified domain names (FQDNs).

It's possible that I've missed it, but I don't see anywhere in the RFC where special significance is bestowed on the part of a domain name that is immediately below the TLD, e.g. nytimes.com. To my mind therefore, the RFC doesn't align with the description of our domain attribute which reads: "The domain portion of the URL. For example: example.com in https://sub.example.com."

Given the existence of long FQDNs like www.kings.cam.ac.uk, the absence of a crystal clear RFC-backed way to populate the domain attribute will lead to poor customer outcomes. It is inevitable that different security vendors will have different interpretations. I expect there will even be inconsistency between product teams and detection content teams within the same organisation.

Let's turn now to the subdomain attribute. RFC8499 defines this by first quoting RFC1034 and then adding some commentary:

"A domain is a subdomain of another domain if it is contained within that domain. This relationship can be tested by seeing if the subdomain's name ends with the containing domain's name." (Quoted from [RFC1034], Section 3.1) For example, in the host name "nnn.mmm.example.com", both "mmm.example.com" and "nnn.mmm.example.com" are subdomains of "example.com". Note that the comparisons here are done on whole labels; that is, "ooo.example.com" is not a subdomain of "oo.example.com".

I don't really need the nytimes.com examples here because they've done the work for me. The conclusion however is that this RFC definition is not consistent with how our subdomain attribute is described: "The subdomain portion of the URL. For example: sub in https://sub.example.com or sub2.sub1 in https://sub2.sub1.example.com."

I think we need to correct these issues in time for 1.3:

First we should re-consider if attributes that simply contain sub-strings of the canonical hostname attribute are truly useful or are just likely to create confusion, defects, misses, breaches, etc.

If there is consensus that such sub-strings are useful for whatever reason (perhaps for convenience of search, content matching, etc) then I think this needs to be consistent with RFC terminology. One RFC-friendly option would be to have a subdomains attribute (note plural) which would be an array breaking out a hostname like "sub2.sub1.example.com" into ["sub2.sub1.example.com", "sub1.example.com", "example.com", "com"].

That's all I got! Monday 3rd June is a public holiday in Ireland so responses possibly delayed.
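The whole-label subdomain test quoted from the RFC and the proposed subdomains array can be sketched as follows. The function names `is_subdomain` and `subdomain_chain` are hypothetical, and treating a name as not being a subdomain of itself is an assumption drawn from the RFC's wording "another domain".

```python
def _labels(name: str) -> list[str]:
    # Compare case-insensitively and ignore a trailing root dot.
    return name.rstrip(".").lower().split(".")


def is_subdomain(candidate: str, parent: str) -> bool:
    """True if candidate ends with parent, compared on whole labels."""
    c, p = _labels(candidate), _labels(parent)
    # Assumption: a domain is not counted as a subdomain of itself.
    return len(c) > len(p) and c[-len(p):] == p


def subdomain_chain(hostname: str) -> list[str]:
    """Break a hostname into the array form suggested for a 'subdomains' attribute."""
    labels = _labels(hostname)
    return [".".join(labels[i:]) for i in range(len(labels))]


print(is_subdomain("nnn.mmm.example.com", "example.com"))  # True
print(is_subdomain("a.ooo.example.com", "oo.example.com"))  # False
print(subdomain_chain("sub2.sub1.example.com"))
```

Comparing label lists rather than raw strings is what keeps a.ooo.example.com from matching oo.example.com, even though a plain string suffix check would accept it.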

Paul Agbabian (Splunk) 1:30 PM Thanks Dave for the research. We have to first agree on what we want to represent, regardless of the RFC, and then how the URL object should represent the components of the URL. We do know there is a desire to break out the substring (there is another issue to this effect for DNS Query). Currently, domain was not intended to be "an ordered list of labels" but rather closer to the examples we have been discussing. Next, the naming should be what 'most' people would expect. E.g. TLD is understood to be one of the standard names at the top of the DNS hierarchy, and subdomain is a branch under a domain parent. Etc.

Daniel Stinson 3:34 PM The way Elastic Common Schema does this, which is effective for security use cases, is breaking domain information in their URL fields down into a few parts:

domain - full domain value inside the event data
subdomain - eTLD + 2 onward, all labels inside of the registered_domain
top_level_domain - the eTLD, using the public suffix list to correctly handle com vs co.uk
registered_domain - the eTLD + 1 that an organization/actor registers
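A rough sketch of the four-way breakdown described above, for illustration only. `PUBLIC_SUFFIXES` is a hard-coded toy subset standing in for the real public suffix list (whose wildcard and exception rules are not handled here), and `split_domain` is a hypothetical helper, not an ECS or OCSF API.

```python
# Assumption: a tiny stand-in for the public suffix list.
PUBLIC_SUFFIXES = {"com", "uk", "co.uk", "ac.uk"}


def split_domain(domain: str) -> dict:
    labels = domain.rstrip(".").lower().split(".")
    # Find the longest matching suffix by trying the longest candidates first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            start = i
            break
    else:
        # Assumption: fall back to treating the last label as the suffix.
        start = len(labels) - 1
    return {
        "domain": ".".join(labels),
        "top_level_domain": ".".join(labels[start:]),
        # eTLD + 1: the suffix plus one label to its left, if there is one.
        "registered_domain": ".".join(labels[start - 1:]) if start > 0 else None,
        # Everything to the left of the registered domain, if anything remains.
        "subdomain": ".".join(labels[:start - 1]) if start > 1 else None,
    }


print(split_domain("www.kings.cam.ac.uk"))
print(split_domain("sub.example.com"))
```

With this toy suffix set, www.kings.cam.ac.uk yields a top_level_domain of ac.uk, a registered_domain of cam.ac.uk, and a subdomain of www.kings, which is the com vs co.uk style of distinction the suffix list is meant to handle.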