opencybersecurityalliance / kestrel-lang

Kestrel threat hunting language: building reusable, composable, and shareable huntflows across different data sources and threat intel.
Apache License 2.0
295 stars, 49 forks

Need for another transformer #327

Closed: leila-rashidi closed this issue 1 year ago

leila-rashidi commented 1 year ago

I have developed two Kestrel analytics to detect lateral movement based on SCOs, each of which contains one network-traffic object and one user-account object. We need to pass all network-traffic objects as one variable and all user-account objects as another variable to the Kestrel analytics. The analytics then needs to correlate the network-traffic and user-account objects to find the source, destination, and user involved in each authentication event.

Dr. Shu suggested building a new transformer like TIMESTAMPED that captures all objects and adds, as an attribute, the id of the SCO to which each network-traffic or user-account object belongs. In other words, we need something similar to TIMESTAMPED that adds the id of the observed-data object instead of first_observed.

leila-rashidi commented 1 year ago

I have started to fix this issue. You can assign this issue to me.

subbyte commented 1 year ago

Thanks, @leila-rashidi , for creating the issue for the new feature!

Let me add some background: in building the lateral movement detection analytics, @leila-rashidi discovered/used a smart hack to pass in two Kestrel variables to an analytics and keep the relation between entities in them. In the blog post, @leila-rashidi wrote that the following Kestrel huntflow will result in perfectly aligned entity records between connections_t and users_t.

users=GET user-account FROM stixshifter://database WHERE [user-account:user_id != null]
connections=FIND network-traffic LINKED users
connections_t=TIMESTAMPED(connections)
users_t=TIMESTAMPED(users)

In the analytics, @leila-rashidi joined the two tables row by row without an external key. This works for the specific data @leila-rashidi is testing, given the following observation:

As we observed, this transformer returns objects in the same order with SCOs.

To make it more robust, we may want to provide IDs in the variables passed to an analytics before they are joined there. The variables (tables) can then be joined using an external key, i.e., the IDs, regardless of whether the entries in the tables are permuted.
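The difference between a row-by-row join and a key-based join can be sketched in plain Python. The field name `observation_id` and the sample records below are assumptions for illustration only; they are not taken from Kestrel or firepit.

```python
# Hypothetical rows, as an analytics might receive them from two Kestrel variables.
# "observation_id" is an assumed field name, not an actual Kestrel attribute.
connections = [
    {"observation_id": "obs-1", "src": "10.0.0.1", "dst": "10.0.0.2"},
    {"observation_id": "obs-2", "src": "10.0.0.3", "dst": "10.0.0.4"},
]
users = [  # deliberately in a different order than connections
    {"observation_id": "obs-2", "user_id": "bob"},
    {"observation_id": "obs-1", "user_id": "alice"},
]


def join_by_position(left, right):
    """Row-by-row join: only correct if both tables share the same order."""
    return [{**l, **r} for l, r in zip(left, right)]


def join_by_key(left, right, key):
    """Key-based join: pairs rows whose `key` values match, regardless of order."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


events = join_by_key(connections, users, "observation_id")
```

With the permuted `users` list, `join_by_position` pairs the first connection with the wrong user, while `join_by_key` still associates each connection with the user from the same observation.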

One possible solution is to invent a new transformer function in Kestrel:

connections_obs = OBSERVED(connections)
users_obs = OBSERVED(users)

where the OBSERVED() function adds an additional attribute to the variables, i.e., the reference to the observation ID.
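As a rough sketch of what such a transformer might produce, the Python function below attaches an observation ID to each entity record. The function name `observed`, the `observation_id` attribute, and the `entity_to_observation` lookup table are all hypothetical; a real implementation would depend on what firepit exposes.

```python
def observed(entities, entity_to_observation):
    """Return copies of entity records with an added "observation_id" attribute.

    `entity_to_observation` maps an entity id to the id of the observation
    that contains it. It is a hypothetical stand-in for a data store query.
    """
    annotated = []
    for entity in entities:
        row = dict(entity)  # copy so the input records are not mutated
        row["observation_id"] = entity_to_observation.get(entity["id"])
        annotated.append(row)
    return annotated


# Illustrative data only; the ids are made up.
users = [{"id": "user-account--a", "user_id": "alice"}]
lookup = {"user-account--a": "observed-data--1"}
users_obs = observed(users, lookup)
```

An analytics receiving two variables annotated this way could join them on the shared observation ID instead of relying on row order.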

@pcoccoli what do you think about the solution? If it is reasonable, could you give some guidance on how @leila-rashidi could implement the OBSERVED() function? I guess @leila-rashidi needs to search for TIMESTAMPED in the current Kestrel code to see how to add a new transformer to the syntax alongside TIMESTAMPED, and then write a function to realize it. In addition, I am not sure whether we already have an API in firepit to get the information, or whether @leila-rashidi needs to create a small function in firepit for it.

pcoccoli commented 1 year ago

I think the idea could work, but I wonder if it's the best solution. If the analytic needs to operate on records with certain attributes, maybe we should provide a high-level, generalized way of providing that. The proposed solution still puts a lot of the burden on the analytics author.

leila-rashidi commented 1 year ago

Hi @pcoccoli, a better solution is to extend Kestrel so that the GET command can read all SCOs with their nested objects included.

subbyte commented 1 year ago

We could view all SCOs and their references as a graph inside an observation. Currently, Kestrel does not have a way to refer to an observation (a record/event, or more than one record/event) or to a subgraph of entities. We may need to include events as first-class citizens of variables (#299), and potentially subgraphs in the long term, so one can pass them directly to an analytics. This would not be limited to data in one observation, but would cover any subgraph defined in a variable.

This is a long way off. We could have a simple update, e.g., OBSERVED(), for the near term, and keep this use case as an example for the type system upgrade.

leila-rashidi commented 1 year ago

Hi @subbyte, I have found the part of the code related to TIMESTAMPED, and the only remaining task is extending firepit. After that, it is very easy to update Kestrel.

pcoccoli commented 1 year ago

Has this been addressed by ADDOBSID and/or RECORDS?

subbyte commented 1 year ago

Yes. This is (and will be) released in v1.7.0 and v1.7.2