mitre / caldera

Automated Adversary Emulation Platform
https://caldera.mitre.org
Apache License 2.0

Facts created in "fact sources" do not support relationships? #2988

Open timbrigham-oc opened 5 months ago

timbrigham-oc commented 5 months ago

I think there might be a bug in the relationship mapping for imported facts.

My use case is emulating an adversary who already knows a username and password. These are domain.user.name has_password domain.user.password, with an edge relationship defined. Eventually I want to use these three values in a basic parser to validate that the source / edge / output mapping is correct.

This occurs on v 5.0.0, when creating a new fact in a fact source and defining a relationship.

When I export the facts created by one of my operations and look at the IP to FQDN mapping I see remote.host.fqdn has_ip to remote.host.ip

Looking at the exported JSON in the report for IP addresses the unique value, edge, and target make sense for the "relationships" -

        "unique": "<app.objects.secondclass.c_fact.Fact object at 0x797cfabe60b0>has_ip<app.objects.secondclass.c_fact.Fact object at 0x797cfabe47f0>",
        "source": {
          "unique": "remote.host.fqdnmdns.mcast.net",
          "trait": "remote.host.fqdn",
          "name": "remote.host.fqdn",
          "value": "mdns.mcast.net",
...
        "edge": "has_ip",
        "target": {
          "unique": "remote.host.ip224.0.0.251",
          "trait": "remote.host.ip",
          "name": "remote.host.ip",
          "value": "224.0.0.251",
...

When I look at facts created via the UI I get the following, with the values showing as "None".

    "relationships": [
      {
        "unique": "<app.objects.secondclass.c_fact.Fact object at 0x797cfa53a4d0>has_password<app.objects.secondclass.c_fact.Fact object at 0x797cfa539270>",
        "source": {
          "unique": "domain.user.nameNone",
          "trait": "domain.user.name",
          "name": "domain.user.name",
          "value": null,
...
        "edge": "has_password",
        "target": {
          "unique": "domain.user.passwordNone",
          "trait": "domain.user.password",
          "name": "domain.user.password",
          "value": null,
...

I have recreated this a couple of times, and it seems to occur regardless of the order in which I create the two facts and the relationship statement. I tend to test by creating domain.user.name, then domain.user.password, then the has_password relationship.
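
One way to spot this symptom in an export is to scan the "relationships" list for entries whose source or target value is null. The sketch below assumes the export has the shape shown above (a top-level "relationships" list whose items carry "source", "edge", and "target"). The function name and the sample data are made up for illustration and are not part of Caldera.

```python
import json


def find_empty_relationships(export):
    """Return (source trait, edge, target trait) for relationships missing a value."""
    broken = []
    for rel in export.get("relationships", []):
        source = rel.get("source") or {}
        target = rel.get("target") or {}
        if source.get("value") is None or target.get("value") is None:
            broken.append((source.get("trait"), rel.get("edge"), target.get("trait")))
    return broken


# Sample data mirroring the two exports above, trimmed to the relevant keys.
sample = json.loads("""
{
  "relationships": [
    {"source": {"trait": "remote.host.fqdn", "value": "mdns.mcast.net"},
     "edge": "has_ip",
     "target": {"trait": "remote.host.ip", "value": "224.0.0.251"}},
    {"source": {"trait": "domain.user.name", "value": null},
     "edge": "has_password",
     "target": {"trait": "domain.user.password", "value": null}}
  ]
}
""")

print(find_empty_relationships(sample))
```

With the sample data, only the has_password relationship is reported, since both of its facts have a null value.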

How do I define a relationship in a fact source?

guillaume-duong-bib commented 5 months ago

I don't get the issue, can you post a screenshot of your fact source?

timbrigham-oc commented 5 months ago

@guillaume-duong-bib sure thing -

image

guillaume-duong-bib commented 5 months ago

Thanks. Both of the code blocks you posted are from the Operation Full Report, right?

timbrigham-oc commented 5 months ago

The prior code blocks were from the fact source definition page export function. I can run a new operation and export again, a couple minutes please.

timbrigham-oc commented 5 months ago

Ok, these are from the full operation report.

For reference, this came from operation "Engagement 2 Lateral Movement (5/24/2024, 9:03:54 AM)"

  "facts": [
    {
      "unique": "domain.user.nameMYACCOUNT",
      "trait": "domain.user.name",
      "name": "domain.user.name",
      "value": "MYACCOUNT",
...
      "links": [],
      "relationships": [],
...
      "origin_type": "SEEDED",
...
    {
      "unique": "domain.user.passwordMYPASSWORD",
      "trait": "domain.user.password",
      "name": "domain.user.password",
      "value": "MYPASSWORD",
...
      "links": [],
      "relationships": [],
...
      "origin_type": "SEEDED",
...
    {
      "unique": "remote.host.fqdnMYFQDN1",
      "trait": "remote.host.fqdn",
      "name": "remote.host.fqdn",
      "value": "MYFQDN1",
...
      "links": [],
      "relationships": [],
...
      "origin_type": "SEEDED",
    {
      "unique": "remote.host.fqdnMYFQDN2",
      "trait": "remote.host.fqdn",
      "name": "remote.host.fqdn",
      "value": "MYFQDN2",
...
      "links": [],
      "relationships": [],
...
      "origin_type": "SEEDED",

I also noticed something else that I'm not sure if it's intended behavior surrounding facts.

In one of my earlier engagements I realized I had a typo for remote.host.fqdn (I entered a one instead of a two). When I rerun the operation, I see both of those values listed as seeded values. I also don't see the seeded values listed in the created fact source output. I thought that imported facts would show there? That might have been on version 4.x though.

image

image

guillaume-duong-bib commented 5 months ago

There is no data inside the relationship apart from the edge, source fact trait, and target fact trait. So if the facts are retrieved during the operation based on this, and with their up-to-date data, that would be no problem.

Except... it doesn't seem to be limited to the reports.

I tested creating a user.name has_password user.password relationship in a Fact Source vs. from an ability (link), and although the 4 facts are created, only the relationship from the link seems to be created. I tested that with a follow-up ability that looks for facts that have this relationship, and only the link's relationship is used. Plus, looking behind the scenes, here's what I have for my (saved to disk) relationships: image

The first ### is the one from the Fact Source, the second one is from the link. And what we see for the first is exactly what you reported.

All that to say, yeah there seems to be a bug. I don't have time right now, but next week I'll take a look starting from how the fact source is loaded in the operation. I assume the relationship part of the source loading is broken.

guillaume-duong-bib commented 5 months ago

Regarding the 2 other items:

In one of my earlier engagements I realized I had a typo for remote.host.fqdn (I entered a one instead of a two).
When I rerun the operation, I see both of those values listed as seeded values.

Yup, I noticed that one, see #2978 (items 1 & 2) and the linked issues. I'm waiting for some insight from the Caldera team before dedicating more work to it (by the way, if you have an opinion on these, please do give it on the PR :) )

I also don't see the seeded values listed in the created fact source output. I thought that imported facts would show there? That might have been on version 4.x though.

Looking at the code, this seems expected: only parsed facts are saved in the fact source. This also sounds logical to me. I can't speak for sure about what used to happen on 4.x, but I doubt this changed from 4 to 5, since it was mostly the Vue.js part that got updated. So I think that item can be ignored.

timbrigham-oc commented 5 months ago

I love how interactive this community has been in getting bug reports! Thanks @guillaume-duong-bib

Could you kindly explain how you got your screenshot for the "saved to disk" relationships? I would love to reproduce that, it would be a great debugging tool.

guillaume-duong-bib commented 5 months ago

I read the stores (data/fact_store and data/object_store) from an interactive Python shell.

image

Usually, I stop the server before fiddling with that, to make sure the stores are fully up to date with what I can see graphically. Also, I recall disabling file encryption in app/service/file_svc.py (FileSvc._save), but I'm not too sure whether the stores were affected by this (my Caldera test instance is... a bit of a mess). Anyway, if you can't read the stores, that's probably why. The key should be in the conf file though.
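
For anyone wanting to try the same thing, here is a minimal sketch. It assumes the store on disk is an unencrypted Python pickle (hence the encryption caveat above) and that the shell is started from the Caldera root directory, so that classes such as app.objects.secondclass.c_fact.Fact can be imported while unpickling. The helper names load_store and describe are hypothetical, not Caldera functions, and the top-level shape of each store is not confirmed here, so inspect it before digging in. Only unpickle files from your own instance.

```python
import pickle
from pathlib import Path


def load_store(path):
    """Load a pickled store from disk. Assumes the file is NOT encrypted."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def describe(obj):
    """Return a short description of the loaded object's top-level shape."""
    if isinstance(obj, dict):
        return {key: type(value).__name__ for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [type(item).__name__ for item in obj]
    return type(obj).__name__


# Usage, from the Caldera root with the server stopped:
#   store = load_store("data/fact_store")
#   print(describe(store))
```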


However, even after spending some time looking through them, I still don't fully understand how these stores are used vs. the actual files. Unless I'm mistaken, there are basically 2 copies of most data at all times: in text files and in the "ram" (named as such in the code). That ram also gets saved to disk as stores, which I assume is for performance reasons.

The synchronization between the two is something I haven't been able to completely get my head around for now... But I definitely intend on focusing on that some time this week or the next.

timbrigham-oc commented 5 months ago

Thanks @guillaume-duong-bib. I'll definitely look into reading that file.

I would dearly love to see detailed logging of when and how facts are used when the server process logging is turned up to verbose. I'm thinking that the values should be shown, along with what they are compared against and the result. It seems like an oversight not to have that included, honestly. I wouldn't mind being the one to put together a pull request to add that functionality if it's likely to be included in the code base.

Also since you are looking into the facts, I did notice another oddity. Literally every fact that gets created during an operation - when viewed in a fact source - shows as being imported. I'd assume that anything flagged with that 'IMP' was from another fact source originally, not created by the engagement that populated the fact source.

guillaume-duong-bib commented 5 months ago

Absolutely agreed on the logging, I've packed my local repo with prints everywhere to be able to monitor the checks. It may be a good idea to have a new level of verbose? Or just add it to DEBUG.
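
As a rough idea of what that could look like at DEBUG level, the sketch below logs the value, what it is compared against, and the result. The logger name fact_debug and the function fact_matches are hypothetical, and the equality check is a stand-in, not Caldera's actual matching logic.

```python
import logging

log = logging.getLogger("fact_debug")  # hypothetical logger name


def fact_matches(trait, value, expected):
    """Compare a fact value to an expected value and log the comparison at DEBUG."""
    result = value == expected
    # Lazy %-style arguments: the message is only formatted if DEBUG is enabled.
    log.debug(
        "fact check trait=%s value=%r expected=%r result=%s",
        trait, value, expected, result,
    )
    return result


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    fact_matches("domain.user.name", "MYACCOUNT", "MYACCOUNT")
    fact_matches("domain.user.password", None, "MYPASSWORD")
```

Keeping these messages at DEBUG means they stay out of the normal logs unless that level is explicitly enabled.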

elegantmoose commented 4 months ago

@timbrigham-oc @guillaume-duong-bib We have had logging in the planning service before. It's a lot of data and can overwhelm the log files, so we would need to be judicious about what is logged and where.

RE the fact store, this is a known gap. We first created a simple database, i.e. the data service / object store. Then we wanted a better service to handle the fact/knowledge needs of operations, i.e. fact sources and the knowledge service. Then we never got funding to reconcile the two (which is a big effort), and hence we have the issue of duplicated data and uncertainty as to what's in RAM or on disk. @guillaume-duong-bib and his team do have a few PRs in the works to make facts a little more robust, but it's always going to be a WIP, unfortunately.

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days