Update CTIA to use Relationship objects for all relations, and simplify API

craigbro commented 8 years ago

See https://github.com/threatgrid/ctim/issues/107 for the changes to the CTIM that we are adjusting to.

We want to radically simplify the model for interacting with the CTIA. That makes this a radical change to the CTIA. I think it is worth it because it will eliminate a lot of CTIA code and make the act of relating two entities together consistent. Previously, adding a relationship to our graph, which would be a common operation, required getting the object from the store, reading the list of values from the related_COA field (for example), adding our new value, and then PUTing the object back. All the while, we have to hope no one else is editing the object.

With this model, adding a relationship is simply a POST of the Relationship object to the ctia/relationship endpoint.
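To make the concurrency concern concrete, here is a minimal sketch using in-memory stand-ins rather than the real CTIA store. The `store` and `relationships` containers and both helper functions are hypothetical; the `indicators` field and the `observable-of` value are borrowed from the examples later in this thread.

```python
import copy

# In-memory stand-ins for the CTIA store; every name here is hypothetical.
store = {"judgement-1": {"id": "judgement-1", "indicators": []}}
relationships = []


def add_indicator_old(client_copy, indicator_id):
    """Old flow: the client edits its fetched copy and PUTs the whole object back."""
    client_copy["indicators"].append({"indicator_id": indicator_id})
    store[client_copy["id"]] = client_copy  # full overwrite, last writer wins


def add_indicator_new(judgement_id, indicator_id):
    """Proposed flow: each link is its own record, so writers do not overwrite each other."""
    relationships.append({
        "source_ref": judgement_id,
        "target_ref": indicator_id,
        "relationship_type": "observable-of",  # value copied from this thread's example
    })


# Two clients GET the same judgement before either of them writes.
copy_a = copy.deepcopy(store["judgement-1"])
copy_b = copy.deepcopy(store["judgement-1"])
add_indicator_old(copy_a, "indicator-a")
add_indicator_old(copy_b, "indicator-b")
lost_update = store["judgement-1"]["indicators"]  # only indicator-b survives

# The same two additions as independent Relationship records.
add_indicator_new("judgement-1", "indicator-a")
add_indicator_new("judgement-1", "indicator-b")
```

In the old flow the second PUT silently drops the first client's addition, while the relationship list ends up with both links.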

It does complicate reading and traversing the data, because you now have to ask for the relationships an object has in a second query. I propose that the way to ease this complexity is to provide a GraphQL endpoint, which lets UI developers compose their own views of the data according to a schema. Additionally, we can define one or more common, compound views of an entity and its relationships that will populate common view patterns. For example, ctia/judgement/:id/views/expanded would return the base Judgement, as well as some minimal set of fields from up to 10 of its Indicators.

Base Entity API

Each entity should have the following endpoints, which already exist for all the Base Entities:

POST ENTITY/ - Create the Entity
PUT ENTITY/:ID - Update the Entity
GET ENTITY/:ID - Get the entity with that ID
GET ENTITY/external_id/:id - Get the entities with that external ID
GET ENTITY/search - Query String Search
DELETE ENTITY/:id - Delete the entity (if allowed for that entity)
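As a rough illustration, the list above can be written as a route table generated per entity. This is only a sketch: `base_entity_routes` is a hypothetical helper, not part of CTIA, and the path templates are copied from the list (with `:id` lowercased for consistency).

```python
def base_entity_routes(entity):
    """Return (method, path template) pairs for one entity, following the list above."""
    return [
        ("POST", f"{entity}/"),                    # create the entity
        ("PUT", f"{entity}/:id"),                  # update the entity
        ("GET", f"{entity}/:id"),                  # get the entity with that ID
        ("GET", f"{entity}/external_id/:id"),      # get the entities with that external ID
        ("GET", f"{entity}/search"),               # query string search
        ("DELETE", f"{entity}/:id"),               # delete, if allowed for that entity
    ]


judgement_routes = base_entity_routes("judgement")
```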

Entity Views

To support the core views of entities that we will be using in the CTIA "CRUD" UI, we should provide a set of pre-defined views of an entity. These views will assemble data from the various objects the entity has relationships with, and perhaps some statistics, and return those in a well-defined schema that can be used to populate the UI.

The contents of these views will be defined by the UI/UX team, and are specific to each entity.

GET ENTITY/:ID/views/listing

Returns the data needed to populate the UI's "list" view of the entity.

GET ENTITY/:ID/views/expanded

Returns the data needed to populate the "expanded" view of the entity, which will include details from other objects it has relationships to, as well as counts, first/last seen, and other elements that may be needed.
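A minimal sketch of how a server might assemble such an expanded view for a Judgement, using the "up to 10 Indicators" example from the proposal. The function name, the in-memory inputs, and the choice of `id` and `title` as the "minimal set of fields" are all assumptions; the real contents would be defined per entity.

```python
MAX_INDICATORS = 10  # limit taken from the "up to 10" example in the proposal


def expanded_view(judgement, relationships, indicators_by_id):
    """Combine a judgement with trimmed-down copies of the indicators it links to."""
    linked = [
        indicators_by_id[r["target_ref"]]
        for r in relationships
        if r["source_ref"] == judgement["id"] and r["target_ref"] in indicators_by_id
    ]
    return {
        "judgement": judgement,
        # Only a couple of fields per indicator; which fields is an assumption.
        "indicators": [
            {"id": i["id"], "title": i["title"]} for i in linked[:MAX_INDICATORS]
        ],
        # Total count, so the UI can show that more exist than were returned.
        "indicator_count": len(linked),
    }
```

Relationships whose target is not a locally known indicator are skipped here, which is one possible way to treat remote references.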

GET ENTITY/:ID/views/historic

Returns the data needed to populate the historic view of the entity. May include edit history, a histogram of Sightings, or a first/last seen indicator.

GraphQL API endpoint

Ultimately, we would like to give UI developers and integration partners a GraphQL API to our data model, since it provides them with a schema, the ability to compose their own views as needed for their UI, and a consistent mechanism for traversing the graph in a way that doesn't require an avalanche of REST calls jumping from object to object by ID/URI.

Relationship API

Relationships are base entities, so they will have the same API as above, minus the Views, since they don't really need them. Additionally, for relationships, updates cannot modify the target_ref, source_ref or relationship_type fields.
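The update restriction could be enforced with a check along these lines. This is a sketch only: `apply_relationship_update` is a hypothetical helper, and raising `ValueError` stands in for whatever error response the API would actually return.

```python
IMMUTABLE_FIELDS = ("source_ref", "target_ref", "relationship_type")


def apply_relationship_update(existing, update):
    """Merge an update into a stored relationship, rejecting changes to immutable fields."""
    changed = [
        field for field in IMMUTABLE_FIELDS
        if field in update and update[field] != existing[field]
    ]
    if changed:
        raise ValueError(f"cannot modify immutable fields: {', '.join(changed)}")
    return {**existing, **update}
```

Resending an immutable field with its current value is accepted here, so a client can PUT back the full object it fetched.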

Observable API

Looking things up by observables is a basic operation in CTIA, and we need to preserve the existing API endpoints. However, we want to split the indicator helpers so that it's explicit whether we are looking for indicators that have been sighted, or indicators that judgements have been made against. It's possible for a sighting to relate to an indicator without there being an existing judgement for that observable that links to that indicator.

Judgements

GET observable/:type/:value/verdict
GET observable/:type/:value/judgements
GET observable/:type/:value/judgements/indicators

Sightings

GET observable/:type/:value/sightings
GET observable/:type/:value/sightings/indicators
GET observable/:type/:value/sightings/incidents
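The reason for splitting the indicator helpers can be shown with a small in-memory sketch. The data shapes are simplified assumptions (in particular the `observables` list on a sighting, and relationships without a `relationship_type`), and the function names are hypothetical.

```python
observable = {"type": "domain", "value": "48.ns4000wip.com"}

# Simplified stand-ins for stored entities; field shapes are assumed.
judgements = [{"id": "judgement-1", "observable": observable}]
sightings = [{"id": "sighting-1", "observables": [observable]}]

# relationship_type is omitted to keep the sketch short.
relationships = [
    {"source_ref": "judgement-1", "target_ref": "indicator-a"},
    {"source_ref": "sighting-1", "target_ref": "indicator-b"},
]


def linked_indicators(source_ids):
    """Targets of all relationships whose source is one of the given IDs."""
    return sorted(r["target_ref"] for r in relationships if r["source_ref"] in source_ids)


def judgement_indicators(obs):
    """Roughly what observable/:type/:value/judgements/indicators would answer."""
    return linked_indicators({j["id"] for j in judgements if j["observable"] == obs})


def sighting_indicators(obs):
    """Roughly what observable/:type/:value/sightings/indicators would answer."""
    return linked_indicators({s["id"] for s in sightings if obs in s["observables"]})
```

Here the same observable yields `indicator-a` through its judgement and `indicator-b` through its sighting, so a single combined helper would hide which path produced which indicator.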

Comments

We don't currently store the type of the source or target ref. It may be worth doing so, so that we can search by the type of the target, not just the relationship_type.
There are no helper endpoints for creating relationships. I figured it is best to have just one way to create them: less confusion for users as to which is the proper way to do it, and less complexity in CTIA.
We need to decide if we want to preserve the existing endpoints so that iroh-enrich can execute its queries quickly. Since it's our first real client, we should review its use of the API and ensure we support its efficient operation.
A GraphQL API would be complicated by the fact that we are a hypertext, where relationships can refer to objects on remote systems. The server cannot be expected to reach out to remote instances. How do we handle these situations? Perhaps we distinguish between local and remote relationships, and the GraphQL schema only works with local relations and objects?
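Two of the points above (storing the type of each ref, and telling local from remote refs) could both be derived from the ID itself, assuming IDs keep the URL layout seen in this thread's examples (`.../ctia/<type>/<type>-<uuid>`). The sketch below is hypothetical: `LOCAL_NETLOC`, `describe_ref`, `with_ref_types`, and the `source_type`/`target_type` field names are not existing CTIA names.

```python
from urllib.parse import urlparse

LOCAL_NETLOC = "tenzin-beta.amp.cisco.com:80"  # assumed to identify "this" instance


def describe_ref(ref):
    """Guess the entity type and locality of a ref from its URL."""
    parsed = urlparse(ref)
    segments = [s for s in parsed.path.split("/") if s]
    # Assumes the second-to-last path segment names the entity type.
    entity_type = segments[-2] if len(segments) >= 2 else None
    return {"type": entity_type, "local": parsed.netloc == LOCAL_NETLOC}


def with_ref_types(relationship):
    """Return a copy of the relationship with the source and target types stored on it."""
    return {
        **relationship,
        "source_type": describe_ref(relationship["source_ref"])["type"],
        "target_type": describe_ref(relationship["target_ref"])["type"],
    }
```

Storing the derived types at write time would allow searching by target type without parsing IDs on every query, and the `local` flag is one way a GraphQL layer could decide which relations it is able to resolve.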

oakmac commented 8 years ago

I prefer the idea of having the relationship be on the thing that it affects. That is, if a Judgement is supported by a handful of Indicators, then any time I want to do something with that Judgement (create it, update it, etc.) I expect to have to deal with that relationship in that context.

This feels a bit like a leaky abstraction, where you're asking the user of your API to create their own relationship table in SQL and do their own JOINs against it. While I can see how this would simplify CTIA code, it pushes that work onto the consumer of the CTIA API.

In this scenario, would we ever envision managing Relationships in the UI directly? Or always in the context of the other Entities they are connected to? In other words, would there be a "Relationship management page"?

Anecdotally, it took me a while to really understand this proposal. In general I think of that as a negative indicator.

craigbro commented 8 years ago

Here is an example of old vs new:

Creating a TG Feed Judgement, the CURRENT way

Assume we have a TG Indicator

{
    "description": "Submitted Sample Modifying the Windows Hosts File",
    "tags": [],
    "valid_time": {
      "start_time": "2016-10-19T00:52:23.723Z",
      "end_time": "2525-01-01T00:00:00.000Z"
    },
    "producer": "ThreatGrid",
    "schema_version": "0.3.1",
    "type": "indicator",
    "created": "2016-10-19T00:52:23.723Z",
    "modified": "2016-10-19T00:52:23.723Z",
    "short_description": "ThreatGrid modified-hosts-dns Feed.",
    "title": "tg-feed-modified-hosts-dns",
    "id": "http://tenzin-beta.amp.cisco.com:80/ctia/indicator/indicator-139bd371-172d-43c2-9030-d3ff1ee8a5aa",
    "tlp": "green",
    "confidence": "High",
    "owner": "Unknown"
}

If we want to add a Judgement, we would POST the following to /ctia/judgement:

{
    "valid_time": {
      "start_time": "2016-09-03T23:57:47.000Z",
      "end_time": "2016-10-03T23:57:47.000Z"
    },
    "observable": {
      "value": "48.ns4000wip.com",
      "type": "domain"
    },
    "reason_uri": "https://panacea.threatgrid.com/feeds/modified-hosts-dns/samples/84732d6ccaaabcd8ecaf25476c703b82",
    "indicators": [
      {
        "indicator_id": "http://tenzin-beta.amp.cisco.com:80/ctia/indicator/indicator-139bd371-172d-43c2-9030-d3ff1ee8a5aa"
      }
    ],
    "source": "Threat Grid modified-hosts-dns feed",
    "disposition": 2,
    "reason": "Submitted Sample Modifying the Windows Hosts File",
    "source_uri": "https://panacea.threatgrid.com/feeds/modified-hosts-dns/domains/48.ns4000wip.com",
    "disposition_name": "Malicious",
    "priority": 90,
    "severity": 100,
    "tlp": "green",
    "confidence": "High"
}

We submit a single object, and we include the indicator id in the object.

Creating a TG Feed Judgement, the PROPOSED way

Assuming the same indicator ID, I would make TWO calls. First I would post the following to /ctia/judgement:

{
    "valid_time": {
      "start_time": "2016-09-03T23:57:47.000Z",
      "end_time": "2016-10-03T23:57:47.000Z"
    },
    "observable": {
      "value": "48.ns4000wip.com",
      "type": "domain"
    },
    "reason_uri": "https://panacea.threatgrid.com/feeds/modified-hosts-dns/samples/84732d6ccaaabcd8ecaf25476c703b82",
    "source": "Threat Grid modified-hosts-dns feed",
    "disposition": 2,
    "reason": "Submitted Sample Modifying the Windows Hosts File",
    "source_uri": "https://panacea.threatgrid.com/feeds/modified-hosts-dns/domains/48.ns4000wip.com",
    "disposition_name": "Malicious",
    "priority": 90,
    "severity": 100,
    "tlp": "green",
    "confidence": "High"
}

Note that I do NOT include the indicators field at all. That field is no longer part of the Judgement object. I would get back the ID of the judgement, and have to remember that.

I would then also post the following to ctia/relationships:

{"source_ref": "IDOFMYJUDGEMENT",
  "target_ref": "http://tenzin-beta.amp.cisco.com:80/ctia/indicator/indicator-139bd371-172d-43c2-9030-d3ff1ee8a5aa",
  "relationship_type": "observable-of"
}
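The two-call flow can be sketched as a small client helper. The `post` argument is a hypothetical stand-in for whatever performs the HTTP POST and returns the parsed response; the sketch assumes the response includes the new entity's `id`, as described above. `fake_post` exists only so the example runs without a server.

```python
def create_judgement_with_indicator(post, judgement, indicator_id):
    """POST the judgement, then POST a relationship from it to the indicator."""
    created = post("ctia/judgement", judgement)
    relationship = {
        "source_ref": created["id"],  # the ID we "have to remember"
        "target_ref": indicator_id,
        "relationship_type": "observable-of",  # value copied from the example above
    }
    return created, post("ctia/relationships", relationship)


calls = []


def fake_post(path, body):
    """Stand-in for an HTTP POST: records the path and echoes the body with a made-up ID."""
    calls.append(path)
    return {**body, "id": f"{path.split('/')[-1]}-{len(calls)}"}


judgement, relationship = create_judgement_with_indicator(
    fake_post,
    {"observable": {"type": "domain", "value": "48.ns4000wip.com"}, "disposition": 2},
    "http://tenzin-beta.amp.cisco.com:80/ctia/indicator/indicator-139bd371-172d-43c2-9030-d3ff1ee8a5aa",
)
```

If the second call fails, the judgement exists without its relationship, so a real client would need to retry or clean up; the sketch does not handle that case.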

polygloton commented 8 years ago

I think these are great ideas.

gbuisson commented 8 years ago

I'm OK with all of it except for the GraphQL API endpoint; I think the effort/reward ratio for that might not be favorable enough.

saintx commented 8 years ago

First, I like the proposed refactor and promotion of the Relationship to a top-level entity. Right now, it is possible to POST multiple different entities with one HTTP Request, by using the /bulk route. It is true that this would incur a larger cost on reads, because clients will have to make multiple HTTP requests to traverse the graph.

Adding a GraphQL capability has a much bigger impact than simplifying this additional graph traversal cost, however. Over time, as our API matures and changes, we are going to have to commit to backwards compatibility for different versions of the API. If we ship minor changes every 2 weeks and guarantee backwards compatibility support for 2 years, then we will have to maintain up to 52 different versions of the REST API at a time (26 releases per year over 2 years). My guess is that with our continuous integration goals, clients of our REST APIs in Tenzin would break constantly, causing quite a lot of pain for consumers of our data.

A GraphQL endpoint actually can help us smooth over that issue for client consumers of our data, as well as dramatically lessen the amount of data we send over the wire. I'm all for it.

oakmac commented 8 years ago

After some thought, I'm coming around to this idea more because it simplifies the data model.

My biggest concern with the original proposal was increased UI complexity from having to "manually join" objects together into useful representations and increased number of HTTP requests.

"Additionally, we can define one or more common, compound views of an entity and its relationships that will populate common view patterns."

Strongly agree. Sounds like a practical approach.

I'm indifferent about the GraphQL thing at this stage in the product. GraphQL effectively supports infinite "build your own query" at the cost of a more complex abstraction level (both on client and server). This works well when you have multiple UI clients with varying needs of the same data. Google, Facebook, Netflix, etc. have this problem (same data set, UIs on multiple platforms).

pxninja commented 8 years ago

Reading through everything, I like the idea of being able to more easily grab information. Having said that, I will defer to the judgement of those more qualified to make an assessment. :)

nrezvani commented 8 years ago

I think it is a great idea to simplify the model for interacting with the CTIA. I also like the idea of having a "Relationship" entity; splitting the indicator helpers for the mentioned reasons; and having an "expanded" view of the entity. Eventually, these will all make my life more interesting and colorful I think. However, I am not familiar with GraphQL and cannot comment on that.
