izar / pytm

A Pythonic framework for threat modeling
Other
891 stars 168 forks source link

DataFlow object attributes need work (or moves) #111

Open colesmj opened 3 years ago

colesmj commented 3 years ago

Many of the attributes defined for the DataFlow object belong elsewhere:

Correctly assigned:

Maybe correct:

Should be a property of the Source:

Should be a property of the Sink:

Should be a property of either source or sink:

colesmj commented 3 years ago

Just another related note on source in a DataFlow - that is also the "initiator" (probably) of a comm flow, which another important attribute to have available.

nineinchnick commented 3 years ago

isResponse is related to responseTo. First one is a boolean and second one is a reference to another dataflow. Having both is supposed to make creating simple models easier as you don't need to point to a specific response if it's unambiguous. Automatically assigning this reference is used when drawing the DFD and can be used in threat conditions.

response is similar to responseTo, it's the other end of the relation.

Since we mark dataflows as either requests or responses data should be unambiguous already.

Sources already have a port but its treated as a default value for any dataflow coming out of it if the dataflow doesn't specify srcPort. Remember that there can be multiple dataflows coming out of a single element and all could have different ports. Same applies to destination ports and I guess it's better to start with them, since most often source ports are chosen randomly.

isEncrypted on the dataflow means the whole connection, not just data. I agree it should be stated clearly but also it's a good idea to add this to the (new) Data class.

authenticatesDestination and checksDestinationRevocation are also already defined at sources and these are defaults for all outgoing dataflows. A single process can create connections to different services with totally different properties.

Other sink attributes you mentioned also are treated as defaults. As for usesVPN - I think this should be represented as other boundaries and more dataflows. I have no idea what implementsCommunicationProtocol represents.

nineinchnick commented 3 years ago

A minimum example:

client = Actor("client")
server = Server("server")

Dataflow(client, server, "request")
Dataflow(server, client, "response", isResponse=True)

a more verbose one:

client = Actor("client")
server = Server("server")

req = Dataflow(client, server, "request")
resp = Dataflow(server, client, "response", responseTo=req)

where the response property in the request will be assigned automatically.

Both examples will produce exactly same output.

colesmj commented 3 years ago

See Issue #35 for how implementsCommunicationProtocol is used (incorrectly) now...

colesmj commented 3 years ago

@nineinchnick Thanks for your feedback. Perhaps the response related items are properly used in DataFlow, but my other suggested changes I think are still valid, for the reasons I called out; and that the properties (such as authenticatesSource, except the usesVPN - I agree it is weird) is important to know on the server as an object, and would be independent of dataflow for any given port/process, but I'm willing to be convinced otherwise.

nineinchnick commented 3 years ago

I agree about usesLatestTLSversion, it should be converted to a min required TLS version attribute of an Element and Dataflows should define TLS version used and we should check this similar to data classification level in https://github.com/izar/pytm/blob/master/pytm/pytm.py#L1269

nineinchnick commented 3 years ago

authorizesSource is already in every asset like server and process, we should just remove it from the dataflow.

colesmj commented 3 years ago

Question on removing existing attrs: if a model attempts to set a non-existent attr, is the behavior to add it to the instance of the object? If so, then the rules that rely on DataFlow.usesLatestTLSversion should be changed to not use it, unless the matching (which is pair-wise) is not granular enough (i.e. if a pair includes source and dataflow, and dataflow and sink, then adding the attr (due to a legacy model) would result in a bad finding potentially, or dup at least).

nineinchnick commented 3 years ago

I was thinking of detecting when someone sets non-existing attributes to detect typos. This could be optional, similar to how we already detect duplicate dataflows.

Anyway, removals are always breaking changes so we need to tread carefully here. We haven't yet adopted semver officially but this would go into 2.0 if at all.

colesmj commented 3 years ago

This may become a problem moving forward, imho. I can foresee (or would like to move us in the direction of) supporting the addition of attributes that do not yet have detection in rules. This could be metadata or documentation within a model, or attrs for which we don't yet have threats, or that threat definition may vary by consumer of a single model (the creator of a model is not always the consumer of the model). Will want to discuss this more later.

colesmj commented 3 years ago

So interesting - there is no rule that looks for implementsCommunicationProtocol, so this is now an attribute without a use. But should we remove it? Perhaps. Or leave it for a future use case.

nineinchnick commented 3 years ago

I'm trying to address the tls thingy here: https://github.com/izar/pytm/pull/123 What's left in this issue?

nineinchnick commented 3 years ago

I also suggest removing authenticatedWith in #119

izar commented 3 years ago

There's no doubt that there is a lot of space for a semantic argument here of where any given attribute "belongs". I want to add to this discussion the dimension of "belongs according to whom?" . Nowadays, security experts and developers still see things differently. When adopting a strictly OO approach, it may make sense to move some attributes around. But consider, for example, usesVPN. Yes, the dataflow would inherit it from the source/sink. But then the modeler has the added work of making it true for both source and sink (does it make sense to have it true on only one side? that to me translates to ambiguity). On the other hand, in my mind it makes perfect sense for a given Dataflow to yell "hey, I need to go over a VPN!" (to be exact, i have been thinking about VPCs, but potatos. It is a requirement of the dataflow, not a "favor" done to it by the sink/source.

nineinchnick commented 3 years ago

For these reasons pytm should provide an unambiguous set of attrs and be easily extendable. Still, this is fuzzy, so just as few as possible. And if we don't have threats or docs for an attr it should not be included.

nineinchnick commented 3 years ago

We also already have isEncrypted which should be true if the dataflow goes over VPN. Encapsulation is just more assets and dataflows.