izar / pytm

A Pythonic framework for threat modeling

Add feature to define findings manually in the model, similar to overrides #180

Open nozmore opened 2 years ago

nozmore commented 2 years ago

The idea here is that, after working with a dev team, threats may be identified which do not exist in the pytm threatlib; these can then be manually added to the model, similar to using overrides to modify a threat/finding. Alternatively, someone could use pytm only for model definition, DFD/sequence diagram generation, and reporting, using manual findings instead of any of the threatlib.
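
For context, the existing overrides notation that the title refers to looks roughly like this (based on the pytm README; the threat id and response text here are illustrative):

web.overrides = [
    Finding(
        # Override an existing threatlib finding on this element
        threat_id = "INP01",
        response = "mitigated: all form input is validated server-side",
    ),
]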

I originally called these "custom findings" and named the variables accordingly. I am thinking of renaming them to "manual findings"; thoughts?

I would like to update tm.py and the README.md with a real-world example that makes sense given the base tm model. I haven't spent the mental energy to come up with one that is good and doesn't already exist in the threatlib. If anyone has any thoughts, let me know.

raphaelahrens commented 2 years ago

Hi,

I think the notation is a little bit redundant

web.custom_findings = [
    Finding(web,
        id = "foo_id_3",
        description = "foo_description",
        details = "foo_details",
        severity = "HIGH",
        mitigations = "foo_mitigations",
        example = "foo_example",
        threat_id = "foo_threat_id_3",
        references = "foo_references",
        response = "accepted",
    ),
]

could just be something like

f(
  web,
  "foo_description",
  details = "foo_details",
  severity = "HIGH",
  mitigations = "foo_mitigations",
  example = "foo_example",
  threat_id = "foo_threat_id_3",
  references = "foo_references",
  response = "accepted",
)

So a function or class constructor could take care of adding the new Finding to the custom/manual_finding list of web. This removes the double use of web and the list notation. It can also create an id for us if we don't want to specify one. And it can have required fields, like the description, directly as parameters.
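
A minimal sketch of that idea; the helper name, the uuid-based id, and the custom_findings attribute are assumptions following the notation above, not existing pytm API:

import uuid

from pytm import Finding

def add_finding(element, description, **kwargs):
    # Generate an id when the caller does not supply one
    kwargs.setdefault("id", uuid.uuid4().hex)
    f = Finding(element, description=description, **kwargs)
    # Attach to the element's manual list, creating it on first use
    if not hasattr(element, "custom_findings"):
        element.custom_findings = []
    element.custom_findings.append(f)
    return f

Usage would then be add_finding(web, "foo_description", severity="HIGH").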

If you prefer to have the element in the beginning of the line maybe

web.add_finding( ...

Regarding the name I tend towards "manual", but I'm not sure if there should be a separation between manual and automatic findings.

nozmore commented 2 years ago

Yes, I meant to mention that in the description; the redundancy is another reason I kept this as a draft. We haven't used methods for assignment in tm.py before, so I just went with the same approach, and obviously this doesn't work for JSON import. At minimum I was going to add logic to remove the dependency requirement.

I think the required fields are up to the report format; I'm not sure we should impose this. For an internal threat, maybe from a custom threat library, it may just be a pointer or reference to a ticket.

I was thinking that instead of treating them as separate findings in the report, I could add them to the main findings list and the findings instance variables, but add a 'source' field to the Finding (the default would be 'pytm'; source could be 'manual' for these findings, or a custom value set in the model). To the earlier point, when I copy them from the custom_findings instance variable to findings I could assign the target element, removing the redundancy.
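
A hedged sketch of that merge step; the _elements registry, the findings list, and the attribute names are assumptions about pytm internals, not merged code:

def merge_manual_findings(tm):
    # Fold element-level manual findings into the main findings list
    for element in tm._elements:                      # assumed element registry
        for f in getattr(element, "custom_findings", []):
            f.element = element                       # assign the target element
            if not getattr(f, "source", None):
                f.source = "manual"                   # default for model-defined findings
            tm.findings.append(f)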

raphaelahrens commented 2 years ago

Source could also be an author, so that you have someone you can contact?

colesmj commented 2 years ago

A couple of comments on this thread:

It would potentially be useful to know the source of a finding as automatic (from pytm detection) or manual (human added) or possibly others like external or inherited.

Findings should have common fields such as summary, description (long or short), etc., and should probably also contain references or some field to support a contact URL or email (maybe a link to a support or security team, etc.).

To @nozmore's comment about tying fields to the report, I'm a big fan of delinking. The findings should have metadata or data to be meaningful; the report can use what it wants/needs from findings, as long as there is a common set of fields and logic to detect when something is empty or missing.

nozmore commented 2 years ago

> Source could also be an author, so that you have someone you can contact?

> It would potentially be useful to know the source of a finding as automatic (from pytm detection) or manual (human added) or possibly others like external or inherited.

Exactly: 'pytm' when found using the threatlib; 'manual' if the finding is defined in the model but 'source' is not set; any string (except 'pytm') if 'source' is defined in the model.
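
In code, that rule might look like this (a minimal sketch; the function name and the from_threatlib flag are illustrative, not pytm API):

def finding_source(finding, from_threatlib):
    # Threatlib matches are always tagged 'pytm'
    if from_threatlib:
        return "pytm"
    # Model-defined findings default to 'manual' unless the model set a source
    return getattr(finding, "source", None) or "manual"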

colesmj commented 2 years ago

> Source could also be an author, so that you have someone you can contact?

> It would potentially be useful to know the source of a finding as automatic (from pytm detection) or manual (human added) or possibly others like external or inherited.

> Exactly: 'pytm' when found using the threatlib; 'manual' if the finding is defined in the model but 'source' is not set; any string (except 'pytm') if 'source' is defined in the model.

Question: why should the finding(s) be defined in the model, and not with the model?

nozmore commented 2 years ago

> Findings should have common fields such as summary, description (long or short), etc., and should probably also contain references or some field to support a contact URL or email (maybe a link to a support or security team, etc.).

> To @nozmore's comment about tying fields to the report, I'm a big fan of delinking. The findings should have metadata or data to be meaningful; the report can use what it wants/needs from findings, as long as there is a common set of fields and logic to detect when something is empty or missing.

I re-used the Finding class, so they do have a common set of fields; however, for these manually identified findings I don't believe required fields are for us to say. For threatlib findings, yes, I fully agree. For these, I think the organization maintaining its report template would need to decide which fields are required. Currently target is required, and should be; adding description is reasonable so there is a way to describe the finding or point to something.

I'm open to making whatever fields we all agree on required; I just don't think it's necessary, and it will lead to people putting in junk data, like I did, to satisfy the requirements of the tool.

colesmj commented 2 years ago

> I'm open to making whatever fields we all agree on required; I just don't think it's necessary, and it will lead to people putting in junk data, like I did, to satisfy the requirements of the tool.

I'm not suggesting making fields mandatory unless they make sense: summary, description, source (how the finding got into the list) certainly, and an identifier (auto-generated or manually supplied); all other fields should be optional and, if possible, not fixed. The report should not rely on fields existing; either those fields should always exist and have at least some form of default (even if an empty string), or the report generator should know how to handle missing/empty fields in findings.
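
A minimal sketch of a report helper that tolerates missing or empty fields; the function is illustrative, not pytm's report code:

def render_field(finding, name, default="(not provided)"):
    # Use the field if present and non-empty, otherwise show a visible default
    value = getattr(finding, name, "") or default
    return "{}: {}".format(name, value)

A template can then call render_field(f, "severity") without caring whether the model set it.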

nozmore commented 2 years ago

OK, I think I incorporated all of the comments. The only thing I didn't address was @colesmj's comment about whether the findings need to be stored in the model. No; it's code, so they don't have to be, but for now I like that they are part of the model. For me, teams would manage one tm file per app (or per a given scope, for a larger app), run this file through the tool, and produce output. Also, there is a link to the model, since the findings are added to elements. When I take a look at JSON support I would like to separate the input (model, manual findings) from the output (all findings).

Below are the comments I addressed.

The only outstanding change is to finalize the custom findings in tm.py, or decide to remove those from tm.py and just document them in the README.md.

raphaelahrens commented 2 years ago

> OK, I think I incorporated all of the comments. The only thing I didn't address was @colesmj's comment about whether the findings need to be stored in the model.

@colesmj I also have to ask what you meant by "in" vs. "with" the model.

> When I take a look at JSON support I would like to separate the input (model, manual findings) from the output (all findings).

Why is this separation helpful? Could you give an example where this separation would make a difference?

I also wanted to state my preference for an extra function for manual findings. So the first thing I would do is add a function like this to my threat model:

def finding(element, description, **kwargs):
    # Completed version of the sketch: build the Finding and attach it
    element.manual_finding.append(Finding(element, description=description, **kwargs))

This function allows me to just create a finding without caring about the structure of pytm. The model would then, at the end, have a bunch of lines like:

tm = TM("example model")
dB = Datastore("dB")
broker = Process("broker")
Dataflow(broker, dB, "messages")

finding(dB, "cyclic references", source="RA")
finding(broker, "missing topic authZ", source="RA")

The advantage is that the internal data model and the DSL are more decoupled. As a user, I also don't have to care what kind of finding this is or how it is stored.

The JSON representation currently depends on the internal data structure, and it might not be a good idea to have a user interface which is the data structure.
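
One way to decouple them is to export a fixed, documented schema rather than the objects themselves. A hedged sketch; the field names and element attributes are assumptions:

import json

def findings_to_json(findings):
    # Emit a stable schema instead of serializing object internals
    payload = [
        {
            "target": f.element.name if getattr(f, "element", None) else None,
            "description": getattr(f, "description", ""),
            "source": getattr(f, "source", "pytm"),
            "severity": getattr(f, "severity", None),
        }
        for f in findings
    ]
    return json.dumps(payload, indent=2)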

colesmj commented 2 years ago

> OK, I think I incorporated all of the comments. The only thing I didn't address was @colesmj's comment about whether the findings need to be stored in the model.

> @colesmj I also have to ask what you meant by "in" vs. "with" the model.

By writing findings directly into the model definition, changes to the model may invalidate the findings or result in conflicting findings. Better, IMHO, to separate the model definition from its outcomes. But maybe pytm's structure isn't yet at the point of supporting that separation.
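
One illustrative shape of "with the model": keep manual findings in a file of their own that the model script imports, so model edits and finding edits are separate, reviewable changes. Everything here (file name, helper) is hypothetical and reuses the finding() sketch from earlier in the thread:

# findings_webapp.py (hypothetical companion file to the model script)
def attach_manual_findings(dB, broker):
    # Findings live beside, not inside, the model definition
    finding(dB, "cyclic references", source="RA")
    finding(broker, "missing topic authZ", source="RA")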

nozmore commented 2 years ago

> When I take a look at JSON support I would like to separate the input (model, manual findings) from the output (all findings).

> Why is this separation helpful? Could you give an example where this separation would make a difference?

I would like to have this running in the pipeline to create the report, DFD, and findings. This way I can monitor changes to the model (or lack thereof) and keep them separate from the findings, which could change due to pytm or threatlib updates while the model remains the same.

> I also wanted to state my preference for an extra function for manual findings. So the first thing I would do is add a function like this to my threat model.

By moving away from the manual_findings attribute and using findings, I was 90% there; I just made some changes. It now supports findings added to the element's findings attribute, as well as ones created as a standalone Finding object with the element as an argument.
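
The two styles, as I understand this comment; the attribute assignment and the source keyword reflect the branch under discussion, not released pytm API, and the finding texts are illustrative:

from pytm import Server, Finding

web = Server("web")

# Style 1: findings added to the element's findings attribute
web.findings = [Finding(description="missing rate limiting", source="RA")]

# Style 2: a standalone Finding with the element as an argument
Finding(web, description="verbose error pages", source="RA")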

> The JSON representation currently depends on the internal data structure, and it might not be a good idea to have a user interface which is the data structure.

I agree. I have some JSON changes locally that I started; I might start over, but I do want to decouple the data models, definitely for JSON, though it could be done for the Python side as well. This might fit well with @izar's branch for modularizing pytm.

nineinchnick commented 2 years ago

> I would like to have this running in the pipeline to create the report, DFD, and findings. This way I can monitor changes to the model (or lack thereof) and keep them separate from the findings, which could change due to pytm or threatlib updates while the model remains the same.

To avoid unexpected changes in threats, you should lock the version of pytm used or use your own threat library (a copy).