izar / pytm

A Pythonic framework for threat modeling
Other
861 stars 161 forks source link

Cannot override findings, threats remain, DFD impacted, exception thrown for overrides len > 1 #222

Closed MichaelMcAleer closed 3 weeks ago

MichaelMcAleer commented 8 months ago

Using the included tm.py as an example here, not the code snippet from the README under 'creating a threat model'.

When I use tm.py to build an example report and DFD everything is fine. When I try to add a single override as demonstrated in the README section overrides the threat is still listed under the dataflow for user_to_web, it is not overridden.

The DFD generated from this threat model now has an invalid element because the Finding does not realise what element is referring to when it is initialised. Given the Finding is listed in the overrides for the dataflow it should know what element it is referring to.

dfd

In addition to this, if an attempt is made to list more than one Finding in overrides, which expects a list of Findings in varFindings(), an exception is thrown:

❯ ./tm.py --report docs/advanced_template.md | pandoc -f markdown -t html > tm/report.html
Traceback (most recent call last):
  File "/Users/dev_user/PythonLibs/pytm/./tm.py", line 154, in <module>
    tm.process()
  File "/Users/dev_user/PythonLibs/pytm/pytm/pytm.py", line 1030, in process
    self.check()
  File "/Users/dev_user/PythonLibs/pytm/pytm/pytm.py", line 839, in check
    raise ValueError(
ValueError: Finding  have more than one override in Dataflow(User enters comments (*))

The example provided in the README uses threat SID INP02 which isn't a threat identified for that dataflow, I have used DE01 on its own to get the result above with invalid element in the DFD and remains as an active threat in the generated report.

When overriding multiple threats I used threats identified in the report to ensure relevance and uniqueness. The user_to_web overrides is as follows in my altered tm.py:

user_to_web = Dataflow(user, web, "User enters comments (*)")
user_to_web.protocol = "HTTP"
user_to_web.dstPort = 80
user_to_web.data = comments_in_text
user_to_web.note = "This is a simple web app\nthat stores and retrieves user comments."
user_to_web.overrides = [
    Finding(
        id="DE01",
        CVSS="9.1",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
    ),
    Finding(
        id="AC05",
        CVSS="9.2",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
    ),
    Finding(
        id="DE03",
        CVSS="9.3",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
    ),
    Finding(
        id="CR06",
        CVSS="9.4",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
    )
]
raphaelahrens commented 8 months ago

Hi,

I looked at the code and a Finding requires a corresponding threat. So either you will have to give it a threat_id or a threat in the key_arguments https://github.com/izar/pytm/blob/5ec33e15cb263504f0f8126d745e7f0b8c7dc8f6/pytm/pytm.py#L677-L684.

The error you encountered happens because later pytm checks if two Findinds cover the same threat_id https://github.com/izar/pytm/blob/5ec33e15cb263504f0f8126d745e7f0b8c7dc8f6/pytm/pytm.py#L834-L841 Since all your Findings have no associated threat the threat_id is None for all Findings and so the check complains. A quick bypass is to set the threat_id to a random string.

    Finding(
        id="DE01",
        CVSS="9.1",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
        threat_id="loL"
    ),
    Finding(
        id="AC05",
        CVSS="9.2",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
        threat_id="loL2 "
    ),
    Finding(
        id="DE03",
        CVSS="9.3",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
        threat_id="loL3"
    ),
    Finding(
        id="CR06",
        CVSS="9.4",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
        threat_id="loL4"
    )
]

@izar from the code I currently don't see why there is this check for the threat_id. Maybe you can shed some light on the reasoning.

izar commented 8 months ago

Hm, let me take a look and get back to you ASAP. Thanks for the report!

MichaelMcAleer commented 8 months ago

@raphaelahrens I again with Findings that had threat_id corresponding to the SID in threats.json, the Finding id was set as the number of the threat as listed in the generated report; 67-70:

user_to_web = Dataflow(user, web, "User enters comments (*)")
user_to_web.protocol = "HTTP"
user_to_web.dstPort = 80
user_to_web.data = comments_in_text
user_to_web.note = "This is a simple web app\nthat stores and retrieves user comments."
user_to_web.overrides = [
    Finding(
        id="67",
        threat_id="DE01",
        CVSS="9.3",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
    ),
    Finding(
        id="68",
        threat_id="AC05",
        CVSS="9.3",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
    ),
    Finding(
        id="69",
        threat_id="DE03",
        CVSS="9.3",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
    ),
    Finding(
        id="70",
        threat_id="CR06",
        CVSS="9.3",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
    )
]

Thankfully this did process fine, no exception returned about Finding having more than one override as detailed in the opening issue description.

Checked the DFD and its still has invalid elements, one for each Finding:

dfd

From what I can tell the invalid setting is coming from the below section of code because there is no arg passed for the Finding as demonstrated in the README so Element is just set to "invalid"

https://github.com/izar/pytm/blob/master/pytm/pytm.py#L661-664

izar commented 8 months ago

Can you share your full model ? I am a bit thrown by the "invalid" elements out there.

MichaelMcAleer commented 8 months ago

Sure no problem, to keep things simple I have used only the example provided tm.py with the override snippet taken from your README here.

Note: user_to_web.overrides findings have been changed after input above from @raphaelahrens, idis set which is the finding ID, and threat_id which is the associated threat SID to override, this is not clear in the README.

tm.py

#!/usr/bin/env python3

from pytm import (
    TM,
    Actor,
    Boundary,
    Classification,
    Data,
    Dataflow,
    Datastore,
    Finding,
    Lambda,
    Server,
    DatastoreType,
)

tm = TM("my test tm")
tm.description = "This is a sample threat model of a very simple system - a web-based comment system. The user enters comments and these are added to a database and displayed back to the user. The thought is that it is, though simple, a complete enough example to express meaningful threats."
tm.isOrdered = True
tm.mergeResponses = True
tm.assumptions = [
    "Here you can document a list of assumptions about the system",
]

internet = Boundary("Internet")

server_db = Boundary("Server/DB")
server_db.levels = [2]

vpc = Boundary("AWS VPC")

user = Actor("User")
user.inBoundary = internet
user.levels = [2]

web = Server("Web Server")
web.OS = "Ubuntu"
web.controls.isHardened = True
web.controls.sanitizesInput = False
web.controls.encodesOutput = True
web.controls.authorizesSource = False
web.sourceFiles = ["pytm/json.py", "docs/template.md"]

db = Datastore("SQL Database")
db.OS = "CentOS"
db.controls.isHardened = False
db.inBoundary = server_db
db.type = DatastoreType.SQL
db.inScope = True
db.maxClassification = Classification.RESTRICTED
db.levels = [2]

secretDb = Datastore("Real Identity Database")
secretDb.OS = "CentOS"
secretDb.sourceFiles = ["pytm/pytm.py"]
secretDb.controls.isHardened = True
secretDb.inBoundary = server_db
secretDb.type = DatastoreType.SQL
secretDb.inScope = True
secretDb.storesPII = True
secretDb.maxClassification = Classification.TOP_SECRET

my_lambda = Lambda("AWS Lambda")
my_lambda.controls.hasAccessControl = True
my_lambda.inBoundary = vpc
my_lambda.levels = [1, 2]

token_user_identity = Data(
    "Token verifying user identity", classification=Classification.SECRET
)
db_to_secretDb = Dataflow(db, secretDb, "Database verify real user identity")
db_to_secretDb.protocol = "RDA-TCP"
db_to_secretDb.dstPort = 40234
db_to_secretDb.data = token_user_identity
db_to_secretDb.note = "Verifying that the user is who they say they are."
db_to_secretDb.maxClassification = Classification.SECRET

comments_in_text = Data(
    "Comments in HTML or Markdown", classification=Classification.PUBLIC
)
user_to_web = Dataflow(user, web, "User enters comments (*)")
user_to_web.protocol = "HTTP"
user_to_web.dstPort = 80
user_to_web.data = comments_in_text
user_to_web.note = "This is a simple web app\nthat stores and retrieves user comments."
user_to_web.overrides = [
    Finding(
        id="67",
        threat_id="DE01",
        CVSS="9.3",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
    ),
    Finding(
        id="68",
        threat_id="AC05",
        CVSS="9.3",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
    ),
    Finding(
        id="69",
        threat_id="DE03",
        CVSS="9.3",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
    ),
    Finding(
        id="70",
        threat_id="CR06",
        CVSS="9.3",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
    )
]

query_insert = Data("Insert query with comments", classification=Classification.PUBLIC)
web_to_db = Dataflow(web, db, "Insert query with comments")
web_to_db.protocol = "MySQL"
web_to_db.dstPort = 3306
web_to_db.data = query_insert
web_to_db.note = (
    "Web server inserts user comments\ninto it's SQL query and stores them in the DB."
)

comment_retrieved = Data(
    "Web server retrieves comments from DB", classification=Classification.PUBLIC
)
db_to_web = Dataflow(db, web, "Retrieve comments")
db_to_web.protocol = "MySQL"
db_to_web.dstPort = 80
db_to_web.data = comment_retrieved
db_to_web.responseTo = web_to_db

comment_to_show = Data(
    "Web server shows comments to the end user", classifcation=Classification.PUBLIC
)
web_to_user = Dataflow(web, user, "Show comments (*)")
web_to_user.protocol = "HTTP"
web_to_user.data = comment_to_show
web_to_user.responseTo = user_to_web

clear_op = Data("Serverless function clears DB", classification=Classification.PUBLIC)
my_lambda_to_db = Dataflow(my_lambda, db, "Serverless function periodically cleans DB")
my_lambda_to_db.protocol = "MySQL"
my_lambda_to_db.dstPort = 3306
my_lambda_to_db.data = clear_op

userIdToken = Data(
    name="User ID Token",
    description="Some unique token that represents the user real data in the secret database",
    classification=Classification.TOP_SECRET,
    traverses=[user_to_web, db_to_secretDb],
    processedBy=[db, secretDb],
)

if __name__ == "__main__":
    tm.process()

Report and associated DFD are generated using commands from README here:

./tm.py --report docs/advanced_template.md | pandoc -f markdown -t html > tm/report.html
./tm.py --dfd | dot -Tpng -o tm/dfd.png

Using these commands and tm() from the snippet the generated report does not have threats overridden as expected on the user_to_web dataflow and the DFD has the invalid elements present.

izar commented 8 months ago

Ok, so this is functionality that I have not used before. If I understand @nineinchnick test code correctly, it is supposed to work on user-supplied Threat objects. I see no reason why it shouldn't work on library based threats, so I'll pursue that together with the spurious element appearing in the DFD. I think I know where that one is coming from.

Please stay tuned.

izar commented 8 months ago

I found the DFD problem. Can you submit a PR with your change, and I'll fix the DFD on my side?

On Tue, Oct 24, 2023 at 11:19 AM Michael McAleer @.***> wrote:

Quick update, I was able to get override threats by changing this condition:

https://github.com/izar/pytm/blob/5ec33e15cb263504f0f8126d745e7f0b8c7dc8f6/pytm/pytm.py#L804-L806

To:

if not t.apply(e) or t.id in override_ids: continue

This thought behind this is if the threat does not apply() or the threat id is in the list of override ids, we can continue to the next threat in the threat list. Using and here means that if a threat does apply() it doesn't matter if it is in the override list, its processed as a new finding anyway and override ignored.

This does not fix the invalid elements in the DFD however, that issue persists.

— Reply to this email directly, view it on GitHub https://github.com/izar/pytm/issues/222#issuecomment-1777464559, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAC2BAITOJHWMH3JVB7JXV3YA7MADAVCNFSM6AAAAAA6L5TQTWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONZXGQ3DINJVHE . You are receiving this because you were mentioned.Message ID: @.***>

izar commented 8 months ago

@MichaelMcAleer your change makes the test for the feature fail. I think we may need @nineinchnick input here, lest we break the intent of the feature.

MichaelMcAleer commented 8 months ago

Thanks for the update @izar, I was just working on it there and adding a way to track overridden threats with finding responses, they would be useful to output in generated report. I'll wait for input from @nineinchnick before continuing.

Whilst there is a pull request open with a fix incoming, I spotted something else in pytm.py that should be addressed:

https://github.com/izar/pytm/blob/5ec33e15cb263504f0f8126d745e7f0b8c7dc8f6/pytm/pytm.py#L1877-L1881

It should be:

    if type(obj) is not list:
        raise ValueError("expecting a list value, got a {}".format(type(obj)))

value is not defined so its an unresolved reference.

nineinchnick commented 8 months ago

Can you try to create Findings for overrides with only the thread_id, without id? This is what we have in tests. The README might be wrong.

MichaelMcAleer commented 8 months ago

Tried again @nineinchnick where Findings for overrides do not have an id set, only threat_id. The outcome is still the same, the threats are still listed in the generated report.

Another thing I tried was to manually set tm.threatsFile as a path to the threats.json file to see if user supplied threats would be overridden but the outcome was still the same, the threats were not overridden in the generated report.

So far the only thing I have been able to do to have element threats overriden is to change the condition whereby a threat is evaluated from:

https://github.com/izar/pytm/blob/5ec33e15cb263504f0f8126d745e7f0b8c7dc8f6/pytm/pytm.py#L804-L806

To:

if not t.apply(e) or t.id in override_ids:
    continue

The reasoning being we can continue to the next threat in the loop because the threat id is in the list of override ids.

I do want to point out that I view this as an over simplified solution, I would prefer to track the overridden threats in something like self.overriden_findings along with the Finding.response outlining if its mitigated, to migitage, accepted, transferred etc. These would be really useful in generated reports so we can document both active threats and overridden/resolved threats.

nineinchnick commented 8 months ago

Ok let me try to run your example and debug it. Thanks for your patience on this!

nineinchnick commented 8 months ago

As a workaround, you could try setting the ignoreUnused option on the whole threat model.

nineinchnick commented 8 months ago

This is definitely a bug. We should avoid creating such invalid elements in the first place. I left a comment about that in #223.

MichaelMcAleer commented 8 months ago

I am on PTO so responses will be delayed for the next week.

I found another bug, when overriding findings, the findings output do not record the CVSS score. For example, using the following overrides:

user_to_web.overrides = [
    Finding(
        id="63",
        threat_id="DE01",
        CVSS="9.3",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
    ),
    Finding(
        id="64",
        threat_id="AC05",
        CVSS="9.3",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
    ),
    Finding(
        id="65",
        threat_id="DE03",
        CVSS="9.3",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
    ),
    Finding(
        id="66",
        threat_id="CR06",
        CVSS="9.3",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
    )
]

The related findings in the JSON output do not have the CVSS score set:

{
            "condition": "not target.controls.isEncrypted or (target.source.inScope and not target.isResponse and (not target.controls.authenticatesDestination or not target.controls.checksDestinationRevocation)) or target.tlsVersion < target.sink.minTLSVersion",
            "cvss": "",
            "description": "Interception",
            "details": "An adversary monitors data streams to or from the target for information gathering purposes. This attack may be undertaken to solely gather sensitive information or to support a further attack against the target. This attack pattern can involve sniffing network traffic as well as other types of data streams (e.g. radio). The adversary can attempt to initiate the establishment of a data stream, influence the nature of the data transmitted, or passively observe the communications as they unfold. In all variants of this attack, the adversary is not the intended recipient of the data stream. In contrast to other means of gathering information (e.g., targeting data leaks), the adversary must actively position himself so as to observe explicit data channels (e.g. network traffic) and read the content.",
            "example": "Adversary tries to block, manipulate, and steal communications in an attempt to achieve a desired negative technical impact.",
            "id": "63",
            "mitigations": "Leverage encryption to encode the transmission of data thus making it accessible only to authorized parties.",
            "references": "https://capec.mitre.org/data/definitions/117.html, http://cwe.mitre.org/data/definitions/319.html, https://cwe.mitre.org/data/definitions/299.html",
            "response": "**To Mitigate**: run a memory sanitizer to validate the binary",
            "severity": "Medium",
            "target": "User enters comments (*)",
            "threat_id": "DE01"
        },

Note: If the Finding.id is not set, the JSON output for that element has overrides as a list of empty strings. Example JSON output for the user_to_web dataflow where overrides finding.id are not set:

{
            "controls": {...},
            "data": [
                "Comments in HTML or Markdown"
            ],
            "description": "",
            "dstPort": 80,
            "findings": [
                "63",
                "64",
                "65",
                "66",
                "67",
                "68",
                "69"
            ],
            "implementsCommunicationProtocol": false,
            "inBoundary": null,
            "inScope": true,
            "isResponse": false,
            "levels": [],
            "maxClassification": "Classification.UNKNOWN",
            "minTLSVersion": "TLSVersion.NONE",
            "name": "User enters comments (*)",
            "note": "This is a simple web app\nthat stores and retrieves user comments.",
            "order": 2,
            "overrides": [
                "",
                "",
                "",
                ""
            ],
            "protocol": "HTTP",
            "response": "Show comments (*)",
            "responseTo": null,
            "sink": "Web Server",
            "source": "User",
            "sourceFiles": [],
            "srcPort": -1,
            "tlsVersion": "TLSVersion.NONE",
            "usesSessionTokens": false,
            "usesVPN": false
 }
raphaelahrens commented 8 months ago

The CVSS score is not being set because in the example code

    Finding(
        id="66",
        threat_id="CR06",
        CVSS="9.3",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
    )

the member variable CVSS is set. But the pytm code is looking for a cvss member variable. So the value CVSS is being set, but is not used and the value cvss is set to the default "" and used.