Blocks failing on new RHEL9 Automation Broker

zdc217 commented 1 year ago

When the new automation broker running on RHEL9 is used for the null router SOAR app the block action fails with an error message:

handle_action exception occurred. Error string: 'Required parameter 'autoscale' not found in dictionary'

This happened with 100% of the block attempts on the RHEL9 server. After roiling back to the old Centos server the block action stopped failing.

We may want to see if there is any place in our code where the autoscale parameter is being dropped, or see if there is a way in code to ensure it's always passed, although we have it set to false by default in the app.json.

Reference

Test Data

Playbook we have been testing with: BHR Block DEV Test, Event ID: 114783

Some fields we have been testing with:

{"cidr":"20.106.254.65/32","duration":"300","source":"SOAR","why":"testing"}

edthedev commented 1 year ago

1.0.0 tag is out of date: https://github.com/techservicesillinois/secops-splunk-null-router/releases/tag/1.0.0 tagged May 17 2022

Latest commit on main: https://github.com/techservicesillinois/secops-splunk-null-router/commits/main committed Sept 5, 2023

edthedev commented 1 year ago

Autoscale parameter was merged to main on Oct 20, 2022.

edthedev commented 1 year ago

Per our CHANGELOG updates (based on git commit logs), the rogue version issue would have caused this issue. We are preparing release 5.0 to both get the latest code released (with dependency security fixes) and to get the production version onto a tag. We have also retro-actively tagged version 0.0.0, since it is what is running now (and is named 0.0.0 is the SOAR app, and 0.0.0 will no longer be overwritten, so that tag will remain valid.)

edthedev commented 1 year ago

We do have every intention of marking 0.0.0 as [YANKED] once 5.0.0 is in production and stable, since 0.0.0 was not an intentional release, but the release system going rogue.

edthedev commented 1 year ago

We have tagged release 5.0.0, which we believe will allow the RHEL9 version of the SOAR automation broker to work with null router.

zdc217 commented 1 year ago

After pushing our 5.0.0 release and trying the new automation broker, the block action is still failing. CSOC has switched back to 0.0.0 version and old centos automation broker.

zdc217 commented 1 year ago

Noticing that there are 2 BHR apps in SOAR, both with different versions offered.

The first app show has "appid": "949dc1bb-0ca3-404c-b698-d7d21d94eb94" And offers versions 4.0.1, 1.0.0, and 0.0.0

The second app shown has "appid": "fc618bee-352f-461a-95b5-bc0a2395302a" And offers version 5.0.0 and 1.0.1

This is probably the first thing to sort out.

zdc217 commented 1 year ago

Also noticing 4 null router related assets when there should only be 2:

Our two BHR versions from above have differently named assets

Based on what is selected in the playbook, it is not using the app that has the 5.0.0 version. It is using the app the has the 0.0.0 version.

zdc217 commented 1 year ago

Inventory of playbooks currently using BHR app and what asset and version they are set to:

csoc-manual-ipblock: prod-null-router, 0.0.0
csoc-network-crowdstrike_iocs: prod-null-router, 0.0.0
csoc-network-exploit-auto: prod-null-router, 0.0.0 in one block and null-router-prod, 5.0.0 in another block (though it is part of a dead branch in the playbook)
csoc-network-ipblock-auto: prod-null-router, 0.0.0

zdc217 commented 1 year ago

Note that there is a test_null_router_2 out there somewhere as well:

zdc217 commented 1 year ago

We tested the 5.0.0 version, using the null_router_test asset against the old and new automation broker and were able to get a successful block on both

Next steps:

Confirm with Ed and SecEng that there is nothing else we should test
Remove old app so that there is only one showing in the app listing
Confirm that all playbooks are using the new version of the app
Confirm that new version of app is using the correct broker

zdc217 commented 1 year ago

Commit id 873b2a8cd0de60f9c8a09b721620cb4b4d61fc51 (Date: Mon Apr 3 11:13:06 2023 -0500) appears to be where the newer app ID was created.

edthedev commented 1 year ago

Test in SOAR against the 5.0.0 version, with the new RHEL9 automation broker seemed to block just fine.

edthedev commented 1 year ago

Re-test with autoscale also succeeds.

edthedev commented 1 year ago

Re-test with a valid IP also succeeds.

edthedev commented 1 year ago

We're going to have to clean up all the older versions.

It's possible this could cause an outage in the playbook, if this is a real error in the 5.0.0 version, because there will be no true roll-back path - only re-deploying old versions to the new app.

Let's discuss with CSOC and SecEng at the next automation meeting, and then schedule a time to clean out all the old versions, on a Monday or Tuesday, so SecDev has time if a fast patch is needed.

edthedev commented 1 year ago

We could try to duplicate the issue by creating our own test playbook and try to get the combination or old project, broker and block block into the playbook to duplicate the error - then work on a recipe to verify what steps are required to retire the old code.

edthedev commented 1 year ago

Let's also diff the .json between the 0.0.0 tag and the 5.0.0

edthedev commented 1 year ago

Diff of the app.json from the 0.0.0 tag to the 5.0.0 tag: (Nothing stands out as an issue, and notably the autoscale property is completely unchanged.

2,3c2,3
<       "appid": "949dc1bb-0ca3-404c-b698-d7d21d94eb94",
<       "name": "NCSA BHR",
---
>   "appid": "APP_ID",
>   "name": "APP_NAME",
5,6c5
<       "type": "network security",
<       "product_vendor": "Tech Services Illinois",
---
>   "type": "Cybersecurity Incident Response",
9c8,9
<       "product_name": "NCSA_BHR",
---
>   "product_vendor": "NCSA",
>   "product_name": "Blackhole Null Router",
12,13c12,13
<       "publisher": "Cybersecurity",
<       "license": "Copyright (c) Cybersecurity, 2022",
---
>   "publisher": "University of Illinois Technology Services Cybersecurity",
>   "license": "NCSA",
15,18c15,18
<       "utctime_updated": "GITHUB_DEPLOYED",
<       "package_name": "phantom_soar_null_router",
<       "main_module": "soar_null_router_connector.py",
<       "min_phantom_version": "5.3.1.84890",
---
>   "utctime_updated": "BUILD_TIME",
>   "package_name": "MODULE",
>   "main_module": "app.py",
>   "min_phantom_version": "5.3.3.92213",
139c139
<         "pypi": [
---
>     "wheel": [
141c141,142
<                 "module": "bhr-client==1.6"
---
>         "module": "bhr-client",
>         "input_file": "bhr_client-1.6-py3-none-any.whl"
145,149d145
<       "custom_made": false,
<       "directory": "soar_null_router_fceeaac1-8f96-46d6-9c3b-896e363eb004",
<       "version": 1,
<       "appname": "-",
<       "executable": "spawn3",

edthedev commented 1 year ago

Differences between the .json for 1.0.0 (older) and 0.0.0 (newer) are more interesting:

But 1.0.0 doesn't have the 'autoscale' property, and so should not be able to generate the error we are seeing.

<       "appid": "fceeaac1-8f96-46d6-9c3b-896e363eb004",
---
>       "appid": "949dc1bb-0ca3-404c-b698-d7d21d94eb94",
15c15
<       "utctime_updated": "2022-05-10T20:00:55.716700Z",
---
>       "utctime_updated": "GITHUB_DEPLOYED",
52a53,102
>                 },
>                 "source": {
>                       "description": "Source of the block",
>                       "data_type": "string",
>                       "required": false,
>                       "primary": true,
>                       "contains": [],
>                       "value_list": [],
>                       "default": "SOAR",
>                       "order": 1,
>                       "name": "source",
>                       "id": 2,
>                       "param_name": "source"
>                 },
>                 "why": {
>                       "description": "Reason for the block",
>                       "data_type": "string",
>                       "required": true,
>                       "primary": true,
>                       "contains": [],
>                       "value_list": [],
>                       "default": "",
>                       "order": 2,
>                       "name": "why",
>                       "id": 3,
>                       "param_name": "why"
>                 },
>                 "duration": {
>                       "description": "Time to block in seconds or formatted time string",
>                       "data_type": "string",
>                       "required": false,
>                       "primary": true,
>                       "contains": [],
>                       "value_list": [],
>                       "default": "300",
>                       "order": 3,
>                       "name": "duration",
>                       "id": 4,
>                       "param_name": "duration"
>                 },
>                 "autoscale": {
>                       "description": "Whether or not to auto scale the duration based on server side block history",
>                       "data_type": "boolean",
>                       "required": false,
>                       "primary": true,
>                       "default": false,
>                       "order": 4,
>                       "name": "autoscale",
>                       "id": 5,
>                       "param_name": "autoscale"

edthedev commented 1 year ago

Here's the difference between 0.0.0 on SOAR and 0.0.0 in GitHub of the JSON:

(These are expected differences, due to our deployment process.)

delaport@ip-10-225-88-150:~/src/secops-splunk-null-router$ diff temp.working.0.0.0.json temp_0.0.0.json -w
14,15c14,15
<     "app_version": "0.0.0",
<     "utctime_updated": "2023-02-17T16:56:39.911188Z",
---
>       "app_version": "GITHUB_TAG",
>       "utctime_updated": "GITHUB_DEPLOYED",
146c146
<     "directory": "ncsabhr_949dc1bb-0ca3-404c-b698-d7d21d94eb94",
---
>       "directory": "soar_null_router_fceeaac1-8f96-46d6-9c3b-896e363eb004",

edthedev commented 1 year ago

We have a test playbook that no longer works after some update.

Our block block had this Python code:

@phantom.playbook_block()
def block_1(action=None, success=None, container=None, results=None, handle=None, filtered_artifacts=None, filtered_results=None, custom_function=None, **kwargs):
    phantom.debug("block_1() called")

    # phantom.debug('Action: {0} {1}'.format(action['name'], ('SUCCEEDED' if success else 'FAILED')))

    cidr_formatted_string = phantom.format(
        container=container,
        template="""{0}/32""",
        parameters=[
            "artifact:*.cef.src"
        ])

    container_artifact_data = phantom.collect2(container=container, datapath=["artifact:*.cef.src","artifact:*.id"])

    parameters = []

    # build parameters list for 'block_1' call
    for container_artifact_item in container_artifact_data:
        if cidr_formatted_string is not None:
            parameters.append({
                "cidr": cidr_formatted_string,
                "context": {'artifact_id': container_artifact_item[1]},
            })

    ################################################################################
    ## Custom Code Start
    ################################################################################

    # Write your custom code here...

    ################################################################################
    ## Custom Code End
    ################################################################################

    phantom.act("block", parameters=parameters, name="block_1", assets=["test_null_router"])

    return****

edthedev commented 1 year ago

Adding our block function from our latest Test BHR results in this Python code behind the block:

@phantom.playbook_block()
def block_2(action=None, success=None, container=None, results=None, handle=None, filtered_artifacts=None, filtered_results=None, custom_function=None, **kwargs):
    phantom.debug("block_2() called")

    # phantom.debug('Action: {0} {1}'.format(action['name'], ('SUCCEEDED' if success else 'FAILED')))

    container_artifact_data = phantom.collect2(container=container, datapath=["artifact:*.cef.src","artifact:*.id"])

    parameters = []

    # build parameters list for 'block_2' call
    for container_artifact_item in container_artifact_data:
        if container_artifact_item[0] is not None:
            parameters.append({
                "source": "SOAR",
                "duration": 300,
                "cidr": container_artifact_item[0],
                "why": "testing",
                "context": {'artifact_id': container_artifact_item[1]},
            })

    ################################################################################
    ## Custom Code Start
    ################################################################################

    # Write your custom code here...

    ################################################################################
    ## Custom Code End
    ################################################################################

    phantom.act("block", parameters=parameters, name="block_2", assets=["test_null_router_2"])

    return

NOTE: It does not generate the autoscale parameter when we don't check that box. This may be incompatible with our call to get_required_param.

edthedev commented 1 year ago

Here's the Python code behind the current block block on csoc-manual-ip-block:

@phantom.playbook_block()
def block_1(action=None, success=None, container=None, results=None, handle=None, filtered_artifacts=None, filtered_results=None, custom_function=None, **kwargs):
    phantom.debug("block_1() called")

    # phantom.debug('Action: {0} {1}'.format(action['name'], ('SUCCEEDED' if success else 'FAILED')))

    why_formatted_string = phantom.format(
        container=container,
        template="""Block from Splunk SOAR: {0}\n""",
        parameters=[
            "artifact:*.cef.message"
        ])
    cidr_formatted_string = phantom.format(
        container=container,
        template="""{0}/32\n""",
        parameters=[
            "artifact:*.cef.ip"
        ])

    container_artifact_data = phantom.collect2(container=container, datapath=["artifact:*.cef.message","artifact:*.cef.ip","artifact:*.id"])

    parameters = []

    # build parameters list for 'block_1' call
    for container_artifact_item in container_artifact_data:
        if why_formatted_string is not None and cidr_formatted_string is not None:
            parameters.append({
                "why": why_formatted_string,
                "cidr": cidr_formatted_string,
                "source": "SOAR",
                "duration": <REDACTED NUMBER>,
                "context": {'artifact_id': container_artifact_item[2]},
            })

    ################################################################################
    ## Custom Code Start
    ################################################################################

    # Write your custom code here...

    ################################################################################
    ## Custom Code End
    ################################################################################

    phantom.act("block", parameters=parameters, name="block_1", assets=["prod_null_router"], callback=set_status_1)

    return

edthedev commented 1 year ago

We replicated the issue with the Test BHR code (should be identical to 5.0.0), using Playbook BHR Block DEV Test:

Playbook 'BHR Block DEV Test' (playbook id: 995) executed (playbook run id: 121455) on csoc-network-honeypot 'SOAR Honeypot Attacker Alert - 30min'(container id: 114783).
   Playbook execution status is 'success'
    Total actions executed: 2
    Action 'whois_ip_1'(whois ip)
       Status: success
        App 'WHOIS' executed the action on asset 'whois'
           Status: success
           Parameter: {"ip":"20.106.254.65"}
    Action 'block_2'(block)
       Status: failed
        App 'Test Box - NCSA BHR' executed the action on asset 'test_null_router_2'
           Status: failed
           Parameter: {"cidr":"20.106.254.65/32","duration":300,"source":"SOAR","why":"testing"}
        App 'Test Box - NCSA BHR' executed the action on asset 'test_null_router_2'
           Status: failed
           Parameter: {"cidr":"20.106.254.65/32","duration":"300","source":"SOAR","why":"testing"}

Sep 18, 14:56:40 : on_finish() called
Sep 18, 14:56:40 : metrics: Playbook_id:995, run_id:121455, container: 114783, function: on_finish. TIME_TAKEN 8ms
Sep 18, 14:56:40 : *** Playbook 'local/BHR Block DEV Test' execution (121455) has completed with status: SUCCESS ***
Sep 18, 14:56:40 : 'block_2' on asset 'test_null_router_2': 2 actions failed
Sep 18, 14:56:40 : 'block_2'  on asset 'test_null_router_2' completed with status: 'failed'. Action Info: Size : 705 bytes : [{"app_name":"Test Box - NCSA BHR","asset_name":"test_null_router_2","param":{"why": "testing", "cidr": "20.106.254.65/32", "source": "SOAR", "context": {"guid": "315485a1-c716-4ed3-8074-5f04d0d8f6dd", "artifact_id": 207580, "parent_action_run": []}, "duration": "300"},"status":"failed","message":""},{"app_name":"Test Box - NCSA BHR","asset_name":"test_null_router_2","param":{"why": "testing", "cidr": "20.106.254.65/32", "source": "SOAR", "context": {"guid": "315485a1-c716-4ed3-8074-5f04d0d8f6dd", "artifact_id": 207580, "parent_action_run": []}, "duration": 300},"status":"failed","message":"handle_action exception occurred. Error string: 'Required parameter 'autoscale' not found in dictionary'"}]

edthedev commented 1 year ago

I think this indicates a bug here:

https://github.com/techservicesillinois/secops-splunk-null-router/blob/2df23eb7dbde4367d85ccca32f9480b903a61002/src/app/app.py#L210-L211

Since autoscale is not required. Note that autoscale, duration and source are not required.

We probably see this for autoscale due to the unique way SOAR passes in an un-set checkbox.

edthedev commented 1 year ago

Next up: switch test_null_router_2 from the soarauto (RHEL9) broker to the splunkauto (CENTOS) broker.

We think this may clear the error (if it copies what we saw in the attempted production switch).

edthedev commented 1 year ago

Uuhhhh.... Running with test_null_router_2 set to splunkauto gives this error:

Sep 18, 15:28:19 : Starting playbook 'local/BHR Block DEV Test (id: 995, version: 9, pyversion: 3, scm id: 2)' on event '114783' with playbook run id: 121459, running as user '46' with scope 'new'
Sep 18, 15:28:19 : calling on_start() on csoc-network-honeypot 'SOAR Honeypot Attacker Alert - 30min'(id: 114783). 
Sep 18, 15:28:19 : on_start() called
Sep 18, 15:28:19 : whois_ip_1() called
Sep 18, 15:28:19 : phantom.collect2(): called for datapath['artifact:*.cef.src'], scope: new and filter_artifacts: []
Sep 18, 15:28:19 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 15:28:19 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 0 NEW artifacts
Sep 18, 15:28:19 : phantom.collect2(): called for datapath['artifact:*.cef.src', 'artifact:*.id'], scope: None and filter_artifacts: None
Sep 18, 15:28:19 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 15:28:19 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 0 NEW artifacts
Sep 18, 15:28:19 : calling ph_engine.act() with effective parameters size: 106 bytes. effective handle size: 0 bytes
Sep 18, 15:28:19 : phantom.act(): action 'whois ip' on assets: whois, callback function: 'block_2', with no action reviewer, no delay to execute the action, user provided name for the action as 'whois_ip_1', no tags, no asset type
Sep 18, 15:28:19 : phantom.act(): 'whois_ip_1' cannot be run on asset 'whois'. The "whois ip" action requires the following parameters: ip. The given parameters look like they were automatically generated by phantom.act() because an empty parameters list was passed to phantom.act(). The parameters list may have been empty because the preceding call to phantom.collect2() returned an empty list. Check your calling code in the action that generated this error
Sep 18, 15:28:19 : phantom.act(): 'whois_ip_1' cannot be run on asset 'whois'. The "whois ip" action requires the following parameters: ip. The given parameters look like they were automatically generated by phantom.act() because an empty parameters list was passed to phantom.act(). The parameters list may have been empty because the preceding call to phantom.collect2() returned an empty list. Check your calling code in the action that generated this error

Sep 18, 15:28:19 : metrics: Playbook_id:995, run_id:121459, container: 114783, function: on_start. TIME_TAKEN 154ms
Sep 18, 15:28:19 : calling action 'whois_ip_1''s callback function 'block_2()'
Sep 18, 15:28:19 : block_2() called
Sep 18, 15:28:19 : phantom.collect2(): called for datapath['artifact:*.cef.src'], scope: new and filter_artifacts: []
Sep 18, 15:28:19 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 15:28:19 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 0 NEW artifacts
Sep 18, 15:28:19 : phantom.collect2(): called for datapath['artifact:*.cef.src', 'artifact:*.id'], scope: None and filter_artifacts: None
Sep 18, 15:28:19 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 15:28:19 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 0 NEW artifacts
Sep 18, 15:28:19 : calling ph_engine.act() with effective parameters size: 106 bytes. effective handle size: 0 bytes
Sep 18, 15:28:19 : phantom.act(): action 'block' on assets: test_null_router_2, with no callback function, with no action reviewer, no delay to execute the action, user provided name for the action as 'block_2', no tags, no asset type
Sep 18, 15:28:19 : phantom.act(): 'block_2' cannot be run on asset 'test_null_router_2'. The "block" action requires the following parameters: cidr, why. The given parameters look like they were automatically generated by phantom.act() because an empty parameters list was passed to phantom.act(). The parameters list may have been empty because the preceding call to phantom.collect2() returned an empty list. Check your calling code in the action that generated this error
Sep 18, 15:28:19 : phantom.act(): 'block_2' cannot be run on asset 'test_null_router_2'. The "block" action requires the following parameters: cidr, why. The given parameters look like they were automatically generated by phantom.act() because an empty parameters list was passed to phantom.act(). The parameters list may have been empty because the preceding call to phantom.collect2() returned an empty list. Check your calling code in the action that generated this error

Sep 18, 15:28:19 : metrics: Playbook_id:995, run_id:121459, container: 114783, function: on_finish. TIME_TAKEN 113ms
Sep 18, 15:28:19 : finished action 'whois_ip_1''s callback function 'block_2()'
Sep 18, 15:28:19 : 

Playbook 'BHR Block DEV Test' (playbook id: 995) executed (playbook run id: 121459) on csoc-network-honeypot 'SOAR Honeypot Attacker Alert - 30min'(container id: 114783).
   Playbook execution status is 'failed'
    Total actions executed: 2
    Action 'whois_ip_1'(whois ip)
       Status: failed
    Action 'block_2'(block)
       Status: failed

Sep 18, 15:28:19 : on_finish() called
Sep 18, 15:28:19 : metrics: Playbook_id:995, run_id:121459, container: 114783, function: on_finish. TIME_TAKEN 7ms
Sep 18, 15:28:19 : *** Playbook 'local/BHR Block DEV Test' execution (121459) has completed with status: FAILED ***

undefined

edthedev commented 1 year ago

Deleting the block block, and re-adding it changes the error message back:

Sep 18, 15:34:56 : Starting playbook 'local/BHR Block DEV Test (id: 996, version: 10, pyversion: 3, scm id: 2)' on event '114783' with playbook run id: 121466, running as user '46' with scope 'new'
Sep 18, 15:34:56 : calling on_start() on csoc-network-honeypot 'SOAR Honeypot Attacker Alert - 30min'(id: 114783). 
Sep 18, 15:34:56 : on_start() called
Sep 18, 15:34:56 : whois_ip_1() called
Sep 18, 15:34:56 : phantom.collect2(): called for datapath['artifact:*.cef.src'], scope: new and filter_artifacts: []
Sep 18, 15:34:56 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 15:34:56 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 1 NEW artifacts
Sep 18, 15:34:56 : phantom.collect2(): called for datapath['artifact:*.cef.src', 'artifact:*.id'], scope: None and filter_artifacts: None
Sep 18, 15:34:56 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 15:34:56 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 1 NEW artifacts
Sep 18, 15:34:56 : calling ph_engine.act() with effective parameters size: 111 bytes. effective handle size: 0 bytes
Sep 18, 15:34:56 : phantom.act(): action 'whois ip' on assets: whois, callback function: 'block_1', with no action reviewer, no delay to execute the action, user provided name for the action as 'whois_ip_1', no tags, no asset type
Sep 18, 15:34:56 : metrics: Playbook_id:996, run_id:121466, container: 114783, function: on_start. TIME_TAKEN 132ms
Sep 18, 15:34:56 : 'whois ip' (named as 'whois_ip_1') shall be executed now on asset 'whois'(id:4) using app 'WHOIS'
Sep 18, 15:34:56 : 'whois_ip_1' on asset 'whois': Loaded action execution configuration
Sep 18, 15:34:56 : 'whois_ip_1' on asset 'whois': successfully acquired the lock '4'
Sep 18, 15:34:56 : 'whois_ip_1' on asset 'whois': Successfully acquired the lock '4'
Sep 18, 15:34:57 : 'whois_ip_1' on asset 'whois': Querying...
Sep 18, 15:34:58 : 'whois_ip_1' on asset 'whois': Parsing response
Sep 18, 15:34:58 : 'whois_ip_1' on asset 'whois': Successfully released the lock '4'
Sep 18, 15:34:58 : 'whois_ip_1' on asset 'whois': 1 action succeeded. (1)For Parameter: {"ip":"20.106.254.65"} Message: "Registry: arin\nASN: 8075\nCountry: US\nNets:\nRange: 20.33.0.0 - 20.128.255.255\nAddress: One Microsoft Way"
Sep 18, 15:34:58 : 'whois_ip_1'  on asset 'whois' completed with status: 'success'. Action Info: Size : 324 bytes : [{"app_name":"WHOIS","asset_name":"whois","param":{"ip": "20.106.254.65", "context": {"guid": "3f66c27b-f4d0-40fa-81e1-722b7e128707", "artifact_id": 207580, "parent_action_run": []}},"status":"success","message":"Registry: arin\nASN: 8075\nCountry: US\nNets:\nRange: 20.33.0.0 - 20.128.255.255\nAddress: One Microsoft Way"}]
Sep 18, 15:34:58 : calling action 'whois_ip_1''s callback function 'block_1()'
Sep 18, 15:34:58 : block_1() called
Sep 18, 15:34:58 : phantom.collect2(): called for datapath['artifact:*.cef.src'], scope: new and filter_artifacts: []
Sep 18, 15:34:58 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 15:34:58 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 1 NEW artifacts
Sep 18, 15:34:58 : phantom.collect2(): called for datapath['artifact:*.cef.src', 'artifact:*.id'], scope: None and filter_artifacts: None
Sep 18, 15:34:58 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 15:34:58 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 1 NEW artifacts
Sep 18, 15:34:58 : calling ph_engine.act() with effective parameters size: 169 bytes. effective handle size: 0 bytes
Sep 18, 15:34:58 : phantom.act(): action 'block' on assets: test_null_router_2, with no callback function, with no action reviewer, no delay to execute the action, user provided name for the action as 'block_1', no tags, no asset type
Sep 18, 15:34:58 : metrics: Playbook_id:996, run_id:121466, container: 114783, function: on_finish. TIME_TAKEN 116ms
Sep 18, 15:34:58 : finished action 'whois_ip_1''s callback function 'block_1()'
Sep 18, 15:34:58 : 1 action succeeded
Sep 18, 15:34:58 : 'block' (named as 'block_1') shall be executed now on asset 'test_null_router_2'(id:56) using app 'Test Box - NCSA BHR'
Sep 18, 15:34:58 : Running action (block_1) on AutomationBroker (splunkauto)
Sep 18, 15:34:59 : action '' did not have any callback. The action is now marked completed
Sep 18, 15:34:59 : 

Playbook 'BHR Block DEV Test' (playbook id: 996) executed (playbook run id: 121466) on csoc-network-honeypot 'SOAR Honeypot Attacker Alert - 30min'(container id: 114783).
   Playbook execution status is 'success'
    Total actions executed: 2
    Action 'whois_ip_1'(whois ip)
       Status: success
        App 'WHOIS' executed the action on asset 'whois'
           Status: success
           Parameter: {"ip":"20.106.254.65"}
    Action 'block_1'(block)
       Status: failed
        App 'Test Box - NCSA BHR' executed the action on asset 'test_null_router_2'
           Status: failed
           Parameter: {"cidr":"20.106.254.65/32","duration":"300","source":"SOAR","why":"testing"}
        App 'Test Box - NCSA BHR' executed the action on asset 'test_null_router_2'
           Status: failed
           Parameter: {"cidr":"20.106.254.65/32","duration":300,"source":"SOAR","why":"testing"}

Sep 18, 15:34:59 : on_finish() called
Sep 18, 15:34:59 : metrics: Playbook_id:996, run_id:121466, container: 114783, function: on_finish. TIME_TAKEN 7ms
Sep 18, 15:35:00 : *** Playbook 'local/BHR Block DEV Test' execution (121466) has completed with status: SUCCESS ***

Sep 18, 15:35:00 : 'block_1' on asset 'test_null_router_2': 2 actions failed
Sep 18, 15:35:00 : 'block_1'  on asset 'test_null_router_2' completed with status: 'failed'. Action Info: Size : 705 bytes : [{"app_name":"Test Box - NCSA BHR","asset_name":"test_null_router_2","param":{"why": "testing", "cidr": "20.106.254.65/32", "source": "SOAR", "context": {"guid": "09e999a1-bbfa-499f-8aeb-7cb38bb0adbc", "artifact_id": 207580, "parent_action_run": []}, "duration": "300"},"status":"failed","message":""},{"app_name":"Test Box - NCSA BHR","asset_name":"test_null_router_2","param":{"why": "testing", "cidr": "20.106.254.65/32", "source": "SOAR", "context": {"guid": "09e999a1-bbfa-499f-8aeb-7cb38bb0adbc", "artifact_id": 207580, "parent_action_run": []}, "duration": 300},"status":"failed","message":"handle_action exception occurred. Error string: 'Required parameter 'autoscale' not found in dictionary'"}]
Sep 18, 15:35:00 : 1 action failed for app Test Box - NCSA BHR
undefined

edthedev commented 1 year ago

Worth noting that block_1 and block_2 sometimes swap. That seems suspicious.

edthedev commented 1 year ago

Some thoughts on next steps:

Could rename the block function in case that is causing issues.
Wondering if CENTOS is forcing a downgrade to an older version of the APP, before a bug was introduced. (So our 5.0.0 code simply isn't running...)
But there's no diff between 0.0.0 and 5.0.0 that would account for this issue....
This could also be caused by a change in Phantom/SOAR behaviour of 'get required param' between version available on CENTOS and the version available on RHEL9 (based probably more on Python version, of course).

edthedev commented 1 year ago

We could try to produce a 5.0.1 that eliminates the autoscale required error message (by updating the get required paramater call for a get optional parameter call). We can test that in our Playbook BHR Block DEV Test playbook against test_null_router_2 - on each of the splunkauto and soarauto and see what we get.

If that worked to clear the error that we were able to replicate, then we could have SecEng try the upgrade again with 5.0.1.

edthedev commented 1 year ago

After patching for #53 - we get the same exact (totally unrelated?) error when connected to the wrong broker anyway...:

(We get this error on both brokers now...)

Sep 18, 16:07:26 : Starting playbook 'local/BHR Block DEV Test (id: 996, version: 10, pyversion: 3, scm id: 2)' on event '114783' with playbook run id: 121475, running as user '46' with scope 'new'
Sep 18, 16:07:26 : calling on_start() on csoc-network-honeypot 'SOAR Honeypot Attacker Alert - 30min'(id: 114783). 
Sep 18, 16:07:26 : on_start() called
Sep 18, 16:07:27 : whois_ip_1() called
Sep 18, 16:07:27 : phantom.collect2(): called for datapath['artifact:*.cef.src'], scope: new and filter_artifacts: []
Sep 18, 16:07:27 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 16:07:27 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 0 NEW artifacts
Sep 18, 16:07:27 : phantom.collect2(): called for datapath['artifact:*.cef.src', 'artifact:*.id'], scope: None and filter_artifacts: None
Sep 18, 16:07:27 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 16:07:27 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 0 NEW artifacts
Sep 18, 16:07:27 : calling ph_engine.act() with effective parameters size: 106 bytes. effective handle size: 0 bytes
Sep 18, 16:07:27 : phantom.act(): action 'whois ip' on assets: whois, callback function: 'block_1', with no action reviewer, no delay to execute the action, user provided name for the action as 'whois_ip_1', no tags, no asset type
Sep 18, 16:07:27 : phantom.act(): 'whois_ip_1' cannot be run on asset 'whois'. The "whois ip" action requires the following parameters: ip. The given parameters look like they were automatically generated by phantom.act() because an empty parameters list was passed to phantom.act(). The parameters list may have been empty because the preceding call to phantom.collect2() returned an empty list. Check your calling code in the action that generated this error
Sep 18, 16:07:27 : phantom.act(): 'whois_ip_1' cannot be run on asset 'whois'. The "whois ip" action requires the following parameters: ip. The given parameters look like they were automatically generated by phantom.act() because an empty parameters list was passed to phantom.act(). The parameters list may have been empty because the preceding call to phantom.collect2() returned an empty list. Check your calling code in the action that generated this error

Sep 18, 16:07:27 : metrics: Playbook_id:996, run_id:121475, container: 114783, function: on_start. TIME_TAKEN 164ms
Sep 18, 16:07:27 : calling action 'whois_ip_1''s callback function 'block_1()'
Sep 18, 16:07:27 : block_1() called
Sep 18, 16:07:27 : phantom.collect2(): called for datapath['artifact:*.cef.src'], scope: new and filter_artifacts: []
Sep 18, 16:07:27 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 16:07:27 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 0 NEW artifacts
Sep 18, 16:07:27 : phantom.collect2(): called for datapath['artifact:*.cef.src', 'artifact:*.id'], scope: None and filter_artifacts: None
Sep 18, 16:07:27 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 16:07:27 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 0 NEW artifacts
Sep 18, 16:07:27 : calling ph_engine.act() with effective parameters size: 106 bytes. effective handle size: 0 bytes
Sep 18, 16:07:27 : phantom.act(): action 'block' on assets: test_null_router_2, with no callback function, with no action reviewer, no delay to execute the action, user provided name for the action as 'block_1', no tags, no asset type
Sep 18, 16:07:27 : phantom.act(): 'block_1' cannot be run on asset 'test_null_router_2'. The "block" action requires the following parameters: cidr, why. The given parameters look like they were automatically generated by phantom.act() because an empty parameters list was passed to phantom.act(). The parameters list may have been empty because the preceding call to phantom.collect2() returned an empty list. Check your calling code in the action that generated this error
Sep 18, 16:07:27 : phantom.act(): 'block_1' cannot be run on asset 'test_null_router_2'. The "block" action requires the following parameters: cidr, why. The given parameters look like they were automatically generated by phantom.act() because an empty parameters list was passed to phantom.act(). The parameters list may have been empty because the preceding call to phantom.collect2() returned an empty list. Check your calling code in the action that generated this error

Sep 18, 16:07:27 : metrics: Playbook_id:996, run_id:121475, container: 114783, function: on_finish. TIME_TAKEN 117ms
Sep 18, 16:07:27 : finished action 'whois_ip_1''s callback function 'block_1()'
Sep 18, 16:07:27 : 

Playbook 'BHR Block DEV Test' (playbook id: 996) executed (playbook run id: 121475) on csoc-network-honeypot 'SOAR Honeypot Attacker Alert - 30min'(container id: 114783).
   Playbook execution status is 'failed'
    Total actions executed: 2
    Action 'whois_ip_1'(whois ip)
       Status: failed
    Action 'block_1'(block)
       Status: failed

Sep 18, 16:07:27 : on_finish() called
Sep 18, 16:07:27 : metrics: Playbook_id:996, run_id:121475, container: 114783, function: on_finish. TIME_TAKEN 10ms
Sep 18, 16:07:27 : *** Playbook 'local/BHR Block DEV Test' execution (121475) has completed with status: FAILED ***

edthedev commented 1 year ago

When we remove the suddenly-failing-oh-WHY call to whois, we get a success on the IP/CIDR:

Sep 18, 16:15:52 : Starting playbook 'local/BHR Block DEV Test (id: 997, version: 11, pyversion: 3, scm id: 2)' on event '114783' with playbook run id: 121480, running as user '46' with scope 'new'
Sep 18, 16:15:52 : calling on_start() on csoc-network-honeypot 'SOAR Honeypot Attacker Alert - 30min'(id: 114783). 
Sep 18, 16:15:52 : on_start() called
Sep 18, 16:15:52 : block_1() called
Sep 18, 16:15:52 : calling ph_engine.act() with effective parameters size: 85 bytes. effective handle size: 0 bytes
Sep 18, 16:15:52 : phantom.act(): action 'block' on assets: test_null_router_2, with no callback function, with no action reviewer, no delay to execute the action, user provided name for the action as 'block_1', no tags, no asset type
Sep 18, 16:15:52 : metrics: Playbook_id:997, run_id:121480, container: 114783, function: on_start. TIME_TAKEN 88ms
Sep 18, 16:15:52 : 'block' (named as 'block_1') shall be executed now on asset 'test_null_router_2'(id:56) using app 'Test Box - NCSA BHR'
Sep 18, 16:15:52 : Running action (block_1) on AutomationBroker (soarauto)
Sep 18, 16:15:53 : action '' did not have any callback. The action is now marked completed
Sep 18, 16:15:53 : 

Playbook 'BHR Block DEV Test' (playbook id: 997) executed (playbook run id: 121480) on csoc-network-honeypot 'SOAR Honeypot Attacker Alert - 30min'(container id: 114783).
   Playbook execution status is 'success'
    Total actions executed: 1
    Action 'block_1'(block)
       Status: success
        App 'Test Box - NCSA BHR' executed the action on asset 'test_null_router_2'
           Status: success
           Parameter: {"cidr":"20.106.254.65/32","duration":"300","source":"SOAR","why":"testing"}

Sep 18, 16:15:53 : on_finish() called
Sep 18, 16:15:53 : metrics: Playbook_id:997, run_id:121480, container: 114783, function: on_finish. TIME_TAKEN 8ms
Sep 18, 16:15:53 : *** Playbook 'local/BHR Block DEV Test' execution (121480) has completed with status: SUCCESS ***

Sep 18, 16:15:53 : 'block_1' on asset 'test_null_router_2': 1 action succeeded. (1)For Parameter: {"cidr":"20.106.254.65/32","context":{"artifact_id":0,"guid":"7e3d02e8-f73e-47dc-9bcb-977e10e8256d","parent_action_run":[]},"duration":"300","source":"SOAR","why":"testing"} Message: "Blocked 20.106.254.65/32"
Sep 18, 16:15:53 : 'block_1'  on asset 'test_null_router_2' completed with status: 'success'. Action Info: Size : 322 bytes : [{"app_name":"Test Box - NCSA BHR","asset_name":"test_null_router_2","param":{"why": "testing", "cidr": "20.106.254.65/32", "source": "SOAR", "context": {"guid": "7e3d02e8-f73e-47dc-9bcb-977e10e8256d", "artifact_id": 0, "parent_action_run": []}, "duration": "300"},"status":"success","message":"Blocked 20.106.254.65/32"}]
Sep 18, 16:15:53 : 1 action succeeded

edthedev commented 1 year ago

We did manage to replicate the error seen in production, and we managed to remove it (in a not very clean test...) via #53. So maybe that's our fix.

edthedev commented 1 year ago

Best guess get_req_value behaves differently on Python for RHEL9 vs Python for CENTOS?

We don't like this. It's weird.

edthedev commented 1 year ago

Next up:

Let's do some testing to verify that our optional values are being treated as optional.

edthedev commented 1 year ago

NOTE: If we don't trust get_req_value, after our remaining testing, let's remove it entirely.

edthedev commented 1 year ago

We have verified in the logs that, with our latest code, and our test playbook, the default values come through.

tzturner commented 1 year ago

We placed the following debug line in our code:

self.debug_print(f"Block arguments: {args}")

Which produced the following debug output in the splunk SOAR logs:

2023-09-19T15:41:11.888619 PID:228487 TID:228487 : DEBUG: SPAWN3 : /srv/phantom/alchemy/connectors/spawn3/spawn3.cpp : 1021 : AppConnector :Block arguments: {'cidr': '20.106.254.65/32', 'why': 'testing', 'source': 'SOAR', 'duration': '300', 'autoscale': False}

This confirms that SOAR is setting the defaults appropriately.

tzturner commented 1 year ago

We confirmed that adding the following line to our code produces the original error:

phantom.get_req_value(param, "autoscale")

Output from event run:

Sep 19, 11:01:09 : on_finish() called
Sep 19, 11:01:09 : metrics: Playbook_id:997, run_id:121670, container: 114783, function: on_finish. TIME_TAKEN 9ms
Sep 19, 11:01:09 : *** Playbook 'local/BHR Block DEV Test' execution (121670) has completed with status: FAILED ***
Sep 19, 11:01:09 : 'block_1' on asset 'test_null_router_2': 2 actions failed
Sep 19, 11:01:09 : 'block_1'  on asset 'test_null_router_2' completed with status: 'failed'. Action Info: Size : 695 bytes : [{"app_name":"Test Box - NCSA BHR","asset_name":"test_null_router_2","param":{"why": "testing", "cidr": "20.106.254.65/32", "source": "SOAR", "context": {"guid": "8b924826-8420-41be-aad4-5cd9b815964d", "artifact_id": 0, "parent_action_run": []}, "duration": "300"},"status":"failed","message":""},{"app_name":"Test Box - NCSA BHR","asset_name":"test_null_router_2","param":{"why": "testing", "cidr": "20.106.254.65/32", "source": "SOAR", "context": {"guid": "8b924826-8420-41be-aad4-5cd9b815964d", "artifact_id": 0, "parent_action_run": []}, "duration": 300},"status":"failed","message":"handle_action exception occurred. Error string: 'Required parameter 'autoscale' not found in dictionary'"}]
Sep 19, 11:01:09 : 1 action failed for app Test Box - NCSA BHR

techservicesillinois / secops-splunk-null-router

Blocks failing on new RHEL9 Automation Broker #51

Reference

Test Data