Closed zdc217 closed 1 year ago
1.0.0 tag is out of date: https://github.com/techservicesillinois/secops-splunk-null-router/releases/tag/1.0.0 tagged May 17 2022
Latest commit on main
:
https://github.com/techservicesillinois/secops-splunk-null-router/commits/main
committed Sept 5, 2023
Autoscale parameter was merged to main on Oct 20, 2022.
Per our CHANGELOG updates (based on git commit logs), the rogue version issue would have caused this issue. We are preparing release 5.0 to both get the latest code released (with dependency security fixes) and to get the production version onto a tag. We have also retro-actively tagged version 0.0.0, since it is what is running now (and is named 0.0.0 is the SOAR app, and 0.0.0 will no longer be overwritten, so that tag will remain valid.)
We do have every intention of marking 0.0.0 as [YANKED] once 5.0.0 is in production and stable, since 0.0.0 was not an intentional release, but the release system going rogue.
We have tagged release 5.0.0, which we believe will allow the RHEL9 version of the SOAR automation broker to work with null router.
After pushing our 5.0.0 release and trying the new automation broker, the block action is still failing. CSOC has switched back to 0.0.0 version and old centos automation broker.
Noticing that there are 2 BHR apps in SOAR, both with different versions offered.
The first app show has "appid": "949dc1bb-0ca3-404c-b698-d7d21d94eb94"
And offers versions 4.0.1, 1.0.0, and 0.0.0
The second app shown has "appid": "fc618bee-352f-461a-95b5-bc0a2395302a"
And offers version 5.0.0 and 1.0.1
This is probably the first thing to sort out.
Also noticing 4 null router related assets when there should only be 2:
Our two BHR versions from above have differently named assets
Based on what is selected in the playbook, it is not using the app that has the 5.0.0 version. It is using the app the has the 0.0.0 version.
Inventory of playbooks currently using BHR app and what asset and version they are set to:
Note that there is a test_null_router_2 out there somewhere as well:
We tested the 5.0.0 version, using the null_router_test
asset against the old and new automation broker and were able to get a successful block on both
Next steps:
Commit id 873b2a8cd0de60f9c8a09b721620cb4b4d61fc51 (Date: Mon Apr 3 11:13:06 2023 -0500) appears to be where the newer app ID was created.
Test in SOAR against the 5.0.0 version, with the new RHEL9 automation broker seemed to block just fine.
Re-test with autoscale
also succeeds.
Re-test with a valid IP also succeeds.
We're going to have to clean up all the older versions.
It's possible this could cause an outage in the playbook, if this is a real error in the 5.0.0 version, because there will be no true roll-back path - only re-deploying old versions to the new app.
Let's discuss with CSOC and SecEng at the next automation meeting, and then schedule a time to clean out all the old versions, on a Monday or Tuesday, so SecDev has time if a fast patch is needed.
We could try to duplicate the issue by creating our own test playbook and try to get the combination or old project, broker and block block into the playbook to duplicate the error - then work on a recipe to verify what steps are required to retire the old code.
Let's also diff the .json between the 0.0.0 tag and the 5.0.0
Diff of the app.json from the 0.0.0 tag to the 5.0.0 tag: (Nothing stands out as an issue, and notably the autoscale
property is completely unchanged.
2,3c2,3
< "appid": "949dc1bb-0ca3-404c-b698-d7d21d94eb94",
< "name": "NCSA BHR",
---
> "appid": "APP_ID",
> "name": "APP_NAME",
5,6c5
< "type": "network security",
< "product_vendor": "Tech Services Illinois",
---
> "type": "Cybersecurity Incident Response",
9c8,9
< "product_name": "NCSA_BHR",
---
> "product_vendor": "NCSA",
> "product_name": "Blackhole Null Router",
12,13c12,13
< "publisher": "Cybersecurity",
< "license": "Copyright (c) Cybersecurity, 2022",
---
> "publisher": "University of Illinois Technology Services Cybersecurity",
> "license": "NCSA",
15,18c15,18
< "utctime_updated": "GITHUB_DEPLOYED",
< "package_name": "phantom_soar_null_router",
< "main_module": "soar_null_router_connector.py",
< "min_phantom_version": "5.3.1.84890",
---
> "utctime_updated": "BUILD_TIME",
> "package_name": "MODULE",
> "main_module": "app.py",
> "min_phantom_version": "5.3.3.92213",
139c139
< "pypi": [
---
> "wheel": [
141c141,142
< "module": "bhr-client==1.6"
---
> "module": "bhr-client",
> "input_file": "bhr_client-1.6-py3-none-any.whl"
145,149d145
< "custom_made": false,
< "directory": "soar_null_router_fceeaac1-8f96-46d6-9c3b-896e363eb004",
< "version": 1,
< "appname": "-",
< "executable": "spawn3",
Differences between the .json
for 1.0.0 (older) and 0.0.0 (newer) are more interesting:
But 1.0.0 doesn't have the 'autoscale' property, and so should not be able to generate the error we are seeing.
< "appid": "fceeaac1-8f96-46d6-9c3b-896e363eb004",
---
> "appid": "949dc1bb-0ca3-404c-b698-d7d21d94eb94",
15c15
< "utctime_updated": "2022-05-10T20:00:55.716700Z",
---
> "utctime_updated": "GITHUB_DEPLOYED",
52a53,102
> },
> "source": {
> "description": "Source of the block",
> "data_type": "string",
> "required": false,
> "primary": true,
> "contains": [],
> "value_list": [],
> "default": "SOAR",
> "order": 1,
> "name": "source",
> "id": 2,
> "param_name": "source"
> },
> "why": {
> "description": "Reason for the block",
> "data_type": "string",
> "required": true,
> "primary": true,
> "contains": [],
> "value_list": [],
> "default": "",
> "order": 2,
> "name": "why",
> "id": 3,
> "param_name": "why"
> },
> "duration": {
> "description": "Time to block in seconds or formatted time string",
> "data_type": "string",
> "required": false,
> "primary": true,
> "contains": [],
> "value_list": [],
> "default": "300",
> "order": 3,
> "name": "duration",
> "id": 4,
> "param_name": "duration"
> },
> "autoscale": {
> "description": "Whether or not to auto scale the duration based on server side block history",
> "data_type": "boolean",
> "required": false,
> "primary": true,
> "default": false,
> "order": 4,
> "name": "autoscale",
> "id": 5,
> "param_name": "autoscale"
Here's the difference between 0.0.0 on SOAR and 0.0.0 in GitHub of the JSON:
(These are expected differences, due to our deployment process.)
delaport@ip-10-225-88-150:~/src/secops-splunk-null-router$ diff temp.working.0.0.0.json temp_0.0.0.json -w
14,15c14,15
< "app_version": "0.0.0",
< "utctime_updated": "2023-02-17T16:56:39.911188Z",
---
> "app_version": "GITHUB_TAG",
> "utctime_updated": "GITHUB_DEPLOYED",
146c146
< "directory": "ncsabhr_949dc1bb-0ca3-404c-b698-d7d21d94eb94",
---
> "directory": "soar_null_router_fceeaac1-8f96-46d6-9c3b-896e363eb004",
We have a test playbook that no longer works after some update.
Our block
block had this Python code:
@phantom.playbook_block()
def block_1(action=None, success=None, container=None, results=None, handle=None, filtered_artifacts=None, filtered_results=None, custom_function=None, **kwargs):
phantom.debug("block_1() called")
# phantom.debug('Action: {0} {1}'.format(action['name'], ('SUCCEEDED' if success else 'FAILED')))
cidr_formatted_string = phantom.format(
container=container,
template="""{0}/32""",
parameters=[
"artifact:*.cef.src"
])
container_artifact_data = phantom.collect2(container=container, datapath=["artifact:*.cef.src","artifact:*.id"])
parameters = []
# build parameters list for 'block_1' call
for container_artifact_item in container_artifact_data:
if cidr_formatted_string is not None:
parameters.append({
"cidr": cidr_formatted_string,
"context": {'artifact_id': container_artifact_item[1]},
})
################################################################################
## Custom Code Start
################################################################################
# Write your custom code here...
################################################################################
## Custom Code End
################################################################################
phantom.act("block", parameters=parameters, name="block_1", assets=["test_null_router"])
return****
Adding our block
function from our latest Test BHR results in this Python code behind the block:
@phantom.playbook_block()
def block_2(action=None, success=None, container=None, results=None, handle=None, filtered_artifacts=None, filtered_results=None, custom_function=None, **kwargs):
phantom.debug("block_2() called")
# phantom.debug('Action: {0} {1}'.format(action['name'], ('SUCCEEDED' if success else 'FAILED')))
container_artifact_data = phantom.collect2(container=container, datapath=["artifact:*.cef.src","artifact:*.id"])
parameters = []
# build parameters list for 'block_2' call
for container_artifact_item in container_artifact_data:
if container_artifact_item[0] is not None:
parameters.append({
"source": "SOAR",
"duration": 300,
"cidr": container_artifact_item[0],
"why": "testing",
"context": {'artifact_id': container_artifact_item[1]},
})
################################################################################
## Custom Code Start
################################################################################
# Write your custom code here...
################################################################################
## Custom Code End
################################################################################
phantom.act("block", parameters=parameters, name="block_2", assets=["test_null_router_2"])
return
NOTE: It does not generate the
autoscale
parameter when we don't check that box. This may be incompatible with our call to get_required_param.
Here's the Python code behind the current block
block on csoc-manual-ip-block
:
@phantom.playbook_block()
def block_1(action=None, success=None, container=None, results=None, handle=None, filtered_artifacts=None, filtered_results=None, custom_function=None, **kwargs):
phantom.debug("block_1() called")
# phantom.debug('Action: {0} {1}'.format(action['name'], ('SUCCEEDED' if success else 'FAILED')))
why_formatted_string = phantom.format(
container=container,
template="""Block from Splunk SOAR: {0}\n""",
parameters=[
"artifact:*.cef.message"
])
cidr_formatted_string = phantom.format(
container=container,
template="""{0}/32\n""",
parameters=[
"artifact:*.cef.ip"
])
container_artifact_data = phantom.collect2(container=container, datapath=["artifact:*.cef.message","artifact:*.cef.ip","artifact:*.id"])
parameters = []
# build parameters list for 'block_1' call
for container_artifact_item in container_artifact_data:
if why_formatted_string is not None and cidr_formatted_string is not None:
parameters.append({
"why": why_formatted_string,
"cidr": cidr_formatted_string,
"source": "SOAR",
"duration": <REDACTED NUMBER>,
"context": {'artifact_id': container_artifact_item[2]},
})
################################################################################
## Custom Code Start
################################################################################
# Write your custom code here...
################################################################################
## Custom Code End
################################################################################
phantom.act("block", parameters=parameters, name="block_1", assets=["prod_null_router"], callback=set_status_1)
return
We replicated the issue with the Test BHR code (should be identical to 5.0.0), using Playbook BHR Block DEV Test:
Playbook 'BHR Block DEV Test' (playbook id: 995) executed (playbook run id: 121455) on csoc-network-honeypot 'SOAR Honeypot Attacker Alert - 30min'(container id: 114783).
Playbook execution status is 'success'
Total actions executed: 2
Action 'whois_ip_1'(whois ip)
Status: success
App 'WHOIS' executed the action on asset 'whois'
Status: success
Parameter: {"ip":"20.106.254.65"}
Action 'block_2'(block)
Status: failed
App 'Test Box - NCSA BHR' executed the action on asset 'test_null_router_2'
Status: failed
Parameter: {"cidr":"20.106.254.65/32","duration":300,"source":"SOAR","why":"testing"}
App 'Test Box - NCSA BHR' executed the action on asset 'test_null_router_2'
Status: failed
Parameter: {"cidr":"20.106.254.65/32","duration":"300","source":"SOAR","why":"testing"}
Sep 18, 14:56:40 : on_finish() called
Sep 18, 14:56:40 : metrics: Playbook_id:995, run_id:121455, container: 114783, function: on_finish. TIME_TAKEN 8ms
Sep 18, 14:56:40 : *** Playbook 'local/BHR Block DEV Test' execution (121455) has completed with status: SUCCESS ***
Sep 18, 14:56:40 : 'block_2' on asset 'test_null_router_2': 2 actions failed
Sep 18, 14:56:40 : 'block_2' on asset 'test_null_router_2' completed with status: 'failed'. Action Info: Size : 705 bytes : [{"app_name":"Test Box - NCSA BHR","asset_name":"test_null_router_2","param":{"why": "testing", "cidr": "20.106.254.65/32", "source": "SOAR", "context": {"guid": "315485a1-c716-4ed3-8074-5f04d0d8f6dd", "artifact_id": 207580, "parent_action_run": []}, "duration": "300"},"status":"failed","message":""},{"app_name":"Test Box - NCSA BHR","asset_name":"test_null_router_2","param":{"why": "testing", "cidr": "20.106.254.65/32", "source": "SOAR", "context": {"guid": "315485a1-c716-4ed3-8074-5f04d0d8f6dd", "artifact_id": 207580, "parent_action_run": []}, "duration": 300},"status":"failed","message":"handle_action exception occurred. Error string: 'Required parameter 'autoscale' not found in dictionary'"}]
I think this indicates a bug here:
Since autoscale
is not required.
Note that autoscale
, duration
and source
are not required.
We probably see this for autoscale
due to the unique way SOAR passes in an un-set checkbox.
Next up: switch test_null_router_2
from the soarauto
(RHEL9) broker to the splunkauto
(CENTOS) broker.
We think this may clear the error (if it copies what we saw in the attempted production switch).
Uuhhhh....
Running with test_null_router_2
set to splunkauto
gives this error:
Sep 18, 15:28:19 : Starting playbook 'local/BHR Block DEV Test (id: 995, version: 9, pyversion: 3, scm id: 2)' on event '114783' with playbook run id: 121459, running as user '46' with scope 'new'
Sep 18, 15:28:19 : calling on_start() on csoc-network-honeypot 'SOAR Honeypot Attacker Alert - 30min'(id: 114783).
Sep 18, 15:28:19 : on_start() called
Sep 18, 15:28:19 : whois_ip_1() called
Sep 18, 15:28:19 : phantom.collect2(): called for datapath['artifact:*.cef.src'], scope: new and filter_artifacts: []
Sep 18, 15:28:19 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 15:28:19 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 0 NEW artifacts
Sep 18, 15:28:19 : phantom.collect2(): called for datapath['artifact:*.cef.src', 'artifact:*.id'], scope: None and filter_artifacts: None
Sep 18, 15:28:19 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 15:28:19 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 0 NEW artifacts
Sep 18, 15:28:19 : calling ph_engine.act() with effective parameters size: 106 bytes. effective handle size: 0 bytes
Sep 18, 15:28:19 : phantom.act(): action 'whois ip' on assets: whois, callback function: 'block_2', with no action reviewer, no delay to execute the action, user provided name for the action as 'whois_ip_1', no tags, no asset type
Sep 18, 15:28:19 : phantom.act(): 'whois_ip_1' cannot be run on asset 'whois'. The "whois ip" action requires the following parameters: ip. The given parameters look like they were automatically generated by phantom.act() because an empty parameters list was passed to phantom.act(). The parameters list may have been empty because the preceding call to phantom.collect2() returned an empty list. Check your calling code in the action that generated this error
Sep 18, 15:28:19 : phantom.act(): 'whois_ip_1' cannot be run on asset 'whois'. The "whois ip" action requires the following parameters: ip. The given parameters look like they were automatically generated by phantom.act() because an empty parameters list was passed to phantom.act(). The parameters list may have been empty because the preceding call to phantom.collect2() returned an empty list. Check your calling code in the action that generated this error
Sep 18, 15:28:19 : metrics: Playbook_id:995, run_id:121459, container: 114783, function: on_start. TIME_TAKEN 154ms
Sep 18, 15:28:19 : calling action 'whois_ip_1''s callback function 'block_2()'
Sep 18, 15:28:19 : block_2() called
Sep 18, 15:28:19 : phantom.collect2(): called for datapath['artifact:*.cef.src'], scope: new and filter_artifacts: []
Sep 18, 15:28:19 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 15:28:19 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 0 NEW artifacts
Sep 18, 15:28:19 : phantom.collect2(): called for datapath['artifact:*.cef.src', 'artifact:*.id'], scope: None and filter_artifacts: None
Sep 18, 15:28:19 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 15:28:19 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 0 NEW artifacts
Sep 18, 15:28:19 : calling ph_engine.act() with effective parameters size: 106 bytes. effective handle size: 0 bytes
Sep 18, 15:28:19 : phantom.act(): action 'block' on assets: test_null_router_2, with no callback function, with no action reviewer, no delay to execute the action, user provided name for the action as 'block_2', no tags, no asset type
Sep 18, 15:28:19 : phantom.act(): 'block_2' cannot be run on asset 'test_null_router_2'. The "block" action requires the following parameters: cidr, why. The given parameters look like they were automatically generated by phantom.act() because an empty parameters list was passed to phantom.act(). The parameters list may have been empty because the preceding call to phantom.collect2() returned an empty list. Check your calling code in the action that generated this error
Sep 18, 15:28:19 : phantom.act(): 'block_2' cannot be run on asset 'test_null_router_2'. The "block" action requires the following parameters: cidr, why. The given parameters look like they were automatically generated by phantom.act() because an empty parameters list was passed to phantom.act(). The parameters list may have been empty because the preceding call to phantom.collect2() returned an empty list. Check your calling code in the action that generated this error
Sep 18, 15:28:19 : metrics: Playbook_id:995, run_id:121459, container: 114783, function: on_finish. TIME_TAKEN 113ms
Sep 18, 15:28:19 : finished action 'whois_ip_1''s callback function 'block_2()'
Sep 18, 15:28:19 :
Playbook 'BHR Block DEV Test' (playbook id: 995) executed (playbook run id: 121459) on csoc-network-honeypot 'SOAR Honeypot Attacker Alert - 30min'(container id: 114783).
Playbook execution status is 'failed'
Total actions executed: 2
Action 'whois_ip_1'(whois ip)
Status: failed
Action 'block_2'(block)
Status: failed
Sep 18, 15:28:19 : on_finish() called
Sep 18, 15:28:19 : metrics: Playbook_id:995, run_id:121459, container: 114783, function: on_finish. TIME_TAKEN 7ms
Sep 18, 15:28:19 : *** Playbook 'local/BHR Block DEV Test' execution (121459) has completed with status: FAILED ***
undefined
Deleting the block
block, and re-adding it changes the error message back:
Sep 18, 15:34:56 : Starting playbook 'local/BHR Block DEV Test (id: 996, version: 10, pyversion: 3, scm id: 2)' on event '114783' with playbook run id: 121466, running as user '46' with scope 'new'
Sep 18, 15:34:56 : calling on_start() on csoc-network-honeypot 'SOAR Honeypot Attacker Alert - 30min'(id: 114783).
Sep 18, 15:34:56 : on_start() called
Sep 18, 15:34:56 : whois_ip_1() called
Sep 18, 15:34:56 : phantom.collect2(): called for datapath['artifact:*.cef.src'], scope: new and filter_artifacts: []
Sep 18, 15:34:56 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 15:34:56 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 1 NEW artifacts
Sep 18, 15:34:56 : phantom.collect2(): called for datapath['artifact:*.cef.src', 'artifact:*.id'], scope: None and filter_artifacts: None
Sep 18, 15:34:56 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 15:34:56 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 1 NEW artifacts
Sep 18, 15:34:56 : calling ph_engine.act() with effective parameters size: 111 bytes. effective handle size: 0 bytes
Sep 18, 15:34:56 : phantom.act(): action 'whois ip' on assets: whois, callback function: 'block_1', with no action reviewer, no delay to execute the action, user provided name for the action as 'whois_ip_1', no tags, no asset type
Sep 18, 15:34:56 : metrics: Playbook_id:996, run_id:121466, container: 114783, function: on_start. TIME_TAKEN 132ms
Sep 18, 15:34:56 : 'whois ip' (named as 'whois_ip_1') shall be executed now on asset 'whois'(id:4) using app 'WHOIS'
Sep 18, 15:34:56 : 'whois_ip_1' on asset 'whois': Loaded action execution configuration
Sep 18, 15:34:56 : 'whois_ip_1' on asset 'whois': successfully acquired the lock '4'
Sep 18, 15:34:56 : 'whois_ip_1' on asset 'whois': Successfully acquired the lock '4'
Sep 18, 15:34:57 : 'whois_ip_1' on asset 'whois': Querying...
Sep 18, 15:34:58 : 'whois_ip_1' on asset 'whois': Parsing response
Sep 18, 15:34:58 : 'whois_ip_1' on asset 'whois': Successfully released the lock '4'
Sep 18, 15:34:58 : 'whois_ip_1' on asset 'whois': 1 action succeeded. (1)For Parameter: {"ip":"20.106.254.65"} Message: "Registry: arin\nASN: 8075\nCountry: US\nNets:\nRange: 20.33.0.0 - 20.128.255.255\nAddress: One Microsoft Way"
Sep 18, 15:34:58 : 'whois_ip_1' on asset 'whois' completed with status: 'success'. Action Info: Size : 324 bytes : [{"app_name":"WHOIS","asset_name":"whois","param":{"ip": "20.106.254.65", "context": {"guid": "3f66c27b-f4d0-40fa-81e1-722b7e128707", "artifact_id": 207580, "parent_action_run": []}},"status":"success","message":"Registry: arin\nASN: 8075\nCountry: US\nNets:\nRange: 20.33.0.0 - 20.128.255.255\nAddress: One Microsoft Way"}]
Sep 18, 15:34:58 : calling action 'whois_ip_1''s callback function 'block_1()'
Sep 18, 15:34:58 : block_1() called
Sep 18, 15:34:58 : phantom.collect2(): called for datapath['artifact:*.cef.src'], scope: new and filter_artifacts: []
Sep 18, 15:34:58 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 15:34:58 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 1 NEW artifacts
Sep 18, 15:34:58 : phantom.collect2(): called for datapath['artifact:*.cef.src', 'artifact:*.id'], scope: None and filter_artifacts: None
Sep 18, 15:34:58 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 15:34:58 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 1 NEW artifacts
Sep 18, 15:34:58 : calling ph_engine.act() with effective parameters size: 169 bytes. effective handle size: 0 bytes
Sep 18, 15:34:58 : phantom.act(): action 'block' on assets: test_null_router_2, with no callback function, with no action reviewer, no delay to execute the action, user provided name for the action as 'block_1', no tags, no asset type
Sep 18, 15:34:58 : metrics: Playbook_id:996, run_id:121466, container: 114783, function: on_finish. TIME_TAKEN 116ms
Sep 18, 15:34:58 : finished action 'whois_ip_1''s callback function 'block_1()'
Sep 18, 15:34:58 : 1 action succeeded
Sep 18, 15:34:58 : 'block' (named as 'block_1') shall be executed now on asset 'test_null_router_2'(id:56) using app 'Test Box - NCSA BHR'
Sep 18, 15:34:58 : Running action (block_1) on AutomationBroker (splunkauto)
Sep 18, 15:34:59 : action '' did not have any callback. The action is now marked completed
Sep 18, 15:34:59 :
Playbook 'BHR Block DEV Test' (playbook id: 996) executed (playbook run id: 121466) on csoc-network-honeypot 'SOAR Honeypot Attacker Alert - 30min'(container id: 114783).
Playbook execution status is 'success'
Total actions executed: 2
Action 'whois_ip_1'(whois ip)
Status: success
App 'WHOIS' executed the action on asset 'whois'
Status: success
Parameter: {"ip":"20.106.254.65"}
Action 'block_1'(block)
Status: failed
App 'Test Box - NCSA BHR' executed the action on asset 'test_null_router_2'
Status: failed
Parameter: {"cidr":"20.106.254.65/32","duration":"300","source":"SOAR","why":"testing"}
App 'Test Box - NCSA BHR' executed the action on asset 'test_null_router_2'
Status: failed
Parameter: {"cidr":"20.106.254.65/32","duration":300,"source":"SOAR","why":"testing"}
Sep 18, 15:34:59 : on_finish() called
Sep 18, 15:34:59 : metrics: Playbook_id:996, run_id:121466, container: 114783, function: on_finish. TIME_TAKEN 7ms
Sep 18, 15:35:00 : *** Playbook 'local/BHR Block DEV Test' execution (121466) has completed with status: SUCCESS ***
Sep 18, 15:35:00 : 'block_1' on asset 'test_null_router_2': 2 actions failed
Sep 18, 15:35:00 : 'block_1' on asset 'test_null_router_2' completed with status: 'failed'. Action Info: Size : 705 bytes : [{"app_name":"Test Box - NCSA BHR","asset_name":"test_null_router_2","param":{"why": "testing", "cidr": "20.106.254.65/32", "source": "SOAR", "context": {"guid": "09e999a1-bbfa-499f-8aeb-7cb38bb0adbc", "artifact_id": 207580, "parent_action_run": []}, "duration": "300"},"status":"failed","message":""},{"app_name":"Test Box - NCSA BHR","asset_name":"test_null_router_2","param":{"why": "testing", "cidr": "20.106.254.65/32", "source": "SOAR", "context": {"guid": "09e999a1-bbfa-499f-8aeb-7cb38bb0adbc", "artifact_id": 207580, "parent_action_run": []}, "duration": 300},"status":"failed","message":"handle_action exception occurred. Error string: 'Required parameter 'autoscale' not found in dictionary'"}]
Sep 18, 15:35:00 : 1 action failed for app Test Box - NCSA BHR
undefined
Worth noting that block_1
and block_2
sometimes swap. That seems suspicious.
Some thoughts on next steps:
block
function in case that is causing issues.We could try to produce a 5.0.1 that eliminates the autoscale required
error message (by updating the get required paramater
call for a get optional parameter
call). We can test that in our Playbook BHR Block DEV Test
playbook against test_null_router_2
- on each of the splunkauto
and soarauto
and see what we get.
If that worked to clear the error that we were able to replicate, then we could have SecEng try the upgrade again with 5.0.1.
After patching for #53 - we get the same exact (totally unrelated?) error when connected to the wrong broker anyway...:
(We get this error on both brokers now...)
Sep 18, 16:07:26 : Starting playbook 'local/BHR Block DEV Test (id: 996, version: 10, pyversion: 3, scm id: 2)' on event '114783' with playbook run id: 121475, running as user '46' with scope 'new'
Sep 18, 16:07:26 : calling on_start() on csoc-network-honeypot 'SOAR Honeypot Attacker Alert - 30min'(id: 114783).
Sep 18, 16:07:26 : on_start() called
Sep 18, 16:07:27 : whois_ip_1() called
Sep 18, 16:07:27 : phantom.collect2(): called for datapath['artifact:*.cef.src'], scope: new and filter_artifacts: []
Sep 18, 16:07:27 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 16:07:27 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 0 NEW artifacts
Sep 18, 16:07:27 : phantom.collect2(): called for datapath['artifact:*.cef.src', 'artifact:*.id'], scope: None and filter_artifacts: None
Sep 18, 16:07:27 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 16:07:27 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 0 NEW artifacts
Sep 18, 16:07:27 : calling ph_engine.act() with effective parameters size: 106 bytes. effective handle size: 0 bytes
Sep 18, 16:07:27 : phantom.act(): action 'whois ip' on assets: whois, callback function: 'block_1', with no action reviewer, no delay to execute the action, user provided name for the action as 'whois_ip_1', no tags, no asset type
Sep 18, 16:07:27 : phantom.act(): 'whois_ip_1' cannot be run on asset 'whois'. The "whois ip" action requires the following parameters: ip. The given parameters look like they were automatically generated by phantom.act() because an empty parameters list was passed to phantom.act(). The parameters list may have been empty because the preceding call to phantom.collect2() returned an empty list. Check your calling code in the action that generated this error
Sep 18, 16:07:27 : phantom.act(): 'whois_ip_1' cannot be run on asset 'whois'. The "whois ip" action requires the following parameters: ip. The given parameters look like they were automatically generated by phantom.act() because an empty parameters list was passed to phantom.act(). The parameters list may have been empty because the preceding call to phantom.collect2() returned an empty list. Check your calling code in the action that generated this error
Sep 18, 16:07:27 : metrics: Playbook_id:996, run_id:121475, container: 114783, function: on_start. TIME_TAKEN 164ms
Sep 18, 16:07:27 : calling action 'whois_ip_1''s callback function 'block_1()'
Sep 18, 16:07:27 : block_1() called
Sep 18, 16:07:27 : phantom.collect2(): called for datapath['artifact:*.cef.src'], scope: new and filter_artifacts: []
Sep 18, 16:07:27 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 16:07:27 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 0 NEW artifacts
Sep 18, 16:07:27 : phantom.collect2(): called for datapath['artifact:*.cef.src', 'artifact:*.id'], scope: None and filter_artifacts: None
Sep 18, 16:07:27 : phantom.collect2(): Classified datapaths as [<DatapathClassification.ARTIFACT: 1>]
Sep 18, 16:07:27 : phantom.collect(): called with datapath as LIST of paths, scope='new' and limit=2000. Found 0 NEW artifacts
Sep 18, 16:07:27 : calling ph_engine.act() with effective parameters size: 106 bytes. effective handle size: 0 bytes
Sep 18, 16:07:27 : phantom.act(): action 'block' on assets: test_null_router_2, with no callback function, with no action reviewer, no delay to execute the action, user provided name for the action as 'block_1', no tags, no asset type
Sep 18, 16:07:27 : phantom.act(): 'block_1' cannot be run on asset 'test_null_router_2'. The "block" action requires the following parameters: cidr, why. The given parameters look like they were automatically generated by phantom.act() because an empty parameters list was passed to phantom.act(). The parameters list may have been empty because the preceding call to phantom.collect2() returned an empty list. Check your calling code in the action that generated this error
Sep 18, 16:07:27 : phantom.act(): 'block_1' cannot be run on asset 'test_null_router_2'. The "block" action requires the following parameters: cidr, why. The given parameters look like they were automatically generated by phantom.act() because an empty parameters list was passed to phantom.act(). The parameters list may have been empty because the preceding call to phantom.collect2() returned an empty list. Check your calling code in the action that generated this error
Sep 18, 16:07:27 : metrics: Playbook_id:996, run_id:121475, container: 114783, function: on_finish. TIME_TAKEN 117ms
Sep 18, 16:07:27 : finished action 'whois_ip_1''s callback function 'block_1()'
Sep 18, 16:07:27 :
Playbook 'BHR Block DEV Test' (playbook id: 996) executed (playbook run id: 121475) on csoc-network-honeypot 'SOAR Honeypot Attacker Alert - 30min'(container id: 114783).
Playbook execution status is 'failed'
Total actions executed: 2
Action 'whois_ip_1'(whois ip)
Status: failed
Action 'block_1'(block)
Status: failed
Sep 18, 16:07:27 : on_finish() called
Sep 18, 16:07:27 : metrics: Playbook_id:996, run_id:121475, container: 114783, function: on_finish. TIME_TAKEN 10ms
Sep 18, 16:07:27 : *** Playbook 'local/BHR Block DEV Test' execution (121475) has completed with status: FAILED ***
When we remove the suddenly-failing-oh-WHY call to whois
, we get a success on the IP/CIDR:
Sep 18, 16:15:52 : Starting playbook 'local/BHR Block DEV Test (id: 997, version: 11, pyversion: 3, scm id: 2)' on event '114783' with playbook run id: 121480, running as user '46' with scope 'new'
Sep 18, 16:15:52 : calling on_start() on csoc-network-honeypot 'SOAR Honeypot Attacker Alert - 30min'(id: 114783).
Sep 18, 16:15:52 : on_start() called
Sep 18, 16:15:52 : block_1() called
Sep 18, 16:15:52 : calling ph_engine.act() with effective parameters size: 85 bytes. effective handle size: 0 bytes
Sep 18, 16:15:52 : phantom.act(): action 'block' on assets: test_null_router_2, with no callback function, with no action reviewer, no delay to execute the action, user provided name for the action as 'block_1', no tags, no asset type
Sep 18, 16:15:52 : metrics: Playbook_id:997, run_id:121480, container: 114783, function: on_start. TIME_TAKEN 88ms
Sep 18, 16:15:52 : 'block' (named as 'block_1') shall be executed now on asset 'test_null_router_2'(id:56) using app 'Test Box - NCSA BHR'
Sep 18, 16:15:52 : Running action (block_1) on AutomationBroker (soarauto)
Sep 18, 16:15:53 : action '' did not have any callback. The action is now marked completed
Sep 18, 16:15:53 :
Playbook 'BHR Block DEV Test' (playbook id: 997) executed (playbook run id: 121480) on csoc-network-honeypot 'SOAR Honeypot Attacker Alert - 30min'(container id: 114783).
Playbook execution status is 'success'
Total actions executed: 1
Action 'block_1'(block)
Status: success
App 'Test Box - NCSA BHR' executed the action on asset 'test_null_router_2'
Status: success
Parameter: {"cidr":"20.106.254.65/32","duration":"300","source":"SOAR","why":"testing"}
Sep 18, 16:15:53 : on_finish() called
Sep 18, 16:15:53 : metrics: Playbook_id:997, run_id:121480, container: 114783, function: on_finish. TIME_TAKEN 8ms
Sep 18, 16:15:53 : *** Playbook 'local/BHR Block DEV Test' execution (121480) has completed with status: SUCCESS ***
Sep 18, 16:15:53 : 'block_1' on asset 'test_null_router_2': 1 action succeeded. (1)For Parameter: {"cidr":"20.106.254.65/32","context":{"artifact_id":0,"guid":"7e3d02e8-f73e-47dc-9bcb-977e10e8256d","parent_action_run":[]},"duration":"300","source":"SOAR","why":"testing"} Message: "Blocked 20.106.254.65/32"
Sep 18, 16:15:53 : 'block_1' on asset 'test_null_router_2' completed with status: 'success'. Action Info: Size : 322 bytes : [{"app_name":"Test Box - NCSA BHR","asset_name":"test_null_router_2","param":{"why": "testing", "cidr": "20.106.254.65/32", "source": "SOAR", "context": {"guid": "7e3d02e8-f73e-47dc-9bcb-977e10e8256d", "artifact_id": 0, "parent_action_run": []}, "duration": "300"},"status":"success","message":"Blocked 20.106.254.65/32"}]
Sep 18, 16:15:53 : 1 action succeeded
We did manage to replicate the error seen in production, and we managed to remove it (in a not very clean test...) via #53. So maybe that's our fix.
Best guess get_req_value
behaves differently on Python for RHEL9 vs Python for CENTOS?
We don't like this. It's weird.
Next up:
Let's do some testing to verify that our optional values are being treated as optional.
NOTE: If we don't trust get_req_value
, after our remaining testing, let's remove it entirely.
We have verified in the logs that, with our latest code, and our test playbook, the default values come through.
We placed the following debug line in our code:
self.debug_print(f"Block arguments: {args}")
Which produced the following debug output in the splunk SOAR logs:
2023-09-19T15:41:11.888619 PID:228487 TID:228487 : DEBUG: SPAWN3 : /srv/phantom/alchemy/connectors/spawn3/spawn3.cpp : 1021 : AppConnector :Block arguments: {'cidr': '20.106.254.65/32', 'why': 'testing', 'source': 'SOAR', 'duration': '300', 'autoscale': False}
This confirms that SOAR is setting the defaults appropriately.
We confirmed that adding the following line to our code produces the original error:
phantom.get_req_value(param, "autoscale")
Output from event run:
Sep 19, 11:01:09 : on_finish() called
Sep 19, 11:01:09 : metrics: Playbook_id:997, run_id:121670, container: 114783, function: on_finish. TIME_TAKEN 9ms
Sep 19, 11:01:09 : *** Playbook 'local/BHR Block DEV Test' execution (121670) has completed with status: FAILED ***
Sep 19, 11:01:09 : 'block_1' on asset 'test_null_router_2': 2 actions failed
Sep 19, 11:01:09 : 'block_1' on asset 'test_null_router_2' completed with status: 'failed'. Action Info: Size : 695 bytes : [{"app_name":"Test Box - NCSA BHR","asset_name":"test_null_router_2","param":{"why": "testing", "cidr": "20.106.254.65/32", "source": "SOAR", "context": {"guid": "8b924826-8420-41be-aad4-5cd9b815964d", "artifact_id": 0, "parent_action_run": []}, "duration": "300"},"status":"failed","message":""},{"app_name":"Test Box - NCSA BHR","asset_name":"test_null_router_2","param":{"why": "testing", "cidr": "20.106.254.65/32", "source": "SOAR", "context": {"guid": "8b924826-8420-41be-aad4-5cd9b815964d", "artifact_id": 0, "parent_action_run": []}, "duration": 300},"status":"failed","message":"handle_action exception occurred. Error string: 'Required parameter 'autoscale' not found in dictionary'"}]
Sep 19, 11:01:09 : 1 action failed for app Test Box - NCSA BHR
When the new automation broker running on RHEL9 is used for the null router SOAR app the block action fails with an error message:
This happened with 100% of the block attempts on the RHEL9 server. After roiling back to the old Centos server the block action stopped failing.
We may want to see if there is any place in our code where the autoscale parameter is being dropped, or see if there is a way in code to ensure it's always passed, although we have it set to false by default in the app.json.
Reference
Test Data
Playbook we have been testing with:
BHR Block DEV Test
, Event ID:114783
Some fields we have been testing with: