splunk / addonfactory-solutions-library-python

SDK for Developing Solutions in Splunk Enterprise with Python
https://splunk.github.io/addonfactory-solutions-library-python/
Apache License 2.0
17 stars 7 forks source link

Apps break on hosts where Splunk is configured to listen on IPv6 management port #334

Open alexhaydock opened 11 months ago

alexhaydock commented 11 months ago

For apps which invoke the get_splunkd_access_info() function from splunkenv.py, errors are thrown when a host is configured to listen using IPv6 on the Splunk MGMT port.

This function uses get_conf_key_value() to read the value of mgmtHostPort from web.conf, and splits it based on the : delimiter in order to work out the value being used for the MGMT port on the current instance:

https://github.com/splunk/addonfactory-solutions-library-python/blob/ed4749fa4cca8caabd3cc90ddd56b4a19824751d/solnlib/splunkenv.py#L187-L191

This obviously breaks when encountering IPv6 addresses, which use : as a delimiter as part of the address. This also breaks when deploying a Splunk-recommended configuration to listen on IPv6 as well as v4, which looks like this:

mgmtHostPort = [::]:8089

This type of construct is interestingly not included in the spec file for web.conf but in the spec file for server.conf, which states:

You might need to change the mgmtHostPort setting in the web.conf file. Use '[::1]' instead of '127.0.0.1'.

I'm not quite sure what the best way of fixing this is. It seems like using : as a delimiter should still be possible, but the function should take care to ensure that where multiple : delimiters are present, the rightmost value is taken as the integer used for the port value.

An example error from a Splunkbase app breaking because of this bug:

12-11-2023 17:07:25.671 +0000 ERROR AdminManagerExternal [109547 TcpChannelThread] - Stack trace from python handler:\nTraceback (most recent call last):\n  File "/opt/splunk/lib/python3.7/site-packages/splunk/admin.py", line 108, in init_persistent\n    hand = handler(mode, ctxInfo, data)\n  File "/opt/splunk/etc/apps/TA-sandfly-security/bin/ta_sandfly_security/aob_py3/splunktaucclib/rest_handler/admin_external.py", line 95, in __init__\n    get_splunkd_endpoint(),\n  File "/opt/splunk/etc/apps/TA-sandfly-security/bin/ta_sandfly_security/aob_py3/splunktaucclib/rest_handler/admin_external.py", line 77, in get_splunkd_endpoint\n    splunkd_uri = get_splunkd_uri()\n  File "/opt/splunk/etc/apps/TA-sandfly-security/bin/ta_sandfly_security/aob_py3/solnlib/splunkenv.py", line 208, in get_splunkd_uri\n    scheme, host, port = get_splunkd_access_info()\n  File "/opt/splunk/etc/apps/TA-sandfly-security/bin/ta_sandfly_security/aob_py3/solnlib/splunkenv.py", line 188, in get_splunkd_access_info\n    port = int(host_port.split(":")[1])\nValueError: invalid literal for int() with base 10: ''\n
artemrys commented 11 months ago

hey @alexhaydock, thanks for reporting this, we will take a look into it.

artemrys commented 11 months ago

@alexhaydock there are 2 PRs which I implemented which should have support for IPv6:

I followed docs from here.

The only difference I see so far is that you have [::]:8089 and I implemented the feature based on [::1]:8089 (from the docs above).

alexhaydock commented 11 months ago

Thanks for looking into this, @artemrys. Looking a little closer, I think you're right.

I've compared the app where I was facing this issue and the version of splunkenv.py in this repo and it looks like the issue here is that the app I'm trying to use was built with an older version of the add-on-builder before solnlib incorporated the fixes in your PRs above.

A diff between the one in the app and the one in this repo shows this off:

--- splunkenv.py    2023-11-01 23:47:04.825139976 +0000
+++ splunkenv.py.new    2023-12-13 14:33:23.223684840 +0000
@@ -1,11 +1,11 @@
 #
-# Copyright 2021 Splunk Inc.
+# Copyright 2023 Splunk Inc.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
-# http://www.apache.org/licenses/LICENSE-2.0
+#     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
@@ -24,6 +24,8 @@
 from io import StringIO
 from typing import List, Optional, Tuple, Union

+from .utils import is_true
+
 __all__ = [
     "make_splunkhome_path",
     "get_splunk_host_info",
@@ -177,15 +179,16 @@
         Tuple of (scheme, host, port).
     """

-    if get_conf_key_value("server", "sslConfig", "enableSplunkdSSL") == "true":
+    if is_true(get_conf_key_value("server", "sslConfig", "enableSplunkdSSL")):
         scheme = "https"
     else:
         scheme = "http"

     host_port = get_conf_key_value("web", "settings", "mgmtHostPort")
     host_port = host_port.strip()
-    host = host_port.split(":")[0]
-    port = int(host_port.split(":")[1])
+    host_port_split_parts = host_port.split(":")
+    host = ":".join(host_port_split_parts[:-1])
+    port = int(host_port_split_parts[-1])

     if "SPLUNK_BINDIP" in os.environ:
         bindip = os.environ["SPLUNK_BINDIP"]

I'll try and contact the app developer and see if they can update the app to backport the fixes from your PRs above, but I'm interested in whether there's any guidance for app developers published by Splunk that makes recommendations about how the app lifecycle should be managed to account for upstream AoB updates which fix bugs and issues such as this one?

alexhaydock commented 11 months ago

I've tried to patch the app in question by importing the whole solnlib directory from this repo, and I do think the changes in splunkenv.py above have fixed the first issue I was hitting but I'm still getting an error which I think is related to the logic here in net_utils.py:

https://github.com/splunk/addonfactory-solutions-library-python/blob/ed4749fa4cca8caabd3cc90ddd56b4a19824751d/solnlib/net_utils.py#L99

The error:

12-13-2023 14:52:51.561 +0000 ERROR AdminManagerExternal [478350 TcpChannelThread] - Stack trace from python handler:\nTraceback (most recent call last):\n  File "/opt/splunk/lib/python3.7/site-packages/splunk/admin.py", line 108, in init_persistent\n    hand = handler(mode, ctxInfo, data)\n  File "/opt/splunk/etc/apps/TA-sandfly-security/bin/ta_sandfly_security/aob_py3/splunktaucclib/rest_handler/admin_external.py", line 97, in __init__\n    self.endpoint,\n  File "/opt/splunk/etc/apps/TA-sandfly-security/bin/ta_sandfly_security/aob_py3/splunktaucclib/rest_handler/handler.py", line 150, in __init__\n    port=splunkd_info.port,\n  File "/opt/splunk/etc/apps/TA-sandfly-security/bin/ta_sandfly_security/aob_py3/solnlib/splunk_rest_client.py", line 216, in __init__\n    validate_scheme_host_port(scheme, host, port)\n  File "/opt/splunk/etc/apps/TA-sandfly-security/bin/ta_sandfly_security/aob_py3/solnlib/net_utils.py", line 153, in validate_scheme_host_port\n    raise ValueError("Invalid host")\nValueError: Invalid host\n

I notice that you mentioned the IPv6 support before was added based on the docs that reference [::1], and a quick glance at the code suggests that the net_utils.py logic is specifically looking for that address pattern.

For IPv6, listening on [::1] is the equivalent of listening on loopback, but if you want to listen on all interfaces to host something externally-accessible, like a deployment server, then [::] is the syntax. From a cursory glance at the code I can't tell specifically what is failing, but it should also be validated that it's capable of handling both [::], and also situations where a specific IPv6 address is being listened on too, for example: mgmtHostPort = [2001:0DB8::1234]:8089.

artemrys commented 11 months ago

I've tried to patch the app in question by importing the whole solnlib directory from this repo, and I do think the changes in splunkenv.py above have fixed the first issue I was hitting but I'm still getting an error which I think is related to the logic here in net_utils.py:

Please check out https://github.com/splunk/addonfactory-solutions-library-python/pull/227 as well, it should fix net_utils.py issue.

but I'm interested in whether there's any guidance for app developers published by Splunk that makes recommendations about how the app lifecycle should be managed to account for upstream AoB updates which fix bugs and issues such as this one?

I honestly don't know if something like this exists (especially in regards to AoB). I would recommend trying to update everything to the latest versions, we try not to make breaking changes for the users.

For IPv6, listening on [::1] is the equivalent of listening on loopback, but if you want to listen on all interfaces to host something externally-accessible, like a deployment server, then [::] is the syntax. From a cursory glance at the code I can't tell specifically what is failing, but it should also be validated that it's capable of handling both [::], and also situations where a specific IPv6 address is being listened on too, for example: mgmtHostPort = [2001:0DB8::1234]:8089.

I think it makes sense, I would need to read more documentation about it to have a more productive discussion 😃

alexhaydock commented 11 months ago

I would recommend trying to update everything to the latest versions, we try not to make breaking changes for the users.

Ah sorry for not being totally clear. I did actually update the entire solnlib directory in the app by replacing it with the current code in this repo and that seems to have fixed splunkenv.py, but not net_utils.py.

I looked at the changes in #227 and I think the problem is that they make the assumption that the only valid IPv6 listening address is going to be [::1]. That address is the equivalent of using 127.0.0.1, which is fine for a local MGMT port, but for something like a deployment server (or in my case an all-in-one deployment of Splunk) where you want the host to listen externally, the syntax in IPv6 notation is [::], which is the equivalent of 0.0.0.0.

Based on the changes in #227, I don't think that the code is set up to work appropriately with [::], as I think it is expecting to only see [::1].