Open ycoheNvidia opened 1 year ago
Verified with KVM: one unreachable high-priority TACACS server causes a 9-second delay, which is much shorter than 40-50 seconds, so this seems to be a platform-related issue.
diff --git a/tests/tacacs/test_authorization.py b/tests/tacacs/test_authorization.py
index 1e58776cf..46864bb57 100644
--- a/tests/tacacs/test_authorization.py
+++ b/tests/tacacs/test_authorization.py
@@ -30,7 +30,10 @@ def ssh_connect_remote_retry(remote_ip, remote_username, remote_password, duthos
retry_count = 3
while retry_count > 0:
try:
@@ -256,7 +259,7 @@ def test_authorization_tacacs_only_some_server_down(
Setup multiple tacacs server for this UT.
Tacacs server 127.0.0.1 not accessible.
"""
-invalid_tacacs_server_ip = "127.0.0.1"
+invalid_tacacs_server_ip = "123.4.5.6"
duthost = duthosts[enum_rand_one_per_hwsku_hostname]
tacacs_server_ip = ptfhost.mgmt_ip
duthost.shell("sudo config tacacs timeout 1")
@@ -264,21 +267,23 @@ def test_authorization_tacacs_only_some_server_down(
remove_all_tacacs_server(duthost)
-duthost.shell("sudo config tacacs add %s" % invalid_tacacs_server_ip)
-duthost.shell("sudo config tacacs add %s" % tacacs_server_ip)
+duthost.shell("sudo config tacacs add %s -p 20" % invalid_tacacs_server_ip)
+duthost.shell("sudo config tacacs add %s -p 10" % tacacs_server_ip)
duthost.shell("sudo truncate -s 0 /var/log/syslog")
logger.warning("Start check login time")
dutip = duthost.mgmt_ip
ssh_connection = ssh_connect_remote_retry(
dutip,
tacacs_creds['tacacs_rw_user'],
tacacs_creds['tacacs_rw_user_passwd'],
duthost
)
pytest_assert(ssh_connection != None)
res = duthost.shell("sudo cat /var/log/syslog | grep nss_tacplus")
logger.warning("End check login time: {}".format(res))
"""
We have encountered an issue while configuring and authenticating against TACACS servers.

What is happening

The issue occurs when a non-active or non-existent server (server A) is configured with higher priority than an active server (server B), with AAA fallback and failthrough enabled. If we try to authenticate a remote user defined on server B, for example over SSH, the connection takes a long time to be established, around 30-50 seconds each time.

Additional research

For user names that were created locally (such as admin, or others created with the useradd command), or that were authenticated and established earlier with a RADIUS server, we did not encounter these delays.

After examining the debug logs, we suspect that the source of the issue lies somewhere between Linux PAM and tacplus_pam: while the user connects, PAM calls TACACS server authentication multiple times as it checks user permissions, waiting the full timeout for each check. In addition, when a valid server is the first priority for authentication, we still see these multiple authentication requests logged by the tacplus_pam and libnss tacplus libraries, but since no single request incurs a significant delay, the session is established in a reasonable time (mostly less than 2 seconds).
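To make the suspected mechanism concrete, here is a rough back-of-the-envelope sketch of how repeated lookups could add up to the observed wait. It assumes each lookup waits the full timeout once for every unreachable server that precedes the active one. The function name and all numbers are hypothetical and were not measured on any device.

```python
def estimated_login_delay(lookups, timeout_s, unreachable_before_active):
    """Estimate the extra wait in seconds during login.

    Assumption (not confirmed by the logs): every lookup tries each
    unreachable higher-priority server and waits the full timeout for it
    before reaching the active server.
    """
    return lookups * timeout_s * unreachable_before_active


# Hypothetical values for illustration only:
# 9 lookups per login, 5 s timeout, 1 unreachable server ahead of the active one.
print(estimated_login_delay(lookups=9, timeout_s=5, unreachable_before_active=1))
```

Under this model the delay scales linearly with the number of lookups, which would explain why a reachable first-priority server hides the repeated requests.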
We would like to know if this is a known limitation of TACACS in SONiC, since the documentation of the pam_tacplus library used by SONiC specifically states that only one active server is used after the first authentication (from https://github.com/kravietz/pam_tacplus/blob/main/README.md): "Having more that one TACACS+ server defined for given management group has following effects on authentication:
if the first server on the list is unreachable or failing pam_tacplus will try to authenticate the user against the other servers until it succeeds
the first_hit option has been deprecated
when the authentication function gets a positive reply from a server, it saves its address for future use by account management function (see below)
The account management (authorization) function asks only one TACACS+ server and it ignores the whole server list passed from command line. It uses server saved by authentication function after successful authenticating user on that server. We assume that the server is authoritative for queries about that user."
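The behavior described in the quoted README can be sketched as a small simulation. This is not pam_tacplus code: the class, method names, and the reachability callback are hypothetical, and a "timeout" here is just a counter. It only illustrates that, per the README, the unreachable server should cost one timeout during authentication and none afterwards.

```python
class TacacsSession:
    """Toy model of the server selection described in the pam_tacplus README."""

    def __init__(self, servers, is_reachable):
        self.servers = list(servers)      # ordered by priority, highest first
        self.is_reachable = is_reachable  # hypothetical callback: server -> bool
        self.active_server = None         # saved after a successful authentication
        self.timeouts = 0                 # number of full timeouts waited

    def authenticate(self):
        # Try each server in order until one answers.
        for server in self.servers:
            if self.is_reachable(server):
                self.active_server = server
                return True
            self.timeouts += 1
        return False

    def account_management(self):
        # Ask only the saved server and ignore the rest of the list.
        if self.active_server is None:
            return False
        return self.is_reachable(self.active_server)


session = TacacsSession(["server_a", "server_b"], lambda s: s == "server_b")
session.authenticate()
session.account_management()
print(session.active_server, session.timeouts)
```

If the device behaved like this model, a login would wait for server A only once, which is why the repeated full-timeout waits seen in the logs look unexpected.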
Reproduction steps: