Closed wen587 closed 2 years ago
ran the script in the description, found the error happened on the first patch step:
Mar 8 19:05:55.610196 vlab-01 NOTICE config: Patch Applier: The patch was sorted into 4 changes:
Mar 8 19:05:55.610673 vlab-01 NOTICE config: Patch Applier: * [{"op": "add", "path": "/ACL_RULE", "value": {"SSH_ONLY|TEST_DROP": {"IP_TYPE": "IP"}}}]
Mar 8 19:05:55.610760 vlab-01 NOTICE config: Patch Applier: * [{"op": "add", "path": "/ACL_RULE/SSH_ONLY|TEST_DROP/PRIORITY", "value": "9998"}]
Mar 8 19:05:55.611657 vlab-01 NOTICE config: Patch Applier: * [{"op": "add", "path": "/ACL_RULE/SSH_ONLY|TEST_DROP/PACKET_ACTION", "value": "DROP"}]
Mar 8 19:05:55.611757 vlab-01 NOTICE config: Patch Applier: * [{"op": "add", "path": "/ACL_RULE/SSH_ONLY|TEST_DROP/SRC_IP", "value": "9.9.9.9/32"}]
Mar 8 19:05:55.611865 vlab-01 NOTICE config: Patch Applier: Applying 4 changes in order:
Mar 8 19:05:55.611949 vlab-01 NOTICE config: Patch Applier: * [{"op": "add", "path": "/ACL_RULE", "value": {"SSH_ONLY|TEST_DROP": {"IP_TYPE": "IP"}}}]
Mar 8 19:05:55.975080 vlab-01 INFO caclmgrd[130303]: **** ACL TABLE has 0 entries.
Mar 8 19:05:55.977561 vlab-01 INFO caclmgrd[130303]: **** SSH_ONLY not found. Try again.
Mar 8 19:05:55.984697 vlab-01 INFO caclmgrd[130303]: Traceback (most recent call last):
Mar 8 19:05:55.988131 vlab-01 INFO caclmgrd[130303]: File "/usr/local/bin/caclmgrd", line 836, in run
Mar 8 19:05:55.988991 vlab-01 INFO caclmgrd[130303]: if acl_table_content[acl_table]["type"] == self.ACL_TABLE_TYPE_CTRLPLANE:
Mar 8 19:05:55.989041 vlab-01 INFO caclmgrd[130303]: KeyError: 'SSH_ONLY'
Mar 8 19:05:55.989085 vlab-01 INFO caclmgrd[130303]: During handling of the above exception, another exception occurred:
Mar 8 19:05:55.989109 vlab-01 INFO caclmgrd[130303]: Traceback (most recent call last):
Mar 8 19:05:55.989149 vlab-01 INFO caclmgrd[130303]: File "/usr/local/bin/caclmgrd", line 876, in <module>
Mar 8 19:05:55.992773 vlab-01 INFO caclmgrd[130303]: main()
Mar 8 19:05:55.993474 vlab-01 INFO caclmgrd[130303]: File "/usr/local/bin/caclmgrd", line 872, in main
Mar 8 19:05:55.994981 vlab-01 INFO caclmgrd[130303]: caclmgr.run()
Mar 8 19:05:55.995057 vlab-01 INFO caclmgrd[130303]: File "/usr/local/bin/caclmgrd", line 840, in run
Mar 8 19:05:55.995105 vlab-01 INFO caclmgrd[130303]: if self.config_db_map[namespace].get_table(self.ACL_TABLE)[acl_table]["type"] == self.ACL_TABLE_TYPE_CTRLPLANE:
Mar 8 19:05:55.995155 vlab-01 INFO caclmgrd[130303]: KeyError: 'SSH_ONLY'
Mar 8 19:05:56.318380 vlab-01 NOTICE config: Patch Applier: * [{"op": "add", "path": "/ACL_RULE/SSH_ONLY|TEST_DROP/PRIORITY", "value": "9998"}]
Mar 8 19:05:57.136246 vlab-01 NOTICE config: Patch Applier: * [{"op": "add", "path": "/ACL_RULE/SSH_ONLY|TEST_DROP/PACKET_ACTION", "value": "DROP"}]
Mar 8 19:05:57.790636 vlab-01 NOTICE config: Patch Applier: * [{"op": "add", "path": "/ACL_RULE/SSH_ONLY|TEST_DROP/SRC_IP", "value": "9.9.9.9/32"}]
From the logs
Jan 29 03:58:06.218187 vlab-01 ERR caclmgrd[2235692]: rule_props for rule_id TEST_DROP does not have key 'PRIORITY'!
It seems PRIORITY is required.
from caclmgr:
PRIORITY is unique for the same ACL
In the example below we add 3 ACL rules 2 of them have the same priority 9997, we see only 1 of them is not applied. This is verified by sudo iptables -S
admin@vlab-01:~$ cat add-acl-rule2.json
[
{
"op": "add",
"path": "/ACL_RULE",
"value": {
"SSH_ONLY|TEST_DROP1": {
"L4_DST_PORT": "22",
"IP_PROTOCOL": "17",
"IP_TYPE": "ANY",
"PACKET_ACTION": "DROP",
"PRIORITY": "9998",
"SRC_IP": "9.9.9.9/32"
},
"SSH_ONLY|TEST_DROP2": {
"L4_DST_PORT": "22",
"IP_PROTOCOL": "17",
"IP_TYPE": "ANY",
"PACKET_ACTION": "DROP",
"PRIORITY": "9997",
"SRC_IP": "10.10.10.10/32"
},
"SSH_ONLY|TEST_DROP3": {
"L4_DST_PORT": "22",
"IP_PROTOCOL": "17",
"IP_TYPE": "ANY",
"PACKET_ACTION": "DROP",
"PRIORITY": "9997",
"SRC_IP": "11.11.11.11/32"
}
}
}
]
admin@vlab-01:~$
admin@vlab-01:~$ sudo iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-A INPUT -s 127.0.0.1/32 -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -m icmp --icmp-type 8 -j ACCEPT
-A INPUT -p icmp -m icmp --icmp-type 0 -j ACCEPT
-A INPUT -p icmp -m icmp --icmp-type 3 -j ACCEPT
-A INPUT -p icmp -m icmp --icmp-type 11 -j ACCEPT
-A INPUT -p udp -m udp --dport 67:68 -j ACCEPT
-A INPUT -p udp -m udp --dport 546:547 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 179 -j ACCEPT
-A INPUT -p tcp -m tcp --sport 179 -j ACCEPT
-A INPUT -d 10.1.0.32/32 -j DROP
-A INPUT -d 10.250.0.0/32 -j DROP
-A INPUT -d 192.168.0.1/32 -j DROP
-A INPUT -d 10.0.0.56/32 -j DROP
-A INPUT -d 10.0.0.58/32 -j DROP
-A INPUT -d 10.0.0.60/32 -j DROP
-A INPUT -d 10.0.0.62/32 -j DROP
-A INPUT -m ttl --ttl-lt 2 -j ACCEPT
admin@vlab-01:~$ sudo config apply-patch add-acl-rule2.json -i ''
Patch Applier: Patch application starting.
...
Patch Applier: Verifying patch updates are reflected on ConfigDB.
Patch Applier: Patch application completed.
Patch applied successfully.
admin@vlab-01:~$ sudo iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-A INPUT -s 127.0.0.1/32 -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -m icmp --icmp-type 8 -j ACCEPT
-A INPUT -p icmp -m icmp --icmp-type 0 -j ACCEPT
-A INPUT -p icmp -m icmp --icmp-type 3 -j ACCEPT
-A INPUT -p icmp -m icmp --icmp-type 11 -j ACCEPT
-A INPUT -p udp -m udp --dport 67:68 -j ACCEPT
-A INPUT -p udp -m udp --dport 546:547 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 179 -j ACCEPT
-A INPUT -p tcp -m tcp --sport 179 -j ACCEPT
-A INPUT -s 9.9.9.9/32 -p tcp -m tcp --dport 22 -j DROP <====
-A INPUT -s 11.11.11.11/32 -p tcp -m tcp --dport 22 -j DROP <=====
-A INPUT -d 10.1.0.32/32 -j DROP
-A INPUT -d 10.250.0.0/32 -j DROP
-A INPUT -d 192.168.0.1/32 -j DROP
-A INPUT -d 10.0.0.56/32 -j DROP
-A INPUT -d 10.0.0.58/32 -j DROP
-A INPUT -d 10.0.0.60/32 -j DROP
-A INPUT -d 10.0.0.62/32 -j DROP
-A INPUT -m ttl --ttl-lt 2 -j ACCEPT
-A INPUT -j DROP
admin@vlab-01:~$
It crashes because during the update was divided into smaller steps, the first step was adding the IP_TYPE
without the PRIORITY
nor the PACKET_ACTION
.
Because caclmgr waits for 0.5s after each update for other updates. During this time we usually manage to also push PRIORITY
and PACKET_ACTION
.
We have to try multiple times to get the failure, because we need a case where the time between pushing IP_TYPE
and the other properties is > 0.5s.
Description
Case 1(described below): During continuous cacl test, caclmgrd may be broken and iptabels won't update. Case 2(during kvm test): There is a possibility that iptables is not immediately updated cuz caclmgrd
UPDATE_DELAY_SECS=0.5
. Refer test here: https://github.com/Azure/sonic-mgmt/blob/master/tests/generic_config_updater/test_cacl.py#L467 It will have a chance to fail which is already found in kvm nightly test. It check immediately afterapply-patch
success, while caclmgrd might upadte iptables in 0.5 seconds. Example syslog:Two thoughts I am thinking:
Steps to reproduce the issue
admin@vlab-01:~/cacl/test$ cat bug_repro.py
import subprocess import time import datetime def run_script():
add cacl config
for i in range(1, 100): print("iteration"+str(i)) ret= run_script() if ret== "fail": break
admin@vlab-01:~/cacl/test$ python verify.py iteration1 iteration2 ... iteration25 2022-01-29 03:58:09.802610 : fail to remove <====== -P INPUT ACCEPT -P FORWARD ACCEPT -P OUTPUT ACCEPT -A INPUT -s 127.0.0.1/32 -i lo -j ACCEPT -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT -A INPUT -p icmp -m icmp --icmp-type 8 -j ACCEPT -A INPUT -p icmp -m icmp --icmp-type 0 -j ACCEPT -A INPUT -p icmp -m icmp --icmp-type 3 -j ACCEPT -A INPUT -p icmp -m icmp --icmp-type 11 -j ACCEPT -A INPUT -p udp -m udp --dport 67:68 -j ACCEPT -A INPUT -p udp -m udp --dport 546:547 -j ACCEPT -A INPUT -p tcp -m tcp --dport 179 -j ACCEPT -A INPUT -p tcp -m tcp --sport 179 -j ACCEPT -A INPUT -s 9.9.9.9/32 -p tcp -m tcp --dport 22 -j DROP <===== -A INPUT -d 10.1.0.32/32 -j DROP -A INPUT -d 10.250.0.0/32 -j DROP -A INPUT -d 192.168.0.1/32 -j DROP -A INPUT -d 10.0.0.56/32 -j DROP -A INPUT -d 10.0.0.58/32 -j DROP -A INPUT -d 10.0.0.60/32 -j DROP -A INPUT -d 10.0.0.62/32 -j DROP -A INPUT -m ttl --ttl-lt 2 -j ACCEPT -A INPUT -j DROP
admin@vlab-01:~/cacl/test$ sudo cat /var/log/syslog | less Jan 29 03:58:04.375320 vlab-01 NOTICE config: Patch Applier: The patch was sorted into 5 changes: Jan 29 03:58:04.376123 vlab-01 NOTICE config: Patch Applier: [{"op": "remove", "path": "/ACL_RULE/SSH_ONLY|TEST_DROP/PACKET_ACTION"}] Jan 29 03:58:04.376229 vlab-01 NOTICE config: Patch Applier: [{"op": "remove", "path": "/ACL_RULE/SSH_ONLY|TEST_DROP/PRIORITY"}] Jan 29 03:58:04.376325 vlab-01 NOTICE config: Patch Applier: [{"op": "remove", "path": "/ACL_RULE/SSH_ONLY|TEST_DROP/SRC_IP"}] Jan 29 03:58:04.376399 vlab-01 NOTICE config: Patch Applier: [{"op": "remove", "path": "/ACL_RULE/SSH_ONLY|TEST_DROP/IP_TYPE"}] Jan 29 03:58:04.376473 vlab-01 NOTICE config: Patch Applier: [{"op": "remove", "path": "/ACL_RULE"}] Jan 29 03:58:04.376601 vlab-01 NOTICE config: Patch Applier: Applying 5 changes in order: Jan 29 03:58:04.376695 vlab-01 NOTICE config: Patch Applier: [{"op": "remove", "path": "/ACL_RULE/SSH_ONLY|TEST_DROP/PACKET_ACTION"}] Jan 29 03:58:04.705395 vlab-01 INFO caclmgrd[2235692]: ACL change detected for namespace '' Jan 29 03:58:04.705636 vlab-01 INFO caclmgrd[2235692]: Spawning ACL update thread for namepsace '' ... Jan 29 03:58:05.056248 vlab-01 NOTICE config: Patch Applier: [{"op": "remove", "path": "/ACL_RULE/SSH_ONLY|TEST_DROP/PRIORITY"}] Jan 29 03:58:05.206276 vlab-01 INFO caclmgrd[2235692]: ACL config not stable for namespace '': 1 changes detected in the past 0.5 seconds. Skipping update ... Jan 29 03:58:05.707468 vlab-01 INFO caclmgrd[2235692]: ACL config not stable for namespace '': 2 changes detected in the past 0.5 seconds. Skipping update ... Jan 29 03:58:05.847169 vlab-01 NOTICE config: Patch Applier: [{"op": "remove", "path": "/ACL_RULE/SSH_ONLY|TEST_DROP/SRC_IP"}] Jan 29 03:58:06.207341 vlab-01 INFO caclmgrd[2235692]: ACL config for namespace '' has not changed for 0.5 seconds. Applying updates ... Jan 29 03:58:06.217784 vlab-01 INFO caclmgrd[2235692]: Translating ACL rules for control plane ACL 'NTP_ACL' (service: 'NTP') Jan 29 03:58:06.217864 vlab-01 WARNING caclmgrd[2235692]: Unable to determine if ACL table 'NTP_ACL' contains IPv4 or IPv6 rules. Skipping table... Jan 29 03:58:06.217925 vlab-01 INFO caclmgrd[2235692]: Translating ACL rules for control plane ACL 'SNMP_ACL' (service: 'SNMP') Jan 29 03:58:06.218007 vlab-01 WARNING caclmgrd[2235692]: Unable to determine if ACL table 'SNMP_ACL' contains IPv4 or IPv6 rules. Skipping table... Jan 29 03:58:06.218104 vlab-01 INFO caclmgrd[2235692]: Translating ACL rules for control plane ACL 'SSH_ONLY' (service: 'SSH') Jan 29 03:58:06.218187 vlab-01 ERR caclmgrd[2235692]: rule_props for rule_id TEST_DROP does not have key 'PRIORITY'! Jan 29 03:58:06.218295 vlab-01 WARNING caclmgrd[2235692]: Unable to determine if ACL table 'SSH_ONLY' contains IPv4 or IPv6 rules. Skipping table... Jan 29 03:58:06.225719 vlab-01 INFO caclmgrd[2235692]: Traceback (most recent call last): Jan 29 03:58:06.225812 vlab-01 INFO caclmgrd[2235692]: File "/usr/local/bin/caclmgrd", line 866, in
Jan 29 03:58:06.226357 vlab-01 INFO caclmgrd[2235692]: main() <=========================
Jan 29 03:58:06.226406 vlab-01 INFO caclmgrd[2235692]: File "/usr/local/bin/caclmgrd", line 862, in main
Jan 29 03:58:06.226692 vlab-01 INFO caclmgrd[2235692]: caclmgr.run() <=========================
Jan 29 03:58:06.226734 vlab-01 INFO caclmgrd[2235692]: File "/usr/local/bin/caclmgrd", line 832, in run
Jan 29 03:58:06.226985 vlab-01 INFO caclmgrd[2235692]: if self.config_db_map[namespace].get_table(self.ACL_TABLE)[acl_table]["type"] == self.ACL_TABLE_TYPE_CTRLPLANE: <=========================
Jan 29 03:58:06.227040 vlab-01 INFO caclmgrd[2235692]: KeyError: 'SSH_ONLY'
Jan 29 03:58:06.613009 vlab-01 NOTICE config: Patch Applier: [{"op": "remove", "path": "/ACL_RULE/SSH_ONLY|TEST_DROP/IP_TYPE"}]
Jan 29 03:58:07.304973 vlab-01 NOTICE config: Patch Applier: [{"op": "remove", "path": "/ACL_RULE"}]
Jan 29 03:58:08.016759 vlab-01 NOTICE config: Patch Applier: Verifying patch updates are reflected on ConfigDB.
Jan 29 03:58:08.909793 vlab-01 NOTICE config: Patch Applier: Patch application completed.
(paste your output here)