wayne-genie opened this issue 4 years ago
@wayne-genie how are you computing the nProbe bps? Please note that there is a per-packet overhead (24 bytes: preamble, SFD, CRC, IFG) which is usually accounted for in link bandwidth utilization, but is not part of the packet length as received by the application. This could explain the gap. Is that your case?
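For reference, the 24 bytes break down as preamble (7) + SFD (1) + FCS/CRC (4) + inter-frame gap (12). A quick sketch of how much this skews bps for a given average frame size (the 700-byte average below is an illustrative value, not taken from this thread):

# Illustrative only: share of the on-wire bandwidth that is visible to the
# application for a hypothetical average frame size of 700 bytes.
awk 'BEGIN {
  avg_pkt  = 700   # hypothetical average frame size (bytes)
  overhead = 24    # preamble + SFD + CRC + IFG
  printf "app bytes / wire bytes = %.1f%%\n", 100 * avg_pkt / (avg_pkt + overhead)
}'
# -> app bytes / wire bytes = 96.7%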
@cardigliano Yes, I think the header length could be the discrepancy here. We will go on to check whether the SNMP interface pps and the flow pps match. Is there any case in which nProbe would not count packets into flow records when no filtering policy is configured?
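A simple way to measure the SNMP-side pps for comparison — a sketch assuming net-snmp is installed; the host, community and ifIndex below are placeholders:

# Poll the 64-bit unicast-packet counter twice and compute the delta.
HOST=switch1; COMMUNITY=public; IFIDX=4; INTERVAL=60   # placeholder values
C1=$(snmpget -v2c -c "$COMMUNITY" -Oqv "$HOST" IF-MIB::ifHCInUcastPkts.$IFIDX)
sleep "$INTERVAL"
C2=$(snmpget -v2c -c "$COMMUNITY" -Oqv "$HOST" IF-MIB::ifHCInUcastPkts.$IFIDX)
echo "pps over ${INTERVAL}s: $(( (C2 - C1) / INTERVAL ))"

Note that ifHCInUcastPkts excludes multicast and broadcast; depending on which counters your SNMP graphs use, you may need to add ifHCInMulticastPkts and ifHCInBroadcastPkts.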
Hi @cardigliano, we tested two cases and compared the pps stats with SNMP. We found that if we use two nProbe processes, one for GTP-C and the other for GTP-U traffic, the flow pps is nearly the same as SNMP (98%).
But when we switch to ZC and distribute traffic to 10 nProbe ZC processes, the pps drops to only 90% of the SNMP pps. Could you advise why there is such a decline with ZC?
nice -n -20 /usr/local/bin/nprobe -n ${COLLECTOR_PORT} -i ${ETH} -u ${NETFLOW_INDEX} -Q 0 -t 10 -d 15 -V 9 -o 100 -U 620 \
--cpu-affinity 4 --export-thread-affinity 5 \
--account-l2 \
-f "udp port 2152 or port 53" \
-b ${LOG_LEVEL} \
--tunnel \
--timestamp-format 1 \
--bi-directional \
--snaplen 0 \
-T \
"
%FIRST_SWITCHED %LAST_SWITCHED \
%FLOW_START_MILLISECONDS %FLOW_END_MILLISECONDS \
%IN_PKTS %IN_BYTES %IPV4_SRC_ADDR %IPV4_DST_ADDR %INPUT_SNMP %OUTPUT_SNMP %L4_SRC_PORT %L4_DST_PORT %TCP_FLAGS %PROTOCOL %SRC_TOS \
%BIFLOW_DIRECTION \
%DNS_QUERY %DNS_QUERY_ID %DNS_QUERY_TYPE %DNS_RET_CODE %DNS_NUM_ANSWERS \
%L7_PROTO %APPLICATION_ID \
%HTTP_HOST \
%UPSTREAM_TUNNEL_ID %DOWNSTREAM_TUNNEL_ID %UNTUNNELED_IPV4_SRC_ADDR %UNTUNNELED_IPV4_DST_ADDR %UNTUNNELED_PROTOCOL \
"
nice -n -20 /usr/local/bin/nprobe -n ${COLLECTOR_PORT} -i ${ETH} -u ${NETFLOW_INDEX} -Q 0 -t 10 -d 15 -V 9 -o 100 -U 610 \
--cpu-affinity 2 --export-thread-affinity 3 \
--account-l2 \
-f "udp port 2123" \
-b ${LOG_LEVEL} \
--tunnel \
--timestamp-format 1 \
--bi-directional \
--snaplen 0 \
-T \
"
%FIRST_SWITCHED %LAST_SWITCHED \
%FLOW_START_MILLISECONDS %FLOW_END_MILLISECONDS \
%IN_PKTS %IN_BYTES %IPV4_SRC_ADDR %IPV4_DST_ADDR %INPUT_SNMP %OUTPUT_SNMP %L4_SRC_PORT %L4_DST_PORT %TCP_FLAGS %PROTOCOL %SRC_TOS \
%BIFLOW_DIRECTION \
%L7_PROTO %APPLICATION_ID \
%UPSTREAM_TUNNEL_ID %DOWNSTREAM_TUNNEL_ID %UNTUNNELED_IPV4_SRC_ADDR %UNTUNNELED_IPV4_DST_ADDR %UNTUNNELED_PROTOCOL \
%GTPV1_REQ_MSG_TYPE %GTPV1_RSP_MSG_TYPE %GTPV1_C2S_TEID_DATA %GTPV1_C2S_TEID_CTRL %GTPV1_S2C_TEID_DATA %GTPV1_S2C_TEID_CTRL %GTPV1_END_USER_IP %GTPV1_END_USER_IMSI %GTPV1_END_USER_MSISDN %GTPV1_END_USER_IMEI %GTPV1_APN_NAME %GTPV1_RAI_MCC %GTPV1_RAI_MNC %GTPV1_RAI_LAC %GTPV1_RAI_RAC %GTPV1_ULI_MCC %GTPV1_ULI_MNC %GTPV1_ULI_CELL_LAC %GTPV1_ULI_CELL_CI %GTPV1_ULI_SAC %GTPV1_RESPONSE_CAUSE %GTPV1_RAT_TYPE \
%GTPV2_REQ_MSG_TYPE %GTPV2_RSP_MSG_TYPE %GTPV2_S5_S8_GTPC_TEID %GTPV2_C2S_S5_S8_GTPU_TEID %GTPV2_S2C_S5_S8_GTPU_TEID %GTPV2_C2S_S5_S8_GTPU_IP %GTPV2_S2C_S5_S8_GTPU_IP %GTPV2_END_USER_IMSI %GTPV2_END_USER_MSISDN %GTPV2_APN_NAME %GTPV2_ULI_MCC %GTPV2_ULI_MNC %GTPV2_ULI_CELL_TAC %GTPV2_ULI_CELL_ID %GTPV2_RESPONSE_CAUSE %GTPV2_RAT_TYPE %GTPV2_PDN_IP %GTPV2_END_USER_IMEI %GTPV2_C2S_S5_S8_GTPC_IP %GTPV2_S2C_S5_S8_GTPC_IP %GTPV2_C2S_S5_S8_SGW_GTPU_TEID %GTPV2_S2C_S5_S8_SGW_GTPU_TEID %GTPV2_C2S_S5_S8_SGW_GTPU_IP %GTPV2_S2C_S5_S8_SGW_GTPU_IP \
"
nice -n -20 /usr/local/bin/nprobe -n ${COLLECTOR_PORT} -i zc:10@0 -u ${NETFLOW_INDEX} -Q 0 -t 10 -d 15 -V 9 -o 100 -U 700 \
--cpu-affinity 2 --export-thread-affinity 3 \
--account-l2 \
-b 0 \
--tunnel \
--timestamp-format 1 \
--bi-directional \
--snaplen 0 \
-T \
"
%FIRST_SWITCHED %LAST_SWITCHED \
%FLOW_START_MILLISECONDS %FLOW_END_MILLISECONDS \
%IN_PKTS %IN_BYTES %IPV4_SRC_ADDR %IPV4_DST_ADDR %INPUT_SNMP %OUTPUT_SNMP %L4_SRC_PORT %L4_DST_PORT %TCP_FLAGS %PROTOCOL %SRC_TOS \
%BIFLOW_DIRECTION \
%DNS_QUERY %DNS_QUERY_ID %DNS_QUERY_TYPE %DNS_RET_CODE %DNS_NUM_ANSWERS \
%L7_PROTO %APPLICATION_ID \
%HTTP_HOST \
%UPSTREAM_TUNNEL_ID %DOWNSTREAM_TUNNEL_ID %UNTUNNELED_IPV4_SRC_ADDR %UNTUNNELED_IPV4_DST_ADDR %UNTUNNELED_PROTOCOL \
%GTPV1_REQ_MSG_TYPE %GTPV1_RSP_MSG_TYPE %GTPV1_C2S_TEID_DATA %GTPV1_C2S_TEID_CTRL %GTPV1_S2C_TEID_DATA %GTPV1_S2C_TEID_CTRL %GTPV1_END_USER_IP %GTPV1_END_USER_IMSI %GTPV1_END_USER_MSISDN %GTPV1_END_USER_IMEI %GTPV1_APN_NAME %GTPV1_RAI_MCC %GTPV1_RAI_MNC %GTPV1_RAI_LAC %GTPV1_RAI_RAC %GTPV1_ULI_MCC %GTPV1_ULI_MNC %GTPV1_ULI_CELL_LAC %GTPV1_ULI_CELL_CI %GTPV1_ULI_SAC %GTPV1_RESPONSE_CAUSE %GTPV1_RAT_TYPE \
%GTPV2_REQ_MSG_TYPE %GTPV2_RSP_MSG_TYPE %GTPV2_S5_S8_GTPC_TEID %GTPV2_C2S_S5_S8_GTPU_TEID %GTPV2_S2C_S5_S8_GTPU_TEID %GTPV2_C2S_S5_S8_GTPU_IP %GTPV2_S2C_S5_S8_GTPU_IP %GTPV2_END_USER_IMSI %GTPV2_END_USER_MSISDN %GTPV2_APN_NAME %GTPV2_ULI_MCC %GTPV2_ULI_MNC %GTPV2_ULI_CELL_TAC %GTPV2_ULI_CELL_ID %GTPV2_RESPONSE_CAUSE %GTPV2_RAT_TYPE %GTPV2_PDN_IP %GTPV2_END_USER_IMEI %GTPV2_C2S_S5_S8_GTPC_IP %GTPV2_S2C_S5_S8_GTPC_IP %GTPV2_C2S_S5_S8_SGW_GTPU_TEID %GTPV2_S2C_S5_S8_SGW_GTPU_TEID %GTPV2_C2S_S5_S8_SGW_GTPU_IP %GTPV2_S2C_S5_S8_SGW_GTPU_IP \
"
Below are the detailed logs for the 10 ZC nProbe processes, for reference: https://drive.google.com/open?id=1NWWw8KsstuTtINeTDl1niXJKrxDZQvXA
@wayne-genie we need to isolate the issue and figure out whether it depends on processing both GTP-C and GTP-U in the same instance, or on using ZC as the capture method (I guess you are not using ZC in the first two configurations). Could you run the same tests without ZC? Or all tests with ZC, if you experience packet loss? Did you check for packet loss in all cases? Thank you.
@cardigliano There is no packet loss for the above test cases. We just tried disabling ZC, with one nProbe process handling GTP-C + GTP-U (config below); the pps stats are 99% of the SNMP stats. It looks like enabling ZC introduces a larger discrepancy. However, we still need to distribute larger traffic volumes with ZC at the customer site. Is there anything we can fine-tune to get more accurate stats? Thank you.
nice -n -20 /usr/local/bin/nprobe -n ${COLLECTOR_PORT} -i ${ETH} -u ${NETFLOW_INDEX} -Q 0 -t 10 -d 15 -V 9 -o 100 -U 600 \
--cpu-affinity 2 --export-thread-affinity 3 \
--account-l2 \
-b 0 \
--tunnel \
--timestamp-format 1 \
--bi-directional \
--snaplen 0 \
-T \
"
%FIRST_SWITCHED %LAST_SWITCHED \
%FLOW_START_MILLISECONDS %FLOW_END_MILLISECONDS \
%IN_PKTS %IN_BYTES %IPV4_SRC_ADDR %IPV4_DST_ADDR %INPUT_SNMP %OUTPUT_SNMP %L4_SRC_PORT %L4_DST_PORT %TCP_FLAGS %PROTOCOL %SRC_TOS \
%BIFLOW_DIRECTION \
%DNS_QUERY %DNS_QUERY_ID %DNS_QUERY_TYPE %DNS_RET_CODE %DNS_NUM_ANSWERS \
%L7_PROTO %APPLICATION_ID \
%HTTP_HOST \
%UPSTREAM_TUNNEL_ID %DOWNSTREAM_TUNNEL_ID %UNTUNNELED_IPV4_SRC_ADDR %UNTUNNELED_IPV4_DST_ADDR %UNTUNNELED_PROTOCOL \
%GTPV1_REQ_MSG_TYPE %GTPV1_RSP_MSG_TYPE %GTPV1_C2S_TEID_DATA %GTPV1_C2S_TEID_CTRL %GTPV1_S2C_TEID_DATA %GTPV1_S2C_TEID_CTRL %GTPV1_END_USER_IP %GTPV1_END_USER_IMSI %GTPV1_END_USER_MSISDN %GTPV1_END_USER_IMEI %GTPV1_APN_NAME %GTPV1_RAI_MCC %GTPV1_RAI_MNC %GTPV1_RAI_LAC %GTPV1_RAI_RAC %GTPV1_ULI_MCC %GTPV1_ULI_MNC %GTPV1_ULI_CELL_LAC %GTPV1_ULI_CELL_CI %GTPV1_ULI_SAC %GTPV1_RESPONSE_CAUSE %GTPV1_RAT_TYPE \
%GTPV2_REQ_MSG_TYPE %GTPV2_RSP_MSG_TYPE %GTPV2_S5_S8_GTPC_TEID %GTPV2_C2S_S5_S8_GTPU_TEID %GTPV2_S2C_S5_S8_GTPU_TEID %GTPV2_C2S_S5_S8_GTPU_IP %GTPV2_S2C_S5_S8_GTPU_IP %GTPV2_END_USER_IMSI %GTPV2_END_USER_MSISDN %GTPV2_APN_NAME %GTPV2_ULI_MCC %GTPV2_ULI_MNC %GTPV2_ULI_CELL_TAC %GTPV2_ULI_CELL_ID %GTPV2_RESPONSE_CAUSE %GTPV2_RAT_TYPE %GTPV2_PDN_IP %GTPV2_END_USER_IMEI %GTPV2_C2S_S5_S8_GTPC_IP %GTPV2_S2C_S5_S8_GTPC_IP %GTPV2_C2S_S5_S8_SGW_GTPU_TEID %GTPV2_S2C_S5_S8_SGW_GTPU_TEID %GTPV2_C2S_S5_S8_SGW_GTPU_IP %GTPV2_S2C_S5_S8_SGW_GTPU_IP \
"
@wayne-genie could you try running pfcount with and without ZC to check if there is any discrepancy in the packet capture itself?
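For example, using the interface names that appear later in this thread (eth4 for the kernel driver, ZC cluster 10 queue 0 for the zbalance_ipc output):

# Standard (non-ZC) capture path:
pfcount -i eth4
# ZC queue, as consumed by one of the nProbe instances:
pfcount -i zc:10@0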
I just disabled nProbe and ran pfcount on the ZC interface, but I got an "Unable to enable ring" error. Is there anything I am not doing correctly here?
This usually happens when other applications are running on the same interface (I see you have zbalance_ipc running).
OK, after removing zbalance_ipc I can run pfcount on the ZC interface. The pkt/sec rate is around 200k and is approximately identical to the SNMP pps stats at the same time, so the pfcount stats look good so far. Please advise what else I can check next, thank you. ZC.log
Hi @cardigliano, any advice on this issue?
@wayne-genie let's ignore bps for a moment (as it may depend on the packet overhead/headers) and focus on pps. You said the pps reported by pfcount matches what you see on the switch; this suggests there is some loss when you run nProbe. Could you check both the zbalance_ipc and nProbe stats? Please take a look at /proc/net/pf_ring/stats/*
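A convenient way to dump all of them with the file names included (plain shell):

# Print every PF_RING stats file prefixed by its name, so the zbalance_ipc
# counters and each nProbe queue's counters can be told apart.
for f in /proc/net/pf_ring/stats/*; do
  echo "== $f"
  cat "$f"
done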
Here is the /proc/net/pf_ring/stats/* log info; it looks like there are packet drops on some ZC queues?
[BEGIN] 2020/6/10 10:12:43 AM
root@nProbe:/proc/net/pf_ring/stats# ifconfig eth4
eth4 Link encap:Ethernet HWaddr a0:36:9f:ed:d3:98
inet6 addr: fe80::a236:9fff:feed:d398/64 Scope:Link
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:393691287094 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:274352963860673 (274.3 TB) TX bytes:680 (680.0 B)
Memory:c8100000-c8200000
root@nProbe:/proc/net/pf_ring/stats# ifconfig eth4
eth4 Link encap:Ethernet HWaddr a0:36:9f:ed:d3:98
inet6 addr: fe80::a236:9fff:feed:d398/64 Scope:Link
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:393692270020 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:274353612774571 (274.3 TB) TX bytes:680 (680.0 B)
Memory:c8100000-c8200000
root@nProbe:/proc/net/pf_ring/stats# pwd
/proc/net/pf_ring/stats
root@nProbe:/proc/net/pf_ring/stats# ls -al
total 0
dr-xr-xr-x 2 root root 0 Jun 10 10:04 .
dr-xr-xr-x 4 root root 0 Jun 10 10:01 ..
-r--r--r-- 1 root root 0 Jun 10 10:09 10237-none.2
-r--r--r-- 1 root root 0 Jun 10 10:09 10247-none.4
-r--r--r-- 1 root root 0 Jun 10 10:09 10266-none.6
-r--r--r-- 1 root root 0 Jun 10 10:09 10285-none.8
-r--r--r-- 1 root root 0 Jun 10 10:09 10304-none.10
-r--r--r-- 1 root root 0 Jun 10 10:09 10323-none.12
-r--r--r-- 1 root root 0 Jun 10 10:09 10342-none.14
-r--r--r-- 1 root root 0 Jun 10 10:09 10361-none.16
-r--r--r-- 1 root root 0 Jun 10 10:09 10380-none.18
-r--r--r-- 1 root root 0 Jun 10 10:09 10401-none.20
-r--r--r-- 1 root root 0 Jun 10 10:09 10420-none.22
root@nProbe:/proc/net/pf_ring/stats# cat *none*
ClusterId: 10
TotQueues: 10
Applications: 1
App0Queues: 10
Duration: 21:22:21:22:642
Packets: 393699884311
Forwarded: 393693639910
Processed: 393693558010
Duration: 21:22:21:18:585
Bytes: 27443561015792
Packets: 40748495657
Dropped: 7128
Duration: 21:22:21:13:556
Bytes: 27246987378309
Packets: 38801962627
Dropped: 0
Duration: 21:22:21:07:795
Bytes: 27271485523709
Packets: 39187615395
Dropped: 1960
Duration: 21:22:21:02:771
Bytes: 27123363826998
Packets: 38983063293
Dropped: 12242
Duration: 21:22:20:57:747
Bytes: 27112526608186
Packets: 38763914496
Dropped: 0
Duration: 21:22:20:52:724
Bytes: 27303213647229
Packets: 39351811710
Dropped: 0
Duration: 21:22:20:47:704
Bytes: 27289532070484
Packets: 39446457921
Dropped: 0
Duration: 21:22:20:42:682
Bytes: 27374596227132
Packets: 39474819458
Dropped: 843
Duration: 21:22:20:38:384
Bytes: 27258735756870
Packets: 39622825246
Dropped: 4439
Duration: 21:22:20:33:357
Bytes: 27355627842233
Packets: 39312623444
Dropped: 0
root@nProbe:/proc/net/pf_ring/stats# cat *none*
ClusterId: 10
TotQueues: 10
Applications: 1
App0Queues: 10
Duration: 21:22:21:29:642
Packets: 393701469545
Forwarded: 393695225144
Processed: 393695143240
Duration: 21:22:21:25:585
Bytes: 27443660715336
Packets: 40748663761
Dropped: 7128
Duration: 21:22:21:20:556
Bytes: 27247078735361
Packets: 38802100091
Dropped: 0
Duration: 21:22:21:14:796
Bytes: 27271587666471
Packets: 39187764504
Dropped: 1960
Duration: 21:22:21:09:772
Bytes: 27123460082715
Packets: 38983202074
Dropped: 12242
Duration: 21:22:21:04:748
Bytes: 27112636475266
Packets: 38764066170
Dropped: 0
Duration: 21:22:20:59:725
Bytes: 27303317041509
Packets: 39351967237
Dropped: 0
Duration: 21:22:20:54:705
Bytes: 27289641765332
Packets: 39446640907
Dropped: 0
Duration: 21:22:20:49:683
Bytes: 27374707561179
Packets: 39475013663
Dropped: 843
Duration: 21:22:20:45:384
Bytes: 27258829804451
Packets: 39622966859
Dropped: 4439
Duration: 21:22:20:40:358
Bytes: 27355728978261
Packets: 39312784340
Dropped: 0
[END] 2020/6/10 10:14:12 AM
@wayne-genie could you provide the content of all files in /proc/net/pf_ring/stats (not just none) with the file names included? Or, even better, are you able to run zbalance_ipc manually, with -p, and provide some output? What we should check (probably you can also do this yourself) is that received + dropped packets match what you see on SNMP.
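For reference, a manual invocation along these lines (the parameters are assumptions based on the setup described in this thread — 10 queues on cluster 10 fed from eth4; adjust the hash mode and core binding to your environment):

# Capture from eth4 into ZC cluster 10 with 10 egress queues,
# IP-hash distribution (-m 1), pinned to core 1, printing stats (-p).
zbalance_ipc -i zc:eth4 -c 10 -n 10 -m 1 -g 1 -p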
@cardigliano I only see none.* files under /proc/net/pf_ring/stats (as shown below). The zbalance_ipc -p output for about 1.5 hours is also attached below. The zbalance_ipc "Actual Stats" Recv and Forwarded pps are pretty close to SNMP at the same time, e.g. both around 240-260 kpps at 15:00.
[BEGIN] 2020/6/11 11:11:05 AM
root@nProbe:/proc/net/pf_ring/stats# ls -al
total 0
dr-xr-xr-x 2 root root 0 Jun 11 11:06 .
dr-xr-xr-x 4 root root 0 Jun 11 11:06 ..
-r--r--r-- 1 root root 0 Jun 11 11:06 156810-none.25
-r--r--r-- 1 root root 0 Jun 11 11:06 156819-none.27
-r--r--r-- 1 root root 0 Jun 11 11:06 156838-none.29
-r--r--r-- 1 root root 0 Jun 11 11:06 156857-none.31
-r--r--r-- 1 root root 0 Jun 11 11:06 156876-none.33
-r--r--r-- 1 root root 0 Jun 11 11:06 156895-none.35
-r--r--r-- 1 root root 0 Jun 11 11:06 156914-none.37
-r--r--r-- 1 root root 0 Jun 11 11:06 156933-none.39
-r--r--r-- 1 root root 0 Jun 11 11:06 156952-none.41
-r--r--r-- 1 root root 0 Jun 11 11:06 156971-none.43
-r--r--r-- 1 root root 0 Jun 11 11:06 156990-none.45
root@nProbe:/proc/net/pf_ring/stats# pwd
/proc/net/pf_ring/stats
root@nProbe:/proc/net/pf_ring/stats# cat *.25
ClusterId: 10
TotQueues: 10
Applications: 1
App0Queues: 10
Duration: 0:01:00:52:382
Packets: 827081292
Forwarded: 821108652
Processed: 821026752
IFPackets: 827081310
IFDropped: 0
Q0Packets: 86750818
Q0Dropped: 97658
Q1Packets: 83785256
Q1Dropped: 211786
Q2Packets: 84856028
Q2Dropped: 336714
Q3Packets: 83404870
Q3Dropped: 398185
Q4Packets: 81263987
Q4Dropped: 549542
Q5Packets: 77743915
Q5Dropped: 656642
Q6Packets: 76935796
Q6Dropped: 760341
Q7Packets: 87086244
Q7Dropped: 931203
Q8Packets: 79009230
Q8Dropped: 989718
Q9Packets: 80190610
Q9Dropped: 1040843
root@nProbe:/proc/net/pf_ring/stats# cat *.27
Duration: 0:01:00:58:232
Bytes: 56575490723
Packets: 86988392
Dropped: 0
root@nProbe:/proc/net/pf_ring/stats# cat *.29
Duration: 0:01:00:56:216
Bytes: 57127689309
Packets: 84128694
Dropped: 0
root@nProbe:/proc/net/pf_ring/stats# cat *.31
Duration: 0:01:00:54:197
Bytes: 58086840494
Packets: 85191406
Dropped: 0
root@nProbe:/proc/net/pf_ring/stats# cat *.33
Duration: 0:01:00:52:200
Bytes: 55955288918
Packets: 83992843
Dropped: 0
root@nProbe:/proc/net/pf_ring/stats# cat *.35
Duration: 0:01:00:48:199
Bytes: 56551059927
Packets: 81739894
Dropped: 0
root@nProbe:/proc/net/pf_ring/stats# cat *.37
Duration: 0:01:00:46:201
Bytes: 53600538593
Packets: 78238115
Dropped: 0
root@nProbe:/proc/net/pf_ring/stats# cat *.39
Duration: 0:01:00:43:199
Bytes: 52149994664
Packets: 77489153
Dropped: 0
root@nProbe:/proc/net/pf_ring/stats# cat *.41
Duration: 0:01:00:49:197
Bytes: 57689397981
Packets: 88061146
Dropped: 0
root@nProbe:/proc/net/pf_ring/stats# cat *.43
Duration: 0:01:00:47:213
Bytes: 54164272168
Packets: 79887822
Dropped: 0
root@nProbe:/proc/net/pf_ring/stats# cat *.45
Duration: 0:01:00:44:212
Bytes: 55297812243
Packets: 81150252
Dropped: 0
[END] 2020/6/11 11:12:23 AM
These are the SNMP stats at the same time:
Hi @cardigliano, it looks like zbalance_ipc works well according to the above info. What should I check next? Thank you.
OK, great, the number of packets received by ZC and forwarded to nProbe matches SNMP. Now, the "Forwarded" traffic is what nProbe is actually processing, so it is strange that the number of packets nProbe reports does not match SNMP. How are you counting those packets from nProbe? Are you looking at live stats (where?) or at exported flows?
From exported flows (sampling 1:1:1). We sum IN_PKTS to calculate pps and compare against the SNMP stats for the same time window.
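In other words, something along these lines — a sketch that assumes the exported flows have been collected into a pipe-separated text file where IN_PKTS is in column 5 (the file name, column number and window length are all hypothetical):

# Sum IN_PKTS over all flow records in a window, divide by the window
# length, and compare the result with the SNMP pps for the same window.
awk -F'|' '{ pkts += $5 } END { printf "flow pps: %.0f\n", pkts / 300 }' flows.txt
# 300 = window length in seconds (example value)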
@wayne-genie did you take into account non-IP traffic?
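One quick way to gauge the non-IP share directly on the capture interface — a sketch using tcpdump with eth4 as in the logs above; note this only works while the NIC is not bound to ZC, since ZC bypasses the kernel stack:

# Capture for 60 seconds twice: once counting non-IP frames only, once
# counting everything; the ratio approximates the non-IP share.
timeout 60 tcpdump -ni eth4 'not ip and not ip6' -w /dev/null
timeout 60 tcpdump -ni eth4 -w /dev/null
# tcpdump prints "N packets captured" for each run on exit.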
@cardigliano this interface is dedicated to GTP roaming traffic, so there is probably only a small portion of non-IP traffic. However, from the previous test cases, disabling ZC (cases 1 and 2 below) yields 98-99% of the SNMP pps stats, a reasonable discrepancy if non-IP traffic is taken into account.
In case 3, once we distribute traffic to 10 nProbe ZC processes, the flow pps stats drop to only 90% of the SNMP pps stats.
That is weird, because we just confirmed that zbalance_ipc forwards without packet loss.
[Case 1]: two nProbe processes without ZC, one for GTP-C and the other for GTP-U traffic; the flow pps stats are 98% of SNMP.
[Case 2]: one nProbe process without ZC handling GTP-C + GTP-U; the pps stats are 99% of SNMP.
[Case 3]: traffic distributed to 10 nProbe ZC processes handling GTP-C + GTP-U; the pps stats drop to only 90% of SNMP.
Hi @cardigliano, any suggestions on this?
@wayne-genie since the stats on received traffic look correct with ZC, we need to figure out what nProbe is doing in that case (it is weird that traffic processing differs depending on the capture source, ZC vs non-ZC). Any chance you can provide access to the system for debugging, or a (big) pcap file with some traffic that we can use to reproduce this in our lab?
Hi @cardigliano, sorry for the late reply. We managed to record a large pcap of raw GTP traffic (around 8 GB); please give it a try:
pcap file: https://drive.google.com/file/d/17DsaDSZrTFYH1TV2MztpR08t28JTQRXP/view?usp=sharing
@wayne-genie thank you, I downloaded the pcap, I will analyse it asap
@wayne-genie I tried to reproduce the issue with your pcap, both analysing the pcap directly in nProbe and replaying it and capturing through ZC (and zbalance_ipc). In both cases nProbe accounts for ~11.24M packets (IN_PKTS + OUT_PKTS), which is ~99.7% of the packets in the pcap.
Hi @cardigliano, my colleague was also unable to reproduce it in the lab, and we noticed that at the customer site the issue appeared after zbalance_ipc and nProbe had been up and running for 2-3 days. Is there anything I can do to identify the root cause? Thank you.
It's hard to identify this kind of issue without a pcap or a way to reproduce it.
Thank you @cardigliano for your help. We will reopen this if it becomes reproducible.
Hello @cardigliano, I hope you and @lucaderi are doing well and staying safe.
Our client is still concerned that the nProbe pps stats are not quite consistent with SNMP. We are now able to remotely access the client's nProbe device through Anycast or TeamViewer while my colleague is on site (typically around 7-10 am Italy time). If it's okay with you, please let me know a good time for you to remote in and help identify what causes the discrepancy. Thank you.
Description:
We found that, for one interface, the total bps statistics from nProbe are only about 70% of the SNMP polling stats (with no packet drops reported by nProbe), while other interfaces are about 90% consistent. We would like advice on whether the nProbe config needs to be modified, or on other possible causes of the discrepancy.
nProbe Config: