Open tarelda opened 4 years ago
I deployed machine C, which is basically machine A from a different vendor. It has no other services running and doesn't exhibit the issue described here, all without sysctl tuning, just stock Ubuntu 18.04. To further verify that the issue exists, I tested stream continuity on the switch that machine A is connected to, and the issue is present there too.
Small follow-up. Recently I redeployed another Docker-based machine that is also used in a heavy-traffic role (monitoring), and it started suffering from similar issues. Interestingly, the change I made, apart from upgrading from xenial to bionic, was switching from OVS-based networking to macvlan. ATM the working hypothesis is that the macvlan driver is the culprit in deployments that carry heavy traffic.
I am experiencing frame loss in my stream while processing it on machine A. I usually develop my solutions on machine B, which yields a stream free of packet loss with the same pipeline. These are similar Supermicro servers connected to exactly the same brand and model of switch. The differences are in the CPUs (E5-2620v2 vs E5-2650) and RAM (32GB vs 64GB). Both run Ubuntu 18.04 with the latest stable Docker. In my setup, tsduck runs in a container whose interface is bound via macvlan to a VLAN interface on a physical interface (ethx.vid).

How did I find out about the continuity loss? I ran the continuity plugin on a client machine and saw that every few seconds a few packets are lost, mostly in the video PID but also in audio. To debug it further, I added the continuity plugin to my pipeline just before -O ip. It shows no discontinuity.

I checked the interface counters on Linux for any errors, discards, overflows etc. I used ethtool -S, netstat -i udp, /proc/net/snmp and /proc/[PID]/net/snmp. Nothing found there. Interestingly enough, though, the stats for the switch interface that machine A uses show some discards, but nowhere near the severity of the discontinuities.
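For anyone who wants to reproduce the continuity check outside tsduck (for example on a capture taken at the client or at the switch), here is a minimal Python sketch of the idea. It is not tsduck's implementation. It assumes plain 188-byte TS packets with no RTP header, treats any repeated counter value as a legal duplicate, and ignores the discontinuity_indicator flag. The function name `count_discontinuities` is made up for this example.

```python
from collections import defaultdict
from typing import Dict

TS_PACKET_SIZE = 188   # standard MPEG-TS packet length in bytes
SYNC_BYTE = 0x47       # first byte of every TS packet
NULL_PID = 0x1FFF      # stuffing packets, continuity is not meaningful


def count_discontinuities(data: bytes) -> Dict[int, int]:
    """Return {pid: number of continuity-counter jumps} for a TS byte string."""
    last_cc: Dict[int, int] = {}
    errors: Dict[int, int] = defaultdict(int)
    for offset in range(0, len(data) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = data[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            raise ValueError(f"lost sync at byte offset {offset}")
        # The 13-bit PID spans the low 5 bits of byte 1 and all of byte 2.
        pid = ((packet[1] & 0x1F) << 8) | packet[2]
        if pid == NULL_PID:
            continue
        has_payload = bool(packet[3] & 0x10)  # adaptation_field_control, payload bit
        cc = packet[3] & 0x0F                 # 4-bit continuity counter
        previous = last_cc.get(pid)
        last_cc[pid] = cc
        # The counter only advances on packets that carry payload.
        if previous is None or not has_payload:
            continue
        if cc == previous:
            continue  # simplification: accept repeats as duplicate packets
        if cc != (previous + 1) % 16:
            errors[pid] += 1
    return dict(errors)
```

Feeding it the bytes of a capture, e.g. `count_discontinuities(open("capture.ts", "rb").read())` with a hypothetical file name, gives a per-PID count that can be compared between the sender side and the client side.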
This switch's uplink is connected to another switch, whose port counters interestingly show some input errors:
I also tried to: a) increase the backlog, UDP memory buffers and net buffers, but with no effect; b) play with the ip output burst settings, but that only made the problem worse.
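One way to see whether the buffer tuning is even relevant is to sample the UDP counters from /proc/net/snmp twice and look at what moved in between. Below is a rough Python sketch of that. The helper names (`parse_snmp`, `counter_deltas`, `watch`) are made up for this example, and the 10-second interval is arbitrary. It relies only on the two-line layout of /proc/net/snmp (a header line with counter names followed by a line with values, both prefixed by the protocol name).

```python
import time
from typing import Dict


def parse_snmp(text: str, protocol: str = "Udp") -> Dict[str, int]:
    """Parse one protocol section of /proc/net/snmp into {counter: value}."""
    prefix = protocol + ":"
    rows = [line.split()[1:] for line in text.splitlines()
            if line.startswith(prefix)]
    if len(rows) != 2:
        raise ValueError(f"expected a header and a value line for {protocol}")
    names, values = rows
    return dict(zip(names, (int(v) for v in values)))


def counter_deltas(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return only the counters that changed between two samples."""
    return {name: after[name] - before[name]
            for name in after
            if name in before and after[name] != before[name]}


def watch(path: str = "/proc/net/snmp", interval: float = 10.0) -> Dict[str, int]:
    """Sample the file twice, `interval` seconds apart, and return the deltas."""
    with open(path) as f:
        first = parse_snmp(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_snmp(f.read())
    return counter_deltas(first, second)
```

Pointing `path` at /proc/[PID]/net/snmp of the tsduck process should give the counters for the container's network namespace instead of the host's. If SndbufErrors or RcvbufErrors stay flat while the client still reports discontinuities, the socket buffers are probably not where the packets are being lost.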
Any ideas what I should look for?