Closed by asyslinux 6 months ago
That's a lot of lighthouses. Why does your network have so many?
A few quick thoughts that might help:
(1) In your host's relay.relays section, only list relays that are close to that host in terms of ping time. Meaning, I expect European hosts would only list the European relays, and American hosts would only list American relays. Your American relays could even be further segmented if they're in different geographic regions - so American east-coast hosts would only list relays on the east coast, and vice versa for west-coast hosts. I expect those geographic realities to result in lower latency, and therefore faster ping times.
(2) In each host's config, specify:
listen:
  read_buffer: 10485760
  write_buffer: 10485760
(these values come out of the commented-out values in the example Nebula config file here: https://github.com/slackhq/nebula/blob/master/examples/config.yml#L106)
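Putting both suggestions together, a host's config could look like the sketch below. The relay overlay IPs and locations are hypothetical placeholders, not values from this thread; the buffer sizes are the commented-out values from the example config linked above.

```yaml
# Hypothetical sketch: a US east-coast host lists only its nearby relays.
relay:
  relays:
    - 192.168.100.10   # placeholder: Nebula overlay IP of a New York relay
    - 192.168.100.11   # placeholder: Nebula overlay IP of a Virginia relay
  am_relay: false
  use_relays: true

listen:
  # Larger UDP socket buffers, per the example config's commented-out values
  read_buffer: 10485760
  write_buffer: 10485760
```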
If you hop into the OSS Nebula slack channel, you can get support there, too.
Hello, thanks for the reply.
I uncommented the read/write buffers on all network hosts, but it didn't help. The result is the same: a file transfer sometimes starts fast, then stalls and continues at 56k-modem speed, then sometimes speeds up again.
I additionally tried setting routines: 8. That didn't help either.
I tried leaving only the 4 lighthouse nodes in the USA in all hosts' configs; that didn't help. And on the previous Nebula version, 1.5.2, when I had no relays, the result was the same.
Is there any way to find out what the problem could be? Maybe lower the MTU from 1290 to 1127, or even lower? Or increase tx_queue from 500 to 3000?
What I know for sure is that the internet connection is fast between hosts in Asia and the US, or Europe and the US.
Thanks.
50M.file
950,272 1% 791.81kB/s 0:01:05
983,040 1% 133.50kB/s 0:06:25
1,015,808 1% 73.94kB/s 0:11:35
1,048,576 2% 52.77kB/s 0:16:13
1,081,344 2% 6.10kB/s 2:20:24
1,343,488 2% 17.97kB/s 0:47:23
1,376,256 2% 18.18kB/s 0:46:48
3,080,192 5% 104.83kB/s 0:07:50
3,309,568 6% 104.76kB/s 0:07:48
3,342,336 6% 87.21kB/s 0:09:22
3,375,104 6% 86.32kB/s 0:09:28
3,407,872 6% 13.74kB/s 0:59:26
3,440,640 6% 5.14kB/s 2:38:47
3,473,408 6% 5.14kB/s 2:38:42
3,506,176 6% 5.14kB/s 2:38:34
3,538,944 6% 5.14kB/s 2:38:29
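For reference, the MTU and TX-queue knobs mentioned in the question live in the `tun` section of the Nebula config. A sketch of the proposed experiment is below; the values are the ones floated in the question, not recommendations:

```yaml
tun:
  # Nebula's default MTU is 1300; the question proposes dropping from 1290 to 1127
  mtu: 1127
  # Default transmit queue length is 500; the question proposes raising it
  tx_queue: 3000
```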
Additionally, I'm attaching my sysctl.conf (the same on most servers in the network). Maybe something in it interferes with normal operation of the tunnels? Although there are no such problems between servers where ping is good, so I'm not sure anything is interfering.
#IP Forward
net/ipv4/ip_forward=1
#High Load Systems
net/ipv4/tcp_tw_reuse=1
#Disable ipv6
net/ipv6/conf/all/disable_ipv6=1
net/ipv6/conf/default/disable_ipv6=1
net/ipv6/conf/lo/disable_ipv6=1
#Max Concurrent Connections
net/core/somaxconn=262144
#Disable Accept Source Routing
net/ipv4/conf/all/accept_source_route=0
#Disable Accept Redirects
net/ipv4/conf/all/accept_redirects=0
#Enable Anti Spoofing
net/ipv4/conf/all/rp_filter=1
#Enable Ignore Broadcast Packets
net/ipv4/icmp_echo_ignore_broadcasts=1
#Enable Logging Bad Error Message Protection
net/ipv4/icmp_ignore_bogus_error_responses=1
#Disable Logging Spoofed Packets, Source Routed Packets, Redirect Packets
net/ipv4/conf/all/log_martians=0
#Optimal Network Parameters
net/ipv4/tcp_congestion_control=yeah
net/core/netdev_max_backlog=262144
net/ipv4/tcp_no_metrics_save=1
net/ipv4/tcp_low_latency=1
net/ipv4/tcp_max_syn_backlog=262144
net/ipv4/tcp_mtu_probing=1
net/core/optmem_max=67108864
net/core/rmem_default=212992
net/core/wmem_default=212992
net/core/rmem_max=67108864
net/core/wmem_max=67108864
net/ipv4/tcp_rmem=4096 87380 33554432
net/ipv4/tcp_wmem=4096 65536 33554432
#Decrease TCP FIN TimeOut
net/ipv4/tcp_fin_timeout=3
#Decrease TCP KeepAlive Connections Interval
net/ipv4/tcp_keepalive_time=300
#Decrease TCP KeepAlive Probes
net/ipv4/tcp_keepalive_probes=3
#Disable SACK
net/ipv4/tcp_sack=0
#Time Orphan Retries
net/ipv4/tcp_orphan_retries=1
#Swap On 10% of Memory
vm/swappiness=10
#Core Pids
kernel/core_uses_pid=1
#Increase Inotify Settings
fs/inotify/max_user_watches=524288
fs/inotify/max_queued_events=65536
#Virtual Memory Settings
vm/overcommit_memory=1
vm/max_map_count=262144
#Auto-Reboot on Kernel Panic
kernel/panic=60
#Auto-Log on Kernel Panic
kernel/panic_on_oops=1
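One observation on the sysctl dump above: Nebula carries its traffic over UDP, so the many TCP-specific tunables (tcp_congestion_control, tcp_sack, tcp_rmem/tcp_wmem, etc.) affect the streams inside the tunnel but not the tunnel transport itself. What matters for Nebula's listen.read_buffer/write_buffer settings is that the kernel's socket-buffer ceilings are at least that large, which they already are in this config:

```
# Ceilings for per-socket buffers; must be >= Nebula's
# listen.read_buffer / listen.write_buffer (10485760 in this thread)
net/core/rmem_max=67108864
net/core/wmem_max=67108864
```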
Hi @asyslinux - I realize this ticket is a bit stale, but I wanted to know if you made any progress in solving your issues.
One thing that was pointed out earlier in the thread is that relays can certainly act as a bottleneck, and you have quite a few configured in your host's configuration. Have you verified whether this issue exists when relays are taken out of the equation?
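For anyone following along, relays can be taken out of the equation on a given host by disabling them in that host's config, which forces direct (or lighthouse-assisted hole-punched) connections only. A minimal sketch based on the example config's `relay` section:

```yaml
relay:
  # Never use relays to reach other hosts; test whether the slowdown persists
  use_relays: false
  am_relay: false
  relays: []
```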
Hi, @johnmaguire - I don't use Nebula now; the problem was never solved. My infrastructure has no bottleneck: over a direct connection, all files transferred without any stalls.
You can close this issue. Looking back, I don't know whether this problem was only mine or not.
Hello, I can't figure out what's wrong and why the transfer rate drops significantly to KB/s and almost freezes. I tried setting a lower MTU on two nodes (but not on all network nodes), but it didn't help. I tried disabling the Europe lighthouse nodes across the whole network, but the result was the same too. I had this problem on Nebula version 1.5.2 as well. Can anyone advise or check? Thank you.
Asia / Usa through Nebula:
root@sg:/dev# rsync -av --progress america.vpn.ip:/tmp/50M.file /tmp/
root@america.vpn.ip's password:
receiving incremental file list
50M.file
  7,634,944  14%   67.03kB/s    0:11:08
  1,179,648   2%   71.06kB/s    0:12:01
  1,277,952   2%   58.15kB/s    0:14:39
  1,310,720   2%    9.79kB/s    1:27:00
  1,966,080   3%   45.57kB/s    0:18:27
  1,998,848   3%   39.42kB/s    0:21:19
  2,064,384   3%   46.14kB/s    0:18:11
  2,097,152   4%   49.54kB/s    0:16:56
Asia / Usa through Internet:
root@sg:/dev# rsync -av --progress america.real.ip:/tmp/50M.file /tmp/
root@america.real.ip's password:
receiving incremental file list
50M.file
 31,817,728  60%    6.58MB/s    0:00:03
With other file transfers, from any continent to any continent in any direction, I have the same problems.
I have 14 lighthouse nodes: 4 in Europe, 10 in America.
Lighthouse configuration:
Other nodes' configuration: