jamesog / tailscale-edgeos

Running Tailscale on Ubiquiti EdgeOS
MIT License

Doesn't reestablish routes to TailScale upon reboot? #1

Closed LunkSnee closed 2 years ago

LunkSnee commented 3 years ago

I followed your guide on my EdgeRouter X (using MIPSLE), and got it working! Wonderful!

Until I rebooted. Upon rebooting, it appears to report being connected to the tailscale service and it lists my other nodes, but no routing occurs. Devices can't ping or connect to the ERX.

Is there anything you would recommend I try to resolve this or give you more information? Thanks.

For example, after reboot running a netcheck and status checks:

$ tailscale netcheck                                               

Report:                                                                         
        * UDP: true                                                             
        * IPv4: yes, m.n.o.p:6346                                       
        * IPv6: no                                                              
        * MappingVariesByDestIP: false                                          
        * HairPinning: false                                                    
        * PortMapping:                                                          
        * Nearest DERP: Seattle                                                 
        * DERP latency:                                                         
                - sea: 46.7ms  (Seattle)                                        
                - sfo: 64.8ms  (San Francisco)                                  
                - dfw: 96.6ms  (Dallas)                                         
                - nyc: 115.5ms (New York City)                                  
                - tok: 177.3ms (Tokyo)                                          
                - lhr: 177.8ms (London)                                         
                - fra: 198.9ms (Frankfurt)                                      
                - syd: 226.1ms (Sydney)                                         
                - sin: 250.4ms (Singapore)                                      
                - blr: 298.8ms (Bangalore)                                      
$ tailscale status
a.b.c.d    ubnt                 admin@         linux   -
...
e.f.g.h         s21                  admin@         android -

jamesog commented 3 years ago

Thanks for that information, that's a really helpful report. I don't reboot my ER4 often so I guess I've yet to discover this 😄

Devices can't ping or connect to the ERX.

Can the ERX connect to other Tailscale devices?

What happens if you run tailscale up again?

LunkSnee commented 3 years ago

Running tailscale up again doesn't fix it.

Messing around with it, I've found that running systemctl restart tailscaled DOES fix it.

Perhaps tailscaled is starting too early and not setting up the right routing?

I changed the tailscaled.service unit to wait for network-online.target instead of network-pre.target, but that didn't fix my issue.
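For reference, a minimal sketch of expressing that ordering as a systemd drop-in rather than by editing the unit file itself. The function name write_dropin and the file name network-online.conf are hypothetical, and the directory is passed as an argument so the sketch can be tried in a scratch location first. A drop-in After= adds to the unit's existing ordering rather than replacing network-pre.target, and as reported above this ordering did not resolve the problem.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that orders a unit after network-online.target.
# usage: write_dropin DROPIN_DIR
write_dropin() {
    dir=$1
    mkdir -p "$dir" || return 1
    # Wants= pulls the target in; After= delays start until it is reached.
    cat > "$dir/network-online.conf" <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
}

# Example (not executed here):
#   write_dropin /etc/systemd/system/tailscaled.service.d && systemctl daemon-reload
```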

jamesog commented 3 years ago

Ah, that's interesting!

One possible workaround is to put an ExecStartPre in the unit to ping the default gateway or similar, until/unless a proper dependency can be figured out.

I won't be able to test this until the weekend at the earliest, unfortunately.
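A rough, untested sketch of what such an ExecStartPre helper could look like. The function names retry and default_gateway are made up for illustration, and the attempt count and delay are arbitrary. It assumes the default route appears in `ip route` output in the usual "default via ADDRESS dev IFACE" form.

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Hypothetical helper: read `ip route` output on stdin and print the default
# gateway, e.g. "default via 10.1.1.1 dev eth0" yields "10.1.1.1".
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Example use from a script called by ExecStartPre (not executed here):
#   gw=$(ip -4 route show default | default_gateway)
#   retry 30 1 ping -c 1 -W 1 "$gw"
```

If the gateway never answers, retry returns non-zero, and systemd would treat the ExecStartPre step as failed unless the command is prefixed with "-" in the unit.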

LunkSnee commented 3 years ago

That's fine. I've fixed it for me for the time being by adding the following file (chmod +x): /config/scripts/post-config.d/tailscale-restart.sh

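#!/bin/vbash
# Restart tailscaled once EdgeOS has finished applying its configuration,
# and leave a syslog-style note so the restart is visible in the log.
echo "$(date +'%h %e %T') $(hostname) tailscaled restarted" >> /var/log/messages
systemctl restart tailscaled

exit 0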

According to the EdgeOS documentation, scripts in that directory run after boot has completed. Tailscale now connects after a reboot.

jamesog commented 3 years ago

I've been digging into this a little bit. It's definitely weird.

I added a drop-in so the systemd unit starts after vyatta-router.service:

# cat /etc/systemd/system/tailscaled.service.d/after-vyatta-routing.conf
[Unit]
After=vyatta-router.service

Just in case it was missing routing information. That does make it start later, but doesn't fix anything. To try and see what it's doing I bumped up logging in the daemon:

# grep FLAGS /etc/default/tailscaled
FLAGS="-verbose 2"

which also required enabling journald:

root@testgw01:/home/ubnt# cat /etc/systemd/journald.conf.d/00-edgeos-defaults.conf
[Journal]
Storage=volatile

(Vyatta default is Storage=none. volatile means it only keeps logs in RAM.)

After that I can see the daemon is doing "things", and netstat shows it's definitely connected to Tailscale's control service, yet it's not really connected. I wonder if this needs reporting as a bug upstream, but I'll keep poking this system to see if there's some other way I can convince it to work on startup.

One other thing, via #2 I'm looking into switching from manually installing the binaries to using the Debian repo provided by Tailscale, as that seems to work.

jamesog commented 3 years ago

I was wondering if maybe there's a problem with the overlay filesystem and something isn't available early enough, so I took a look at the lsof output:

root@testgw01:/home/ubnt# lsof -c tailscaled
COMMAND    PID USER   FD      TYPE             DEVICE SIZE/OFF  NODE NAME
tailscale 1571 root  cwd       DIR               0,13     4096    23 /
tailscale 1571 root  rtd       DIR               0,13     4096    23 /
tailscale 1571 root  txt       REG                8,2 12853861 76419 /usr/sbin/tailscaled
tailscale 1571 root    0r      CHR                1,3      0t0  1028 /dev/null
tailscale 1571 root    1u     unix 0x80000004171ccd80      0t0 10178 type=STREAM
tailscale 1571 root    2u     unix 0x80000004171ccd80      0t0 10178 type=STREAM
tailscale 1571 root    3u      REG                8,2        0 76435 /var/lib/tailscale/tailscaled.log1.txt
tailscale 1571 root    4u  a_inode                0,9        0     6 [eventpoll]
tailscale 1571 root    5r     FIFO                0,8      0t0 10184 pipe
tailscale 1571 root    6w     FIFO                0,8      0t0 10184 pipe
tailscale 1571 root    7u      REG                8,2        0 76436 /var/lib/tailscale/tailscaled.log2.txt
tailscale 1571 root    8u  netlink                         0t0 10201 ROUTE
tailscale 1571 root    9u      CHR             10,200     0t80  1427 /dev/net/tun
tailscale 1571 root   10u     unix 0x800000041ddad680      0t0 12407 /run/tailscale/tailscaled.sock type=STREAM
tailscale 1571 root   12u     IPv4              11337      0t0   TCP 10.1.1.2:46030->ec2-34-210-105-16.us-west-2.compute.amazonaws.com:https (ESTABLISHED)
tailscale 1571 root   13u  netlink                         0t0 11701 ROUTE
tailscale 1571 root   14r     FIFO                0,8      0t0 11702 pipe
tailscale 1571 root   15u  netlink                         0t0 10214 ROUTE
tailscale 1571 root   16r     FIFO                0,8      0t0 10215 pipe
tailscale 1571 root   17w     FIFO                0,8      0t0 10215 pipe
tailscale 1571 root   18w     FIFO                0,8      0t0 11702 pipe
tailscale 1571 root   19u     unix 0x80000004171ce880      0t0 11707 type=DGRAM
tailscale 1571 root   20u     IPv4              11712      0t0   TCP 10.1.1.2:49874->ec2-3-121-18-47.eu-central-1.compute.amazonaws.com:https (ESTABLISHED)

Compared to when this is working, it's missing the connection to a DERP relay, and it's not bound to UDP/41641:

root@testgw01:/home/ubnt# systemctl restart tailscaled
Warning: tailscaled.service changed on disk. Run 'systemctl daemon-reload' to reload units.
root@testgw01:/home/ubnt# lsof -c tailscaled
COMMAND    PID USER   FD      TYPE             DEVICE SIZE/OFF  NODE NAME
tailscale 2003 root  cwd       DIR               0,13     4096    23 /
tailscale 2003 root  rtd       DIR               0,13     4096    23 /
tailscale 2003 root  txt       REG                8,2 12853861 76419 /usr/sbin/tailscaled
tailscale 2003 root    0r      CHR                1,3      0t0  1028 /dev/null
tailscale 2003 root    1u     unix 0x800000041c783a80      0t0 14971 type=STREAM
tailscale 2003 root    2u     unix 0x800000041c783a80      0t0 14971 type=STREAM
tailscale 2003 root    3u      REG                8,2      121 76435 /var/lib/tailscale/tailscaled.log1.txt
tailscale 2003 root    4u  a_inode                0,9        0     6 [eventpoll]
tailscale 2003 root    5r     FIFO                0,8      0t0 14011 pipe
tailscale 2003 root    6w     FIFO                0,8      0t0 14011 pipe
tailscale 2003 root    7u      REG                8,2        0 76436 /var/lib/tailscale/tailscaled.log2.txt
tailscale 2003 root    8u  netlink                         0t0 14997 ROUTE
tailscale 2003 root    9u     unix 0x80000004171cc900      0t0 14102 type=DGRAM
tailscale 2003 root   10u      CHR             10,200     0t80  1427 /dev/net/tun
tailscale 2003 root   11u  netlink                         0t0 15013 ROUTE
tailscale 2003 root   12r     FIFO                0,8      0t0 15014 pipe
tailscale 2003 root   13w     FIFO                0,8      0t0 15014 pipe
tailscale 2003 root   14u     IPv4              15044      0t0   UDP *:41641
tailscale 2003 root   15u     IPv6              15045      0t0   UDP *:41641
tailscale 2003 root   16u  netlink                         0t0 14041 ROUTE
tailscale 2003 root   17r     FIFO                0,8      0t0 14042 pipe
tailscale 2003 root   18w     FIFO                0,8      0t0 14042 pipe
tailscale 2003 root   19u     IPv4              15152      0t0   TCP 10.1.1.2:49888->ec2-3-121-18-47.eu-central-1.compute.amazonaws.com:https (ESTABLISHED)
tailscale 2003 root   20u     unix 0x80000004171ce880      0t0 14099 /run/tailscale/tailscaled.sock type=STREAM
tailscale 2003 root   23u     IPv4              15179      0t0   TCP 10.1.1.2:42062->derp8-lon.tailscale.com:https (ESTABLISHED)
tailscale 2003 root   24u     IPv4              15067      0t0   TCP 10.1.1.2:46044->ec2-34-210-105-16.us-west-2.compute.amazonaws.com:https (ESTABLISHED)
tailscale 2003 root   25r     FIFO                0,8      0t0 14146 pipe

I can't see anything else obvious in system logs :-( but I'm pretty sure the problem is the lack of binding.
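The difference between the two listings can be checked mechanically. Below is a small sketch that reads `lsof -c tailscaled` output on stdin and succeeds only if a UDP socket on the given port is listed. The function name udp_port_bound is hypothetical, and the parsing assumes the column layout shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if lsof output on stdin lists a UDP socket
# bound to PORT (default 41641, the port seen in the working listing).
# usage: lsof -c tailscaled | udp_port_bound [PORT]
udp_port_bound() {
    awk -v port="${1:-41641}" '
        # UDP lines end in "UDP *:PORT"; TCP lines end in "(ESTABLISHED)".
        NF >= 2 && $(NF - 1) == "UDP" && $NF ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Example (not executed here):
#   lsof -c tailscaled | udp_port_bound || echo "tailscaled has no UDP/41641 socket"
```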

jamesog commented 3 years ago

I've raised a bug with Tailscale to see if they can help figure out why. I've tried several things to no avail.

jamesog commented 3 years ago

Hey @LunkSnee, I've recently upgraded Tailscale to 1.14 on my EdgeRouter and it looks like Tailscale is working properly now. Are you able to update (if you didn't already) and confirm if it works for you now? If so I'll update the ticket with Tailscale.
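To confirm whether a router is already on 1.14 or later, something like the following sketch could help. The function name version_at_least is hypothetical, it compares only the major and minor components, and it assumes the first line of `tailscale version` output is a dotted version string such as 1.14.0.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version HAVE is at least WANT,
# comparing the major and minor components numerically.
# usage: version_at_least HAVE WANT
version_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        split(have, h, ".")
        split(want, w, ".")
        # Different major versions decide the result on their own.
        if (h[1] + 0 != w[1] + 0) exit !(h[1] + 0 > w[1] + 0)
        exit !(h[2] + 0 >= w[2] + 0)
    }'
}

# Example (not executed here):
#   version_at_least "$(tailscale version | head -n 1)" 1.14 && echo "1.14 or newer"
```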

jamesog commented 2 years ago

I'll close this out now. I haven't had problems with this not working for a while. I hope it's working for you too.