osrg / gobgp

BGP implemented in the Go Programming Language
https://osrg.github.io/gobgp/
Apache License 2.0

gobgpd hangs when adding many paths with graceful restart enabled #1818

Closed ecbaldwin closed 6 years ago

ecbaldwin commented 6 years ago

The switch I'm working with can handle 512k host routes in its FIB, so I was trying to push that many with gobgp. Without graceful restart (GR) enabled, I can easily push this many routes. However, when I started using GR or LLGR, gobgpd would hang.

I tried three methods:

  1. AddPathStream with many paths per AddPathStreamRequest.
  2. AddPathStream with one Path per AddPathStreamRequest.
  3. AddPath in a loop.

I observed the first one hanging before it hit 100 routes. The second seems to hang after pushing around 700-800 routes to the peer. The last one actually gets pretty far; I've seen it get over 300,000 before hanging. Of course, it is about 3 times slower than the other methods.

When gobgpd is hung, it does not respond to API requests or to a SIGTERM. A SIGINT will kill it.

Below is the script I used to generate the routes. You'll see all three methods in the add_paths function; some of them are just commented out.

#!/root/python-gobgp/venv/bin/python

import attribute_pb2
import gobgp_pb2_grpc
import gobgp_pb2
from google.protobuf import any_pb2
import sys

import grpc
import netaddr
from grpc.framework.interfaces.face.face import ExpirationError

_TIMEOUT_SECONDS = 1

def wrap_any(thing):
    a = any_pb2.Any()
    a.Pack(thing)
    return a

def create_ipv4_path(prefix, prefix_len, next_hop):
    nlri = attribute_pb2.IPAddressPrefix(prefix=prefix, prefix_len=int(prefix_len))
    origin_attr = attribute_pb2.OriginAttribute(origin=0)
    next_hop_attr = attribute_pb2.NextHopAttribute(next_hop=next_hop)

    return gobgp_pb2.Path(family=gobgp_pb2.IPv4,
                          any_nlri=wrap_any(nlri),
                          any_pattrs=[wrap_any(origin_attr),
                                      wrap_any(next_hop_attr)])

def create_ipv6_path(prefix, prefix_len, next_hop):
    nlri = attribute_pb2.IPAddressPrefix(prefix=prefix, prefix_len=int(prefix_len))
    origin_attr = attribute_pb2.OriginAttribute(origin=0)
    mp_reach_nlri_attr = attribute_pb2.MpReachNLRIAttribute(family=gobgp_pb2.IPv6,
                                                            next_hops=[next_hop],
                                                            nlris=[wrap_any(nlri)])

    return gobgp_pb2.Path(family=gobgp_pb2.IPv6,
                          any_nlri=wrap_any(attribute_pb2.IPAddressPrefix(prefix=prefix, prefix_len=int(prefix_len))),
                          any_pattrs=[wrap_any(origin_attr),
                                      wrap_any(mp_reach_nlri_attr)])

def create_host_route(host, next_hop):
    host = netaddr.IPAddress(host)
    next_hop = netaddr.IPAddress(next_hop)

    if host.version != next_hop.version:
        raise "Mixed IP versions"

    if host.version == 4:
        return create_ipv4_path(str(host), 32, str(next_hop))

    return create_ipv6_path(str(host), 128, str(next_hop))

def add_paths(stub, paths):
    def generate_one_request_per_route():
        for path in paths:
            yield gobgp_pb2.AddPathStreamRequest(paths=[path])

    def generate_one_request():
        yield gobgp_pb2.AddPathStreamRequest(paths=paths)

    # The stream API seems to hang sometimes
    stub.AddPathStream(generate_one_request_per_route())

    # And this is slower
    # for path in paths:
    #     stub.AddPath(gobgp_pb2.AddPathRequest(path=path))

def del_paths(stub, paths):
    for path in paths:
        stub.DeletePath(gobgp_pb2.DeletePathRequest(path=path))

def run(gobgpd_addr):
    channel = grpc.insecure_channel(gobgpd_addr + ':50051')
    stub = gobgp_pb2_grpc.GobgpApiStub(channel)

    paths = []

    print("Calculating paths")
    # 2048 addresses per next hop, 256 next hops (512k routes total).
    public_addrs = iter(netaddr.IPNetwork("172.24.0.0/13"))
    next_hops = iter(netaddr.IPNetwork("1.0.0.0/24"))
    for _ in xrange(0, 256):
        next_hop = next_hops.next()
        for _ in xrange(0, 2048):
            paths.append(create_host_route(public_addrs.next(), next_hop))

    if 1 < len(sys.argv) and "del" in sys.argv[1]:
        print("Deleting Paths")
        del_paths(stub, paths)
    else:
        print("Adding Paths")
        add_paths(stub, paths)

if __name__ == '__main__':
    run("[::]")
ecbaldwin commented 6 years ago

This is with 008c961ecd943739c5db63fcd931904804a45aa5.

ecbaldwin commented 6 years ago

I looked at this a little bit on Friday. I can reproduce it pretty easily on a VM by adding two addresses, 10.0.0.1/32 and 10.0.0.2/32 onto lo like this:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet 10.0.0.1/32 scope global lo
       valid_lft forever preferred_lft forever
    inet 10.0.0.2/32 scope global lo
       valid_lft forever preferred_lft forever

Then, I run two instances of gobgp. The first has this config.

[global.config]
    as = 4200000001
    router-id = "10.0.0.1"
    local-address-list = ["10.0.0.1"]

[[neighbors]]
    [neighbors.config]
        neighbor-address = "10.0.0.2"
        peer-as = 4200000002
    [neighbors.transport.config]
        local-address = "10.0.0.1"
    [neighbors.ebgp-multihop.config]
        enabled = true
        multihop-ttl = 16
    [neighbors.graceful-restart.config]
        enabled = true
        notification-enabled = true
        long-lived-enabled = true
        restart-time = 120
    [[neighbors.afi-safis]]
        [neighbors.afi-safis.config]
        afi-safi-name = "ipv4-unicast"
        [neighbors.afi-safis.mp-graceful-restart.config]
            enabled = true
        [neighbors.afi-safis.long-lived-graceful-restart.config]
            enabled = true
            restart-time = 43200

The other instance has essentially the same config but with the IPs and ASNs swapped.

There is contention around the peer.fsm.lock semaphore. Each time I reproduce, I see a goroutine stuck trying to obtain a write lock.

goroutine 27 [semacquire, 5 minutes]:
sync.runtime_Semacquire(0xc4202aaa80)
    /usr/local/go/src/runtime/sema.go:56 +0x39
sync.(*RWMutex).Lock(0xc4202aaa78)
    /usr/local/go/src/sync/rwmutex.go:98 +0x6e
github.com/osrg/gobgp/pkg/server.(*FSM).bgpMessageStateUpdate(0xc4202aaa20, 0xc420f50002)
    /root/go/src/github.com/osrg/gobgp/pkg/server/fsm.go:220 +0x47
github.com/osrg/gobgp/pkg/server.(*FSMHandler).sendMessageloop.func1(0xc420f20750, 0x1, 0x1)
    /root/go/src/github.com/osrg/gobgp/pkg/server/fsm.go:1617 +0xad7
github.com/osrg/gobgp/pkg/server.(*FSMHandler).sendMessageloop(0xc4201ed360, 0xc4202aaa78, 0xc4200ae160)
    /root/go/src/github.com/osrg/gobgp/pkg/server/fsm.go:1685 +0x10f
github.com/osrg/gobgp/pkg/server.(*FSMHandler).(github.com/osrg/gobgp/pkg/server.sendMessageloop)-fm(0x9a7000, 0x2)
    /root/go/src/github.com/osrg/gobgp/pkg/server/fsm.go:1724 +0x2a
github.com/osrg/gobgp/vendor/gopkg.in/tomb%2ev2.(*Tomb).run(0xc4201ed360, 0xc4200a1aa0)
    /root/go/src/github.com/osrg/gobgp/vendor/gopkg.in/tomb.v2/tomb.go:163 +0x2b
created by github.com/osrg/gobgp/vendor/gopkg.in/tomb%2ev2.(*Tomb).Go
    /root/go/src/github.com/osrg/gobgp/vendor/gopkg.in/tomb.v2/tomb.go:159 +0xb9

I haven't been able to tell yet which goroutine is holding its lock and causing this deadlock. I think I've ruled out another write lock: I instrumented all of the calls that acquire a write lock on this semaphore, and I don't see anything acquiring it without releasing it. So it must be one of the many read locks being acquired and then not released. I looked at the other goroutines that are waiting on locks, but nothing stood out as an obvious deadlock culprit. I'll look for better ways to instrument the code and figure out what is going on.
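(For reference, the instrumentation amounts to something like the wrapper below. This is only an illustrative sketch with made-up names, not the actual patch; it just logs every write-lock acquire and release so an unreleased writer would stand out in the output.)

package main

import (
    "log"
    "sync"
)

// tracedRWMutex wraps sync.RWMutex and logs every write-lock acquire and
// release, so a writer that never releases the lock shows up in the log.
type tracedRWMutex struct {
    name string
    mu   sync.RWMutex
}

func (t *tracedRWMutex) Lock() {
    log.Printf("%s: waiting for write lock", t.name)
    t.mu.Lock()
    log.Printf("%s: got write lock", t.name)
}

func (t *tracedRWMutex) Unlock() {
    t.mu.Unlock()
    log.Printf("%s: released write lock", t.name)
}

func (t *tracedRWMutex) RLock()   { t.mu.RLock() }
func (t *tracedRWMutex) RUnlock() { t.mu.RUnlock() }

func main() {
    l := &tracedRWMutex{name: "fsm.lock"}
    l.Lock()
    l.Unlock()
    l.RLock()
    l.RUnlock()
}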

ecbaldwin commented 6 years ago

I notice a goroutine like the following each time I reproduce the problem. Since the hang only happens when graceful restart is enabled, I think it might have something to do with it. I'll look through the stack trace to see if I can find something:

goroutine 29 [semacquire, 5 minutes]:
sync.runtime_Semacquire(0xc4202aaa84)
    /usr/local/go/src/runtime/sema.go:56 +0x39
sync.(*RWMutex).RLock(0xc4202aaa78)
    /usr/local/go/src/sync/rwmutex.go:50 +0x49
github.com/osrg/gobgp/pkg/server.(*Peer).llgrFamilies(0xc4202c6300, 0xc3b908, 0xc4202aaa78, 0x970d03, 0xc420b2c0f0, 0xc420b2c101, 0x1)
    /root/go/src/github.com/osrg/gobgp/pkg/server/peer.go:272 +0x4d
github.com/osrg/gobgp/pkg/server.(*Peer).isLLGREnabledFamily(0xc4202c6300, 0x10001, 0x0)
    /root/go/src/github.com/osrg/gobgp/pkg/server/peer.go:289 +0x93
github.com/osrg/gobgp/pkg/server.(*BgpServer).filterpath(0xc42022c000, 0xc4202c6300, 0xc42128d900, 0x0, 0x0)
    /root/go/src/github.com/osrg/gobgp/pkg/server/server.go:589 +0x3b3
github.com/osrg/gobgp/pkg/server.(*BgpServer).processOutgoingPaths(0xc42022c000, 0xc4202c6300, 0xc42000ebe0, 0x1, 0x1, 0xc42000ebe8, 0x1, 0x1, 0x1, 0xc42000ebe8, ...)
    /root/go/src/github.com/osrg/gobgp/pkg/server/server.go:877 +0x166
github.com/osrg/gobgp/pkg/server.(*BgpServer).propagateUpdateToNeighbors(0xc42022c000, 0x0, 0xc420b2e050, 0xc42000ebc8, 0x1, 0x1, 0xc420b2e001)
    /root/go/src/github.com/osrg/gobgp/pkg/server/server.go:1127 +0x32d
github.com/osrg/gobgp/pkg/server.(*BgpServer).propagateUpdate(0xc42022c000, 0x0, 0xc420acc000, 0x2710, 0x2710)
    /root/go/src/github.com/osrg/gobgp/pkg/server/server.go:1026 +0x258
github.com/osrg/gobgp/pkg/server.(*BgpServer).addPathList(0xc42022c000, 0x0, 0x0, 0xc420acc000, 0x2710, 0x2710, 0x2710, 0x0)
    /root/go/src/github.com/osrg/gobgp/pkg/server/server.go:1823 +0xd5
github.com/osrg/gobgp/pkg/server.(*Server).AddPathStream(0xc42000c1e0, 0xcb4300, 0xc4200a1b80, 0xcb8cc0, 0xc42000c1e0)
    /root/go/src/github.com/osrg/gobgp/pkg/server/grpc_server.go:792 +0xf2
github.com/osrg/gobgp/api._GobgpApi_AddPathStream_Handler(0xc00520, 0xc42000c1e0, 0xcb33a0, 0xc420206120, 0x0, 0x0)
    /root/go/src/github.com/osrg/gobgp/api/gobgp.pb.go:7104 +0xb2
github.com/osrg/gobgp/vendor/google.golang.org/grpc.(*Server).processStreamingRPC(0xc42022e000, 0xcb4000, 0xc4202198c0, 0xc42015e500, 0xc420166b10, 0x121a620, 0x0, 0x0, 0x0)
    /root/go/src/github.com/osrg/gobgp/vendor/google.golang.org/grpc/server.go:892 +0x8a5
github.com/osrg/gobgp/vendor/google.golang.org/grpc.(*Server).handleStream(0xc42022e000, 0xcb4000, 0xc4202198c0, 0xc42015e500, 0x0)
    /root/go/src/github.com/osrg/gobgp/vendor/google.golang.org/grpc/server.go:986 +0x12b1
github.com/osrg/gobgp/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc420272e10, 0xc42022e000, 0xcb4000, 0xc4202198c0, 0xc42015e500)
    /root/go/src/github.com/osrg/gobgp/vendor/google.golang.org/grpc/server.go:551 +0x9f
created by github.com/osrg/gobgp/vendor/google.golang.org/grpc.(*Server).serveStreams.func1
    /root/go/src/github.com/osrg/gobgp/vendor/google.golang.org/grpc/server.go:549 +0xa1
ecbaldwin commented 6 years ago

I think I found a smoking gun in the last trace I pasted above. isLLGREnabledFamily grabs a read lock on the semaphore, with a defer statement to release it on exiting the method. It then calls llgrFamilies, which attempts to grab a read lock of its own while the first one is still held.
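Paraphrasing the structure (simplified, not the real peer.go code; the real methods take the lock on peer.fsm and read the LLGR config under it):

package main

import "sync"

type peer struct {
    fsmLock sync.RWMutex
}

func (p *peer) llgrFamilies() []int {
    p.fsmLock.RLock() // inner read lock
    defer p.fsmLock.RUnlock()
    return []int{1} // real code reads the LLGR config here
}

func (p *peer) isLLGREnabledFamily(family int) bool {
    p.fsmLock.RLock() // outer read lock, still held while llgrFamilies runs
    defer p.fsmLock.RUnlock()
    for _, f := range p.llgrFamilies() { // recursive RLock on the same mutex
        if f == family {
            return true
        }
    }
    return false
}

func main() {
    p := &peer{}
    _ = p.isLLGREnabledFamily(1)
}

On its own the recursive RLock is harmless; it only becomes a deadlock once a writer queues between the two RLock calls.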

ecbaldwin commented 6 years ago

The above PR is a naive change that shows this is in fact the issue. With it, I am able to announce over 1 million routes without any deadlock. I'll work on the PR and get it into shape to merge.

ecbaldwin commented 6 years ago

To make it more explicit what is happening here, I quote the docs for the sync package:

RLock locks rw for reading.

It should not be used for recursive read locking; a blocked Lock call excludes new readers from acquiring the lock. See the documentation on the RWMutex type.

This deadlock happens when the goroutine running bgpMessageStateUpdate blocks waiting for the write lock after isLLGREnabledFamily has taken its outer read lock but before llgrFamilies takes the inner one. The pending writer then blocks the inner read lock, so the reader waits on the writer and the writer waits on the reader, forever.
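A minimal standalone program (nothing gobgp-specific, just the same interleaving with illustrative names) hangs the same way:

package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    var fsmLock sync.RWMutex
    writerQueued := make(chan struct{})

    // Plays the role of isLLGREnabledFamily: outer RLock, then, once the
    // writer is waiting, a second (recursive) RLock on the same mutex.
    go func() {
        fsmLock.RLock()
        defer fsmLock.RUnlock()
        <-writerQueued
        fsmLock.RLock() // blocks: a pending Lock() excludes new readers
        defer fsmLock.RUnlock()
        fmt.Println("never reached")
    }()

    time.Sleep(100 * time.Millisecond)

    // Plays the role of bgpMessageStateUpdate: asks for the write lock
    // while the outer read lock is still held.
    go func() {
        fsmLock.Lock() // blocks behind the outer RLock above
        defer fsmLock.Unlock()
    }()

    time.Sleep(100 * time.Millisecond)
    close(writerQueued)

    time.Sleep(time.Second)
    fmt.Println("deadlock: the reader waits on the writer, the writer waits on the reader")
}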

ecbaldwin commented 6 years ago

I think #1821 is in better shape now. I consider it a good enough fix for the issue in the near term.

I think a better long-term way to handle this would be to rearchitect the sharing of the fsm, fsm.pConf, and whatever else is synchronized with this lock. IMO, a single goroutine should be responsible for managing updates to the fsm and should push updated copies of it to the other goroutines through channels; those goroutines would pick up the updates by selecting on their channel.

My observation is that updates to the fsm do not happen that frequently in the grand scheme of things. Also, I don't see any harm in another goroutine only seeing an fsm update after it finishes its current operation; in my opinion it would be more consistent for it to pick up state changes by selecting on the updates channel. A rough sketch of the pattern is below.
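(This is only an illustration of the pattern, with made-up names, not a proposed patch for gobgp.)

package main

import "fmt"

// fsmState stands in for whatever is currently protected by fsm.lock.
type fsmState struct {
    adminState string
}

// manageFSM is the single owner of the state: it applies each update and
// then broadcasts a copy of the new state to every subscriber.
func manageFSM(updates <-chan func(*fsmState), subs []chan fsmState) {
    state := fsmState{adminState: "up"}
    for apply := range updates {
        apply(&state)
        for _, sub := range subs {
            sub <- state // each consumer works on its own copy, no lock needed
        }
    }
    for _, sub := range subs {
        close(sub)
    }
}

func main() {
    updates := make(chan func(*fsmState))
    sub := make(chan fsmState, 1)
    go manageFSM(updates, []chan fsmState{sub})

    // A writer pushes an update; readers pick up the latest copy by
    // selecting on their subscription channel instead of taking a lock.
    updates <- func(s *fsmState) { s.adminState = "down" }
    fmt.Println((<-sub).adminState) // prints "down"
    close(updates)
}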

fujita commented 6 years ago

Again, thanks a lot for fixing this. Let's discuss the long-term solution in a different issue, as I commented on the PR.