osrg / gobgp

BGP implemented in the Go Programming Language
https://osrg.github.io/gobgp/
Apache License 2.0
3.59k stars 684 forks source link

GoBGP crash during RFC7747 convergance tests #2173

Open mszbattle opened 4 years ago

mszbattle commented 4 years ago

Hello goBGP Community,

I would like to share with you information about the daemon crash during the convergence tests performed using IXIA IxNetwork Web Edition ver. 9.00 in accordance with RFC7747. The tests consist of assessing the convergence time for sending various amounts of prefixes to the goBGP daemon, e.g. 4,000,8000,16000,32000,64000,500000,600000,700000,800000. When the test begins to test the 500,000 pool, the goBGP daemon stops working (see logs below). So far, I've tested the last three versions of the 2.9.0, 2.8.0 and 2.7.0 daemon (binary package amd64) with the same effect.

Environment configuration: Ubuntu 18.04 , daemon goBGP intergrated with zebra (guagga)

goBGP config:

[global.config] as = 300 router-id = "1.1.1.1" [[neighbors]] [neighbors.config] neighbor-address = "100.198.198.1" peer-as = 100

[[neighbors]] [neighbors.config] neighbor-address = "101.198.198.1" peer-as = 101 [zebra] [zebra.config] enabled = true url = "unix:/var/run/quagga/zserv.api" redistribute-route-type-list = ["connect"] version = 2 Logs before test mszancer@goBGP:~$ sudo systemctl status gobgpd

Oct 02 11:02:37 goBGP gobgpd[2005]: {"level":"info","msg":"gobgpd started","time":"2019-10-02T11:02:37+02:00"} Oct 02 11:02:37 goBGP gobgpd[2005]: {"Topic":"Config","level":"info","msg":"Finished reading the config file","time":"2019-10-02T11:02:37+02:00"} Oct 02 11:02:37 goBGP gobgpd[2005]: {"Error":"read unix @-\u003e/run/quagga/zserv.api: read: connection reset by peer","Topic":"Zebra","level":"error","msg":"failed to read header","time":"2019-10-02T11:02:37+02:00"} Oct 02 11:02:37 goBGP gobgpd[2005]: {"Topic":"Zebra","level":"warning","msg":"cannot connect to Zebra with message version 2.","time":"2019-10-02T11:02:37+02:00"} Oct 02 11:02:37 goBGP gobgpd[2005]: {"Topic":"Zebra","level":"warning","msg":"going to retry another version 3.","time":"2019-10-02T11:02:37+02:00"} Oct 02 11:02:37 goBGP gobgpd[2005]: {"Topic":"Zebra","level":"info","msg":"success to connect to Zebra with message version 3.","time":"2019-10-02T11:02:37+02:00"} Oct 02 11:02:37 goBGP gobgpd[2005]: {"level":"info","msg":"Peer 100.198.198.1 is added","time":"2019-10-02T11:02:37+02:00"} Oct 02 11:02:37 goBGP gobgpd[2005]: {"Topic":"Peer","level":"info","msg":"Add a peer configuration for:100.198.198.1","time":"2019-10-02T11:02:37+02:00"} Oct 02 11:02:37 goBGP gobgpd[2005]: {"level":"info","msg":"Peer 101.198.198.1 is added","time":"2019-10-02T11:02:37+02:00"} Oct 02 11:02:37 goBGP gobgpd[2005]: {"Topic":"Peer","level":"info","msg":"Add a peer configuration for:101.198.198.1","time":"2019-10-02T11:02:37+02:00"}

Logs after test

Logs: test@goBGP:~$ sudo systemctl status gobgpd

Oct 01 08:57:53 goBGP gobgpd[30697]: /home/runner/go/pkg/mod/github.com/eapache/channels@v1.1.0/infinite_channel.go:49 +0x137 Oct 01 08:57:53 goBGP gobgpd[30697]: created by github.com/eapache/channels.NewInfiniteChannel Oct 01 08:57:53 goBGP gobgpd[30697]: /home/runner/go/pkg/mod/github.com/eapache/channels@v1.1.0/infinite_channel.go:19 +0x159 Oct 01 08:57:53 goBGP gobgpd[30697]: goroutine 33 [chan send]: Oct 01 08:57:53 goBGP gobgpd[30697]: github.com/osrg/gobgp/pkg/server.(watcher).loop(0xc0000b1300) Oct 01 08:57:53 goBGP gobgpd[30697]: /home/runner/work/gobgp/gobgp/pkg/server/server.go:3972 +0x64 Oct 01 08:57:53 goBGP gobgpd[30697]: created by github.com/osrg/gobgp/pkg/server.(BgpServer).watch.func1 Oct 01 08:57:53 goBGP gobgpd[30697]: /home/runner/work/gobgp/gobgp/pkg/server/server.go:4148 +0xa9f Oct 01 08:57:53 goBGP systemd[1]: gobgpd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT Oct 01 08:57:53 goBGP systemd[1]: gobgpd.service: Failed with result 'exit-code'.

test@goBGP:~$ sudo systemctl status gobgpd

Oct 01 09:18:18 goBGP gobgpd[525]: /home/runner/go/pkg/mod/github.com/eapache/channels@v1.1.0/infinite_channel.go:49 +0x137 Oct 01 09:18:18 goBGP gobgpd[525]: created by github.com/eapache/channels.NewInfiniteChannel Oct 01 09:18:18 goBGP gobgpd[525]: /home/runner/go/pkg/mod/github.com/eapache/channels@v1.1.0/infinite_channel.go:19 +0x159 Oct 01 09:18:18 goBGP gobgpd[525]: goroutine 42 [chan send]: Oct 01 09:18:18 goBGP gobgpd[525]: github.com/osrg/gobgp/pkg/server.(watcher).loop(0xc000079cc0) Oct 01 09:18:18 goBGP gobgpd[525]: /home/runner/work/gobgp/gobgp/pkg/server/server.go:3972 +0x64 Oct 01 09:18:18 goBGP gobgpd[525]: created by github.com/osrg/gobgp/pkg/server.(BgpServer).watch.func1 Oct 01 09:18:18 goBGP gobgpd[525]: /home/runner/work/gobgp/gobgp/pkg/server/server.go:4148 +0xa9f Oct 01 09:18:19 goBGP systemd[1]: gobgpd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT Oct 01 09:18:19 goBGP systemd[1]: gobgpd.service: Failed with result 'exit-code'.

root@goBGP:/# systemctl status gobgpd

Sep 30 15:55:50 goBGP gobgpd[10541]: github.com/osrg/gobgp/pkg/server.readAll(0xeb3b40, 0xc000010008, 0x13, 0x34d464b5a3f, 0x167c000, 0xc000099c00, 0xcc, 0xcc) Sep 30 15:55:50 goBGP gobgpd[10541]: /home/runner/work/gobgp/gobgp/pkg/server/fsm.go:798 +0xa5 Sep 30 15:55:50 goBGP gobgpd[10541]: github.com/osrg/gobgp/pkg/server.(fsmHandler).recvMessageWithError(0xc0001521c0, 0xc0000d1f90, 0x0, 0x0) Sep 30 15:55:50 goBGP gobgpd[10541]: /home/runner/work/gobgp/gobgp/pkg/server/fsm.go:935 +0x7b Sep 30 15:55:50 goBGP gobgpd[10541]: github.com/osrg/gobgp/pkg/server.(fsmHandler).recvMessageloop(0xc0001521c0, 0xeab3c0, 0xc0023ce1c0, 0xc0005c6240, 0x0, 0x0) Sep 30 15:55:50 goBGP gobgpd[10541]: /home/runner/work/gobgp/gobgp/pkg/server/fsm.go:1725 +0x61 Sep 30 15:55:50 goBGP gobgpd[10541]: created by github.com/osrg/gobgp/pkg/server.(*fsmHandler).established Sep 30 15:55:50 goBGP gobgpd[10541]: /home/runner/work/gobgp/gobgp/pkg/server/fsm.go:1747 +0x1c3 Sep 30 15:55:51 goBGP systemd[1]: gobgpd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT Sep 30 15:55:51 goBGP systemd[1]: gobgpd.service: Failed with result 'exit-code'.

Perhaps test logs will be useful to eliminate a potential bug

best regards

fujita commented 4 years ago

Thanks for the report. Unfortunately, any hint in the log about the crash. Looks like some of the log is missing. Can you just run gobgpd (not via systemd) and send the log?

mszbattle commented 4 years ago

goBGP_9routesets_5trials_5iter_ebgp_ribin_convergence.pdf

mszbattle commented 4 years ago

goBGP_crash_500k_routes.txt

mszbattle commented 4 years ago

Here's full log and ixia raport. Hope you will find the cause.

fujita commented 4 years ago

Looks like out of memory.

mszbattle commented 4 years ago

Thx for information, VM has allocated 4 GB RAM now , I will increase it to 8GB and retest.

mszbattle commented 4 years ago

Unfortunatly the effect is still the same After increasing memory to 8 GB and enabled memory monitor on this VM I have spotted that goBGP took all available memory and crashed again, logs below rib_in_gobgp_service_failure_8GB_RAM.txt I will do another try with 16GB of memory

mszbattle commented 4 years ago

The same situation again, deamon took all available memory and crashed, rib_in_gobgp_service_failure_16GB_RAM.txt

fujita commented 4 years ago
Oct  3 09:28:25 goBGP ntpd[470]: routing socket reports: No buffer space available

Oct  3 09:31:38 goBGP ntpd[470]: message repeated 35 times: [ routing socket reports: No buffer space available]
Oct  3 09:32:28 goBGP gobgpd[458]: fatal error: runtime: out of memory
Oct  3 09:32:28 goBGP gobgpd[458]: runtime stack:

gobpgd and ntpd died due to out of memory. You need more memory until you don't hit the out-of-memory situation.

mszbattle commented 4 years ago

What is recommended memory level for goBGP for one neighbor bgp peer topology ? (full internet bgp rib). I tested other deamons like frr, bird and they behaved like more stable during convergance tests.

fujita commented 4 years ago

frr and bird in implemented in C. There is room to improve memory usage in GoBGP though.

On my env, gobgpd having the full route consumes about 700MB. During receiving routes, gobgpd uses more memory temporarily.

root@64cb62a663fb:~# gobgp n
Peer        AS     Up/Down State       |#Received  Accepted
192.168.0.3  3 2d 08:03:00 Establ      |   771684    771684
root@64cb62a663fb:~# go tool pprof http://localhost:6060/debug/pprof/heap
Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
Saved profile in /root/pprof/pprof.gobgpd.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
File: gobgpd
Build ID: 5a18a68990a041e72618630be9f31c7dad6483f7
Type: inuse_space
Time: Oct 3, 2019 at 10:14am (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top   
Showing nodes accounting for 628.53MB, 90.11% of 697.53MB total
mszbattle commented 4 years ago

Is your deamon intergrated with zebra ?

fujita commented 4 years ago

No