openvstorage / framework

The Framework is a set of components and tools which brings the user an interface (GUI / API) to setup, extend and manage an Open vStorage platform.
Other
27 stars 23 forks source link

alba[5376]: (Unix.Unix_error "Cannot assign requested address" bind "") #2322

Closed yongshengma closed 5 years ago

yongshengma commented 5 years ago

Hello ,

I have 3 nodes cluster. All 3 got the same issue: (Unix.Unix_error "Cannot assign requested address" bind ""). Here's info from one node.

Running processes
=================

alba-maintenance_storage-KkcVdVmQcy0ut9jM  active  1480
asd-manager                                active  1416
asd-watcher                                active  1413

Non-running processes
=====================

alba-asd-yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie  activating

Logs of alba-asd-yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie

 Jul 24 16:37:19 compute1 sh[5371]: alba: main: found env var ALBA_LWT_ENGINE=libev
Jul 24 16:37:19 compute1 systemd[1]: Started ALBA ASD.
Jul 24 16:37:19 compute1 alba[5376]: alba: main: found env var ALBA_LWT_ENGINE=libev
Jul 24 16:37:19 compute1 alba[5376]: 2019-07-24 16:37:19 150648 +0800 - compute1 - 5376/0000 - alba/asd - 0 - info - Found the following config: {;    "asd_id": "yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie", ;    "log_level": "info", ;    "ips": [;        "192.168.6.120";    ], ;    "multicast": null, ;    "transport": "tcp", ;    "home": "/mnt/alba-asd/zrOyeYOcJsj4tHg5/yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie", ;    "capacity": 107318611968, ;    "rocksdb_block_cache_size": 26200833, ;    "port": 8600, ;    "node_id": "Vk3Xuot3tY5A7nLi";}
Jul 24 16:37:19 compute1 alba[5376]: 2019-07-24 16:37:19 153144 +0800 - compute1 - 5376/0000 - alba/asd - 1 - info - Interpreted the config as: { Asd_config.Config.ips = ["192.168.6.120"]; port = (Some 8600);;  transport = "tcp"; rora_ips = None; rora_port = None;;  rora_transport = None; rora_num_cores = None; rora_queue_depth = None;;  node_id = "Vk3Xuot3tY5A7nLi";;  home = "/mnt/alba-asd/zrOyeYOcJsj4tHg5/yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie";;  log_level = "info"; asd_id = (Some "yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie");;  __sync_dont_use = true; limit = 99L; capacity = (Some 107318611968L);;  buffer_size = None; multicast = None; tls = None;;  tcp_keepalive =;  { Tcp_keepalive.enable_tcp_keepalive = true; tcp_keepalive_time = 20;;    tcp_keepalive_intvl = 20; tcp_keepalive_probes = 3 };;  __warranty_void__write_blobs = true; use_fadvise = true;;  use_fallocate = true; rocksdb_block_cache_size = (Some 26200833);;  lwt_unix_pool_size = 100 }
Jul 24 16:37:19 compute1 alba[5376]: 2019-07-24 16:37:19 153181 +0800 - compute1 - 5376/0000 - alba/asd - 2 - info - Lwt_unix.pool_size : 100
Jul 24 16:37:19 compute1 alba[5376]: 2019-07-24 16:37:19 153224 +0800 - compute1 - 5376/0000 - alba/asd - 3 - info - asd_server version:heads/master-0-ge43faca-dirty
Jul 24 16:37:19 compute1 alba[5376]: 2019-07-24 16:37:19 200196 +0800 - compute1 - 5376/0000 - alba/asd - 4 - info - Closing listening socket on port 8600
Jul 24 16:37:19 compute1 alba[5376]: alba: internal error, uncaught exception:
Jul 24 16:37:19 compute1 alba[5376]: (Unix.Unix_error "Cannot assign requested address" bind "")
Jul 24 16:37:19 compute1 alba[5376]: 2019-07-24 16:37:19 200332 +0800 - compute1 - 5376/0000 - alba/asd - 5 - info - server for ADDR_INET(192.168.6.120,8600) going down: (Unix.Unix_error "Cannot assign requested address" bind "")
Jul 24 16:37:19 compute1 systemd[1]: alba-asd-yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie.service: main process exited, code=exited, status=1/FAILURE
Jul 24 16:37:19 compute1 systemd[1]: Unit alba-asd-yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie.service entered failed state.
Jul 24 16:37:19 compute1 systemd[1]: alba-asd-yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie.service failed.

Logs of alba-maintenance_storage

Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 004149 +0800 - compute1 - 1480/0000 - alba/maintenance - 9720 - info - Connecting to ADDR_INET(192.168.6.122,8600)
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 004199 +0800 - compute1 - 1480/0000 - alba/maintenance - 9721 - info - Connecting to ADDR_INET(192.168.6.120,8600)
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 004240 +0800 - compute1 - 1480/0000 - alba/maintenance - 9722 - info - Connecting to ADDR_INET(192.168.6.121,8600)
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 004268 +0800 - compute1 - 1480/0000 - alba/maintenance - 9723 - info - ignoring: Alba_client_errors.Error.Exn(6)
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 007679 +0800 - compute1 - 1480/0000 - alba/maintenance - 9724 - info - connect_with failed: 192.168.6.120 8600 None Net_fd.TCP (fd:27): Networking2.ConnectTimeout
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 007741 +0800 - compute1 - 1480/0000 - alba/maintenance - 9725 - info - connect_with failed: 192.168.6.122 8600 None Net_fd.TCP (fd:31): Networking2.ConnectTimeout
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 007816 +0800 - compute1 - 1480/0000 - alba/maintenance - 9726 - info - yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie "Networking2.ConnectTimeout" was unforeseen
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 007855 +0800 - compute1 - 1480/0000 - alba/maintenance - 9727 - info - yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie "Networking2.ConnectTimeout": should_invalidate:true should_retry:false
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 008152 +0800 - compute1 - 1480/0000 - alba/maintenance - 9728 - info - b2lt3fAUVxhuLZTKlWUCY4lCD0JXyQye "Networking2.ConnectTimeout" was unforeseen
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 008182 +0800 - compute1 - 1480/0000 - alba/maintenance - 9729 - info - b2lt3fAUVxhuLZTKlWUCY4lCD0JXyQye "Networking2.ConnectTimeout": should_invalidate:true should_retry:false
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 012381 +0800 - compute1 - 1480/0000 - alba/maintenance - 9730 - info - connect_with failed: 192.168.6.121 8600 None Net_fd.TCP (fd:38): (Unix.Unix_error "Connection refused" connect "")
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 012457 +0800 - compute1 - 1480/0000 - alba/maintenance - 9731 - info - connect_with failed: 192.168.6.121 8600 None Net_fd.TCP (fd:23): (Unix.Unix_error "Connection refused" connect "")
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 012694 +0800 - compute1 - 1480/0000 - alba/maintenance - 9732 - info - GgFVjTT09hS7OxsDdCKt0zzNfg30VzSO "(Unix.Unix_error \"Connection refused\" connect \"\")": should_invalidate:false should_retry:false
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 012757 +0800 - compute1 - 1480/0000 - alba/maintenance - 9733 - info - GgFVjTT09hS7OxsDdCKt0zzNfg30VzSO "(Unix.Unix_error \"Connection refused\" connect \"\")": should_invalidate:false should_retry:false
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 012818 +0800 - compute1 - 1480/0000 - alba/maintenance - 9734 - info - connect_with failed: 192.168.6.121 8600 None Net_fd.TCP (fd:59): (Unix.Unix_error "Connection refused" connect "")
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 012873 +0800 - compute1 - 1480/0000 - alba/maintenance - 9735 - info - connect_with failed: 192.168.6.121 8600 None Net_fd.TCP (fd:56): (Unix.Unix_error "Connection refused" connect "")
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 012937 +0800 - compute1 - 1480/0000 - alba/maintenance - 9736 - info - GgFVjTT09hS7OxsDdCKt0zzNfg30VzSO "(Unix.Unix_error \"Connection refused\" connect \"\")": should_invalidate:false should_retry:false
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 013000 +0800 - compute1 - 1480/0000 - alba/maintenance - 9737 - info - GgFVjTT09hS7OxsDdCKt0zzNfg30VzSO "(Unix.Unix_error \"Connection refused\" connect \"\")": should_invalidate:false should_retry:false
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 016237 +0800 - compute1 - 1480/0000 - alba/maintenance - 9738 - warning - could not receive enough fragments for namespace 52, object "tlog_5ed1e36f-a981-46a9-8a87-e63e9324b5bd" ("\154\188\024S\031 \191\181r\249\208\210\208\228\162.\1720V\183\023R\192[\242:\139Oyx\173\127") chunk 0; got 0 while 2 needed

The alba-asd service cannot activate up, but everything else looks working.

Best regards, Yongsheng

JeffreyDevloo commented 5 years ago

Hi Yongsheng

Is any other process taking in the port 8600? You can check using 'netstat -tulpn | grep 8600'. Looks like the ALBA process can not bind the socket because of it.

Best regards

yongshengma commented 5 years ago

Hi @JeffreyDevloo

I'm sorry. I just find the reason. The network for ASD was down.

Thanks for reply! Yongsheng