Open jake9050 opened 7 years ago
@jake9050 any idea why Arakoon was stuck in a start/fail/start loop?
@jtorreke any idea why Arakoon acted up?
It was lagging too far behind and could no longer catch up from the cluster members. Throwing out the local data and starting a new copy was the solution.
@jtorreke was the root cause of not being able to catch up that the messages were too big?
After updating pocops to the latest Fargo release, the Arakoon services get stuck in a loop where they constantly restart. The ovs home folder gets populated with files called
console:.debug.TIMESTAMP.xxxxxx
that contain these kinds of messages:
1502284724: main debug: 7679026 => store
1502284724: main debug: Store.incr_i old_i:Some ("7679025") -> new_i:7679026
1502284724: main debug: 7679027 => store
1502284724: main debug: Store.incr_i old_i:Some ("7679026") -> new_i:7679027
1502284724: main debug: 7679028 => store
1502284724: main debug: Store.incr_i old_i:Some ("7679027") -> new_i:7679028
1502284724: main debug: after_block
1502284724: main debug: _fold_blocks
1502284724: main info: Completed replay of 1535.tlx, took 0.502017 seconds, 1 to go
1502284724: main info: Replaying tlog file: 1536.tlog [7679030,...] (2/2)
1502284724: tlog_map debug: fold_read extension=.tlog => index':Some {filename="/mnt/ssd1/arakoon/flash-10-nsm_15/tlogs/1536.tlog";mapping=}
1502284724: main debug: U.fold 7679029 Some ("7679092") ~index:Some {filename="/mnt/ssd1/arakoon/flash-10-nsm_15/tlogs/1536.tlog";mapping=}
1502284724: main debug: maybe_fast_forward 7679029 with Some {filename="/mnt/ssd1/arakoon/flash-10-nsm_15/tlogs/1536.tlog";mapping=}
1502284724: main debug: 7679029 => store
1502284724: main debug: Store.incr_i old_i:Some ("7679028") -> new_i:7679029
1502284724: main debug: 7679030 => skip
1502284724: main debug: 7679029 => store
1502284724: tlog_map debug: filename:/mnt/ssd1/arakoon/flash-10-nsm_15/tlogs/1536.tlog(Failure "update 7679029, store @ 7679029 don't fit")
1502284724: main fatal: going down(Failure "update 7679029, store @ 7679029 don't fit")
1502284724: main fatal: after pick
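In the excerpt, `7679029 => store` appears twice before the fatal line, and the failure reports the update index and the store index as the same value. To check whether other debug files hit the same condition, a small script can pull those two numbers out of each failure message. This is only a rough sketch: the regular expression is inferred from the single fatal line quoted in this issue, `find_fit_failures` is a made-up helper name, and other Arakoon versions may word the message differently.

```python
import re

# Pattern inferred from the fatal line in this issue's log excerpt (an assumption,
# not a documented Arakoon log format).
FIT_FAILURE = re.compile(r'Failure "update (\d+), store @ (\d+) don\'t fit"')


def find_fit_failures(log_text):
    """Return the distinct (update_i, store_i) pairs found in "don't fit" failures."""
    seen = []
    for match in FIT_FAILURE.finditer(log_text):
        pair = (int(match.group(1)), int(match.group(2)))
        if pair not in seen:
            seen.append(pair)
    return seen


# Two lines taken from the excerpt in this issue.
sample = (
    '1502284724: main debug: 7679029 => store\n'
    '1502284724: main fatal: going down(Failure "update 7679029, '
    'store @ 7679029 don\'t fit")\n'
)
print(find_fit_failures(sample))  # [(7679029, 7679029)]
```

If every debug file reports the same pair, the node is most likely failing at the same point of the tlog replay on each restart.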
This eventually fills the disk, causing more trouble.
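To see how much space these dumps take before the disk fills up, something along these lines could be run on the affected node. It is a minimal sketch: the `console:.debug.` prefix is taken from the file name pattern mentioned in this issue, while `debug_dump_usage` and the folder passed to it are placeholders that need to be adapted to the actual ovs home folder.

```python
import os


def debug_dump_usage(folder, prefix="console:.debug."):
    """Count regular files in `folder` whose name starts with `prefix` and sum their sizes."""
    count = 0
    total_bytes = 0
    with os.scandir(folder) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.startswith(prefix):
                count += 1
                total_bytes += entry.stat().st_size
    return count, total_bytes


if __name__ == "__main__":
    # "." is a placeholder; point this at the ovs home folder on the affected node.
    n, size = debug_dump_usage(".")
    print(f"{n} debug files, {size / 1e6:.1f} MB")
```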
System info
os: Ubuntu 16.04.3 LTS
OVS components
`ii alba 1.3.14 amd64 the ALternative BAckend
ii arakoon 1.9.17 amd64 Simple consistent distributed key/value store
ii openvstorage 2.8.2-1 amd64 openvStorage
ii openvstorage-backend 1.8.1-1 amd64 openvStorage Backend plugin
ii openvstorage-backend-core 1.8.1-1 amd64 openvStorage Backend plugin core
ii openvstorage-backend-webapps 1.8.1-1 amd64 openvStorage Backend plugin Web Applications
ii openvstorage-core 2.8.2-1 amd64 openvStorage core
ii openvstorage-hc 1.8.1-1 amd64 openvStorage Backend plugin HyperConverged
ii openvstorage-health-check 3.2.0-fargo.3-1 amd64 Open vStorage HealthCheck
ii openvstorage-sdm 1.7.1-1 amd64 Open vStorage Backend ASD Manager
ii openvstorage-webapps 2.8.2-1 amd64 openvStorage Web Applications
`