src-d / borges

borges collects and stores Git repositories.
https://docs.sourced.tech/borges/
GNU General Public License v3.0

Borges producer hangs if it cannot acquire a lock #344

Closed rporres closed 6 years ago

rporres commented 6 years ago

In a k8s deployment, I specified the locking connection string incorrectly. borges logged nothing about it; it just hung there without giving any further message.

rporres commented 6 years ago

I specified a nonexistent DNS entry. Instead of

etcd:http://etcd.default.svc.cluster.local:2379

I used

etcd:http://etcd.borges.svc.cluster.local:2379

Note the different namespace in the etcd FQDN.

kuba-- commented 6 years ago

@rporres it happened because you didn't specify the ?dial-timeout=... parameter. I'll fix it by setting a default value. @jfontan, does 3s sound good to you as a default dial-timeout for etcd?

jfontan commented 6 years ago

I suppose that 3 seconds should be enough. What's the default timeout?

kuba-- commented 6 years ago

That is the problem: there is no default, so we have to pick something. But healthBalancer uses minHealthRetryDuration = 3 * time.Second, so I suppose it's OK to use the same value.

jfontan commented 6 years ago

I would use that, then. I believe it can be specified in the etcd URL in the Helm chart:

etcd:http://etcd.borges.svc.cluster.local:2379?dial-timeout=3s