sorintlab / stolon

PostgreSQL cloud native High Availability and more.
https://talk.stolon.io
Apache License 2.0
4.63k stars 444 forks source link

Missing timeout for libkv based stores #728

Open dineshba opened 4 years ago

dineshba commented 4 years ago

We don't have timeout for libkv based stores(we are using consul) at here. Can we add it? Is there any reason for not having it?

sgotti commented 4 years ago

@dineshba libkv doesn't support context and looks unmantained. Please take a look #677

sgotti commented 4 years ago

@dineshba Have I answered your questions?

dineshba commented 4 years ago

Can we create cancelable Context and cancel it after timeout ? I think this can be done outside libkv. Lemme try and raise a PR.

sgotti commented 4 years ago

@dineshba If the underlying libkv operation doesn't support a context we don't have a way to cancel the underlying operation. You could thinkg to execute the libkv inside a goroutine but you don't have a way to cancel that goroutine since it's blocked to the libkv function and will be stale causing all kind of issues.

dineshba commented 4 years ago

I agree that we cannot close the connection(it will become a stale connection). I think we can write code in such a way that stale connection will not create problem. What do you think?

On side note: If libkv is not scalable, can we slowly migrate away from it? May be we can start with consul first.

sgotti commented 4 years ago

I think we can write code in such a way that stale connection will not create problem. What do you think?

I'm not sure how this could be done. It will leak file descriptor and perhaps it could work for reads, but writes will end with concurrency issues since a "stale" write could complete after one issued later. I'll avoid this and just do what you proposed below:

On side note: If libkv is not scalable, can we slowly migrate away from it? May be we can start with consul first.

Yes, you can find another better library or reimplement everything directly using the related client like done for etcdv3.

sgotti commented 4 years ago

A store timeout command line option has been added in #765. Obviously it'll be honored only by stores that handle it, so it won't work with libkv consul. So the solution is always the same: find another better library or reimplement everything directly using the related client like done for etcdv3.