sorintlab / stolon

PostgreSQL cloud native High Availability and more.
https://talk.stolon.io
Apache License 2.0

no keeper info available problem #900

Closed guilongyang closed 1 year ago

guilongyang commented 1 year ago

What happened: When my etcd service is slow or the network jitters, my sentinel logs the following:

2022-12-15T00:01:11.459z WARN cmd/sentinel.go:266 no keeper info available { "db": "4b0elab1", "keeper": "keeper0"}

E1215 00:11:55.846649 1 leaderelection.go:367] Failed to update lock: Operation cannot be fulfilled on configmaps stolor-oluster-omnon-postgresg-stolon ": the object has been modified; please apply your changes to the latest version and try again

After that, my database shuts down all connections.

What you expected to happen: I would like it to be less sensitive to etcd slowness and network jitter.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

I would like to read from the cache to prevent this. Is this the correct approach?

My code is:

```go
func (s *KubeStore) GetKeepersInfo(ctx context.Context) (cluster.KeepersInfo, error) {
	keepers := cluster.KeepersInfo{}

	podsClient := s.client.CoreV1().Pods(s.namespace)

	// Set ResourceVersion to "0" so the list is read from the cache.
	listOpts := metav1.ListOptions{
		ResourceVersion: "0",
		LabelSelector:   s.labelSelector(KeeperLabelValue).String(),
	}
	result, err := podsClient.List(listOpts)
	if err != nil {
		return nil, fmt.Errorf("failed to get latest version of pod: %v", err)
	}

	// Decode the keeper status stored in each pod's annotation.
	pods := result.Items
	for _, pod := range pods {
		var ki cluster.KeeperInfo
		if kij, ok := pod.Annotations[util.KubeStatusAnnnotation]; ok {
			err = json.Unmarshal([]byte(kij), &ki)
			if err != nil {
				return nil, err
			}
			keepers[ki.UID] = &ki
		}
	}
	return keepers, nil
}
```
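
For reference, the annotation-decoding part of the function above can be exercised without a cluster. This is only a minimal sketch: `keeperInfo`, `statusAnnotation`, and `decodeKeepers` are hypothetical stand-ins for `cluster.KeeperInfo`, `util.KubeStatusAnnnotation`, and the loop in `GetKeepersInfo`, and the JSON field name is assumed. It says nothing about whether a cached list is safe for the sentinel to act on.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// keeperInfo is a hypothetical, trimmed-down stand-in for cluster.KeeperInfo.
// The "uid" field name is an assumption made for this sketch.
type keeperInfo struct {
	UID string `json:"uid"`
}

// statusAnnotation is a hypothetical annotation key used only in this sketch.
const statusAnnotation = "example.io/keeper-status"

// decodeKeepers mirrors the loop in GetKeepersInfo: it reads the status
// annotation from each pod's annotation map and indexes the result by UID.
// Pods without the annotation are skipped; malformed JSON aborts the call.
func decodeKeepers(podAnnotations []map[string]string) (map[string]*keeperInfo, error) {
	keepers := map[string]*keeperInfo{}
	for _, annotations := range podAnnotations {
		raw, ok := annotations[statusAnnotation]
		if !ok {
			continue
		}
		var ki keeperInfo
		if err := json.Unmarshal([]byte(raw), &ki); err != nil {
			return nil, fmt.Errorf("failed to decode keeper info: %w", err)
		}
		keepers[ki.UID] = &ki
	}
	return keepers, nil
}

func main() {
	pods := []map[string]string{
		{statusAnnotation: `{"uid":"keeper0"}`},
		{"unrelated": "value"}, // no status annotation: skipped
	}
	keepers, err := decodeKeepers(pods)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(keepers), keepers["keeper0"].UID)
}
```

Note that a pod with no status annotation is silently skipped, so a list served from a stale cache simply yields fewer (or older) keeper entries rather than an error.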

sgotti commented 1 year ago

Stolon was made to prevent inconsistencies; see the architecture doc. Your store must be available and reliable. You can already increase the timeout in the config, and you should use a dedicated etcd instead of the k8s API.