sieve-project / sieve

Automatic Reliability Testing for Kubernetes Controllers and Operators
BSD 2-Clause "Simplified" License

Events related to services are not detected from API server side #7

Closed laphets closed 3 years ago

laphets commented 3 years ago

As mentioned in the previous meeting, batch analysis on the mongodb operator turned up cases where the crucial event was matched but no side effect event was detected on the API server side, even though we observed the side effect being issued by the operator. All of these failed patterns have a crucial event on a service, e.g.

se-name: mongodb-cluster-cfg
se-namespace: default
se-rtype: service
se-etype: ADDED

After dumping all the event keys, I observed that keys related to services look like /services/specs/default/mongodb-cluster-cfg or /services/endpoints/default/mongodb-cluster-cfg, so the current parsing logic reads the event resource type as specs / endpoints instead of service.
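To make the mismatch concrete, here is a minimal sketch of how it can arise. It assumes the parser treats the last three path segments of a key as resource type, namespace and name; `naive_parse_key` is a hypothetical stand-in, not the actual sieve code, and the pod key is an invented example.

```python
from typing import Tuple


def naive_parse_key(key: str) -> Tuple[str, str, str]:
    """Hypothetical parser (not sieve's real logic): read the last three
    path segments as (resource type, namespace, name)."""
    segments = key.strip("/").split("/")
    rtype, namespace, name = segments[-3:]
    return rtype, namespace, name


# A key with no special prefix parses as expected.
print(naive_parse_key("/pods/default/mongodb-cluster-cfg-0"))
# ('pods', 'default', 'mongodb-cluster-cfg-0')

# A service key carries an extra "specs" segment, which is picked up
# as the resource type.
print(naive_parse_key("/services/specs/default/mongodb-cluster-cfg"))
# ('specs', 'default', 'mongodb-cluster-cfg')
```

With a parser of this shape, an event recorded for a service never matches a pattern whose resource type is service.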

After looking into the k8s source code, I figured out the reason. The basic idea is that k8s adds a special prefix to the etcd key of certain resources: for endpoints the key starts with services/endpoints, and for services it starts with services/specs. Our matching also needs to account for these special prefixes.

The current fix is to manually map services/endpoints to endpoints, and services/specs to services. After the fix, service-related side effect events are detected successfully, so sieve can detect bugs related to service ADD / DELETE.
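The manual mapping could look roughly like the sketch below. The table `SPECIAL_PREFIXES` and the function `parse_key` are hypothetical names for illustration, and the key layout (type, namespace, name as the last three segments) is an assumption rather than the actual sieve implementation.

```python
from typing import Tuple

# Assumed mapping from special etcd key prefixes to resource types,
# following the two cases described in this issue.
SPECIAL_PREFIXES = {
    "services/endpoints": "endpoints",
    "services/specs": "services",
}


def parse_key(key: str) -> Tuple[str, str, str]:
    """Hypothetical parser: rewrite a known special prefix first, then
    read the last three segments as (resource type, namespace, name)."""
    path = key.strip("/")
    for prefix, rtype in SPECIAL_PREFIXES.items():
        if path.startswith(prefix + "/"):
            # Replace the two-segment prefix with the plain resource type.
            path = rtype + path[len(prefix):]
            break
    segments = path.split("/")
    rtype, namespace, name = segments[-3:]
    return rtype, namespace, name


print(parse_key("/services/specs/default/mongodb-cluster-cfg"))
# ('services', 'default', 'mongodb-cluster-cfg')
print(parse_key("/services/endpoints/default/mongodb-cluster-cfg"))
# ('endpoints', 'default', 'mongodb-cluster-cfg')
```

Keeping the prefixes in one table means other resources with special etcd prefixes can be added later without touching the parsing code.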

marshtompsxd commented 3 years ago

Nice catch!

The basic idea is that k8s adds a special prefix to the etcd key of certain resources: for endpoints the key starts with services/endpoints, and for services it starts with services/specs. Our matching also needs to account for these special prefixes.

Yes, the different naming (and even different data structures) used for the same resource object by different components is tricky. We might need some rules to associate such events.

tianyin commented 3 years ago

Closing. If my understanding is correct, this did not lead to new bugs, but it helps match the events in the learning stage so that we don't miss them later.