Open pablochacin opened 1 year ago
This issue seems to be confirmed by the following test: dummy gRPC faults (0 error rate and 0 delay) were injected into the catalog service while HTTP requests were sent to the frontend service and gRPC requests were sent to the catalog service. The results show that gRPC requests have a significant delay compared to tests that do not inject any fault.
Test with no fault injection:
grpc_req_duration..............: avg=904.77µs min=570.41µs med=849.4µs max=2.34ms p(90)=1.07ms p(95)=1.14ms
http_req_duration..............: avg=52.35ms min=10ms med=34.15ms max=380.53ms p(90)=134.02ms p(95)=186.8ms
Test with dummy fault injection:
grpc_req_duration..............: avg=17.05ms min=639.76µs med=1.5ms max=101.2ms p(90)=64.73ms p(95)=100.01ms
http_req_duration..............: avg=290.56ms min=996.21µs med=319.16ms max=967.33ms p(90)=567.5ms p(95)=666.56ms
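To make the difference between the two runs easier to read, the gRPC figures can be turned into slowdown factors. The following is a minimal sketch, not part of the original test: the `slowdown` helper is a hypothetical name, and it assumes the values are copied by hand from the k6 summary lines above, in the notation k6 prints them.

```go
package main

import (
	"fmt"
	"time"
)

// slowdown returns how many times larger the "with" duration is than the
// "without" duration. Both arguments use the notation k6 prints in its
// summary (for example "904.77µs" or "17.05ms").
func slowdown(without, with string) (float64, error) {
	base, err := time.ParseDuration(without)
	if err != nil {
		return 0, fmt.Errorf("parsing %q: %w", without, err)
	}
	faulty, err := time.ParseDuration(with)
	if err != nil {
		return 0, fmt.Errorf("parsing %q: %w", with, err)
	}
	if base <= 0 {
		return 0, fmt.Errorf("baseline duration must be positive, got %v", base)
	}
	return float64(faulty) / float64(base), nil
}

func main() {
	// Values copied from the two grpc_req_duration lines above.
	stats := []struct{ name, without, with string }{
		{"avg", "904.77µs", "17.05ms"},
		{"med", "849.4µs", "1.5ms"},
		{"p(90)", "1.07ms", "64.73ms"},
		{"p(95)", "1.14ms", "100.01ms"},
	}
	for _, s := range stats {
		ratio, err := slowdown(s.without, s.with)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-6s %9s -> %9s  (%.1fx)\n", s.name, s.without, s.with, ratio)
	}
}
```

With these inputs the average gRPC latency comes out roughly 19 times higher and p(95) roughly 88 times higher with the dummy fault, while the median is less than doubled, which suggests the overhead is concentrated in the tail.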
This can be reproduced following these steps:
Deploy the Online Boutique demo application:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/microservices-demo/main/release/kubernetes-manifests.yaml
Change the type of the catalog service to LoadBalancer to expose it in the cluster:
kubectl patch svc productcatalogservice -p '{"spec": {"type": "LoadBalancer"}}'
Get the external IPs of the frontend and catalog services:
FRONTEND_SVC=$(kubectl get svc frontend-external --output jsonpath='{.status.loadBalancer.ingress[0].ip}')
CATALOG_SVC=$(kubectl get svc productcatalogservice --output jsonpath='{.status.loadBalancer.ingress[0].ip}')
Run the test, using the INJECT_FAULTS environment variable for toggling the fault injection on and off:
~/go/bin/k6 --env FRONTEND_SVC=$FRONTEND_SVC --env CATALOG_SVC=$CATALOG_SVC:3550 --env INJECT_FAULTS=1 run test-catalog.js
Note: the test requires the protobuf definition for the catalog service.
Using the agent instrumentation introduced in #166, it seems that the bulk of the latency occurs in the goroutines that handle the connection to/from the upstream gRPC service, and that it is introduced by garbage collection.
Increasing the memory of the target pods seems to mitigate the GC issue (see image below), but there is still a significant variation in the latency.
Without fault injection:
grpc_req_duration..............: avg=871.47µs min=457.38µs med=660.8µs max=25.82ms p(90)=846.88µs p(95)=1ms
With dummy fault injection:
grpc_req_duration..............: avg=5.51ms min=474.6µs med=1.16ms max=101.57ms p(90)=15.65ms p(95)=16.46ms
When testing fault injection in the Online Boutique application, the requests seem to have a significant delay (around 500ms) even when the faults did not specify any delay (only an error code). The SUT of the test is the frontend service's HTTP API and the fault injection target was the catalog service, using gRPC fault injection. Therefore, it is not clear whether the delay actually happened in the gRPC requests, but repeating the tests several times produced the same behavior.
It would be necessary to check the behavior with a test that accesses the catalog service directly.