open-telemetry / opentelemetry-demo

This repository contains the OpenTelemetry Astronomy Shop, a microservice-based distributed system intended to illustrate the implementation of OpenTelemetry in a near real-world environment.
https://opentelemetry.io/docs/demo/
Apache License 2.0
1.88k stars 1.24k forks

after deploying Astronomy App, recommendation service is throwing errors #1740

Open hk-bmi opened 1 month ago

hk-bmi commented 1 month ago

Which version of the demo you are using?

Main (https://github.com/open-telemetry/opentelemetry-demo/commit/a1cfe470c6c1e412865b351e473060b66674b2fb)

Bug Report

Seeing the following error in recommendation service logs.

Symptom

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/openfeature/client.py", line 345, in evaluate_flag_details
    flag_evaluation = self._create_provider_evaluation(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openfeature/client.py", line 436, in _create_provider_evaluation
    resolution = get_details_callable(*args)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openfeature/contrib/provider/flagd/provider.py", line 94, in resolve_boolean_details
    return self.resolver.resolve_boolean_details(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openfeature/contrib/provider/flagd/resolvers/grpc.py", line 41, in resolve_boolean_details
    return self._resolve(key, FlagType.BOOLEAN, default_value, evaluation_context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/openfeature/contrib/provider/flagd/resolvers/grpc.py", line 123, in _resolve
    raise GeneralError(message) from e
openfeature.exception.GeneralError: received grpc status code StatusCode.UNAVAILABLE
2024-10-10 09:28:34,004 INFO [main] [recommendation_server.py:47] [trace_id=0ca70a91ed357d654e9f7de0478073ed span_id=6fcfbead800d197a resource.service.name=recommendationservice trace_sampled=True] - Receive ListRecommendations for product ids:['L9ECAV7KIM', 'OLJCESPC7Z', '0PUK6V6EV0', '1YMWWN1N4O', '9SIQT8TOJO']
2024-10-10 09:28:34,040 ERROR [openfeature] [client.py:364] [trace_id=8c7d0ec6dc542983c67e37218e4d34b3 span_id=87416c7407ee6add resource.service.name=recommendationservice trace_sampled=True] - Error ErrorCode.GENERAL while evaluating flag with key: 'recommendationServiceCacheFailure'
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/openfeature/contrib/provider/flagd/resolvers/grpc.py", line 89, in _resolve
    response = self.stub.ResolveBoolean(request, **call_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/opentelemetry/instrumentation/grpc/grpcext/_interceptor.py", line 69, in __call__
    return self._interceptor.intercept_unary(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/opentelemetry/instrumentation/grpc/_client.py", line 185, in intercept_unary
    return self._intercept(request, metadata, client_info, invoker)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/opentelemetry/instrumentation/grpc/_client.py", line 166, in _intercept
    raise exc
  File "/usr/local/lib/python3.12/site-packages/opentelemetry/instrumentation/grpc/_client.py", line 152, in _intercept
    result = invoker(request, metadata)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/opentelemetry/instrumentation/grpc/grpcext/_interceptor.py", line 59, in invoker
    return self._base_callable(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/grpc/_channel.py", line 1160, in __call__
    return _end_unary_response_blocking(state, call, False, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/grpc/_channel.py", line 1003, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:100.67.217.234:8013: Operation not permitted"
    debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2024-10-10T09:28:34.039835373+00:00", grpc_status:14, grpc_message:"failed to connect to all addresses; last error: UNKNOWN: ipv4:100.67.217.234:8013: Operation not permitted"}"

Reproduce

In a Kubernetes cluster, deploy the Astronomy app and observe the logs of the recommendation service. They show the errors above.

julianocosta89 commented 1 month ago

hello @hk-bmi 👋🏽

thanks for reporting this issue.

Are you able to validate if all services are up and running? The error seems to be related to recommendationService not being able to reach the feature flag service.
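
One way to check that reachability from inside the recommendation service pod is a plain TCP probe against the flag service. The sketch below uses only the Python standard library. The port 8013 is taken from the traceback above; the function name `can_connect` is made up for illustration, and how you obtain the host name (for example from an environment variable such as `FLAGD_HOST`) is an assumption about your deployment, not something confirmed in this thread.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper: it only proves that the port accepts connections,
    not that the gRPC flag evaluation itself would succeed.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures and
        # "Operation not permitted" style errors raised by the OS.
        return False
```

Calling `can_connect(host, 8013)` in a loop with a short sleep, with `host` set to whatever name the flag service has in your cluster, would show whether the connection fails only some of the time.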

hk-bmi commented 1 month ago

Hi @julianocosta89, thanks for your response. Yes, the services are up and running, but this behavior is intermittent.
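
To put a number on "intermittent", the saved recommendation service logs can be scanned for the OpenFeature error records and counted per minute. This is a rough sketch that assumes every record starts the way the ones in the report do (`YYYY-MM-DD HH:MM:SS,mmm LEVEL [logger]`); the function name `flag_errors_per_minute` is made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the start of a log record as it appears in the report, e.g.
# "2024-10-10 09:28:34,040 ERROR [openfeature] [client.py:364] ..."
RECORD_RE = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3} "
    r"(?P<level>[A-Z]+) \[(?P<logger>[^\]]+)\]"
)


def flag_errors_per_minute(lines: Iterable[str]) -> Counter:
    """Count ERROR records from the `openfeature` logger, grouped by minute."""
    counts: Counter = Counter()
    for line in lines:
        match = RECORD_RE.match(line)
        # Traceback lines and records from other loggers or levels are skipped.
        if match and match["level"] == "ERROR" and match["logger"] == "openfeature":
            counts[match["minute"]] += 1
    return counts
```

Passing an open log file to `flag_errors_per_minute` returns a `Counter` keyed by minute, which makes it easier to see whether the failures cluster around particular times.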

julianocosta89 commented 1 month ago

Could you share more details about your setup (Docker version, CPU, memory, OS)? That will help us investigate the issue further.

hk-bmi commented 1 month ago

Tested on both an OpenShift cluster on IBM Cloud and a Kubernetes cluster on AWS; in both scenarios we get the above error.

puckpuck commented 6 days ago

@hk-bmi can you try with the latest version and describe the types of instances that the demo is deployed to? Are they equivalent to T-class instances on AWS? Because the demo runs under a constant load, it isn't supported on systems that don't guarantee their compute and network resources, such as T-class instances on AWS.