Open julianocosta89 opened 4 months ago
@julianocosta89 I had a look into this. The service use withGlobalTelemetry(true)
which internally obtains an OpenTelemetry
instance from GlobalOpenTelemetry.get()
. This follows Java manual instrumentation guidelines [1]. And in the AdService, there is no manual configuration initializations and this is the root cause for this observation.
[1] - https://opentelemetry.io/docs/languages/java/instrumentation/#manual-configuration
@Kavindu-Dodan we are using the OTel Java agent, isn't that done automatically? We have another service, written in Kotlin, and for that it is working fine. https://github.com/open-telemetry/opentelemetry-demo/blob/main/src/frauddetectionservice/src/main/kotlin/frauddetectionservice/main.kt#L32-L36
I've done some tests again and it seems that it is actually creating spans, but for some reason just once. Here is the trace containing the flagd spans:
Here are the metrics from one of the flags from AdService:
As we can see, the flag is being queried every time AdService is called, but the spans from flagd are just arriving once. Maybe we have some error in the implementation of AdService.
Hey @julianocosta89, this is an optimization in the Java provider that can disabled. Basically, a value can be cached if it's "static" and the provider is able to establish a persistent connection with flagd to bust the cache when the flag configuration changes.
Here are the configuration options that are available. https://github.com/open-feature/java-sdk-contrib/tree/main/providers/flagd#configuration-options
Hey @julianocosta89, this is an optimization in the Java provider that can disabled. Basically, a value can be cached if it's "static" and the provider is able to establish a persistent connection with flagd to bust the cache when the flag configuration changes.
Here are the configuration options that are available. https://github.com/open-feature/java-sdk-contrib/tree/main/providers/flagd#configuration-options
Oh, cool TIL.
Any reason the behaviour is different for Java and Kotlin? Both are using the same dependencies and none have the cache disabled.
There are a few reasons why a value wouldn't be cashed. For example, if there's a targeting rule in the flag configuration. I'll take a look tomorrow.
Hey @julianocosta89, can you confirm that the kafkaQueueProblems
doesn't have a targeting rule defined in your environment? That would immediately make it ineligible to be cached. I didn't notice any obvious reasons this isn't being cached, but I didn't have a chance to investigate in depth.
this is what I currently have for kafkaQueueProblems
:
"kafkaQueueProblems": {
"description": "Overloads Kafka queue while simultaneously introducing a consumer side delay leading to a lag spike",
"state": "ENABLED",
"variants": {
"on": 100,
"off": 0
},
"defaultVariant": "off"
},
And this is the kotlin snippet:
fun main() {
val options = FlagdOptions.builder()
.withGlobalTelemetry(true)
.build()
val flagdProvider = FlagdProvider(options)
OpenFeatureAPI.getInstance().setProvider(flagdProvider)
val props = Properties()
props[KEY_DESERIALIZER_CLASS_CONFIG] = StringDeserializer::class.java.name
props[VALUE_DESERIALIZER_CLASS_CONFIG] = ByteArrayDeserializer::class.java.name
props[GROUP_ID_CONFIG] = groupID
val bootstrapServers = System.getenv("KAFKA_SERVICE_ADDR")
if (bootstrapServers == null) {
println("KAFKA_SERVICE_ADDR is not supplied")
exitProcess(1)
}
props[BOOTSTRAP_SERVERS_CONFIG] = bootstrapServers
val consumer = KafkaConsumer<String, ByteArray>(props).apply {
subscribe(listOf(topic))
}
var totalCount = 0L
consumer.use {
while (true) {
totalCount = consumer
.poll(ofMillis(100))
.fold(totalCount) { accumulator, record ->
val newCount = accumulator + 1
if (getFeatureFlagValue("kafkaQueueProblems") > 0) {
logger.info("FeatureFlag 'kafkaQueueProblems' is enabled, sleeping 1 second")
Thread.sleep(1000)
}
val orders = OrderResult.parseFrom(record.value())
logger.info("Consumed record with orderId: ${orders.orderId}, and updated total count to: $newCount")
newCount
}
}
}
}
/**
* Retrieves the status of a feature flag from the Feature Flag service.
*
* @param ff The name of the feature flag to retrieve.
* @return `true` if the feature flag is enabled, `false` otherwise or in case of errors.
*/
fun getFeatureFlagValue(ff: String): Int {
val client = OpenFeatureAPI.getInstance().client
// TODO: Plumb the actual session ID from the frontend via baggage?
val uuid = UUID.randomUUID()
val clientAttrs = mutableMapOf<String, Value>()
clientAttrs["session"] = Value(uuid.toString())
client.evaluationContext = ImmutableContext(clientAttrs)
val intValue = client.getIntegerValue(ff, 0)
return intValue
}
@julianocosta89 could it be that you are evaluating the same flag in quick succession, and evaluations are interleaving? The flag value can't be cached until the RPC completes, so if 2 evaluations happen quickly, they will both cause RPCs since theirs no previously resolved value in the cache (yet)..
Besides some transient connection issue, I can't see another reason. If the reason
field is coming back as STATIC
, subsequent evaluations should result in cache hits with your configuration :thinking:
EDIT:
Oh, one more reason could be that the provider hasn't fully connected its stream before which no caching happens... this is plausible since you're using setProvider
(which doesn't block awaiting a connection) instead of setProviderAndWait
which blocks until the provider is fully initialized.
Observed behavior
When updating the dependencies for the AdService in the OTel demo, I've noticed that the flagd spans are not being created in Java. We do have the flagd spans connected with RecommendationService (Python), CartService (.NET) and Frontend-proxy (Envoy).
But for AdService no. Even though, we follow the recommended approach documented here: https://flagd.dev/providers/java/#opentelemetry-tracing-rpc-only
https://github.com/open-telemetry/opentelemetry-demo/blob/main/src/adservice/src/main/java/oteldemo/AdService.java#L86-L94
Expected Behavior
Whenever AdService checks the state of the FeatureFlag, I'd expect 3 spans from flagd to be created, as it is happening with other services.
Steps to reproduce
Navigate to Jaeger UI: http://localhost:8080/jaeger/ui/
Check the received traces from AdService and Flagd.