projectnessie / nessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics
https://projectnessie.org
Apache License 2.0
1.04k stars 130 forks source link

Health check / CDI scope mismatch #9852

Open venkata2395 opened 3 weeks ago

venkata2395 commented 3 weeks ago

Issue description

We noticed below issue when using Nessie snapshot version - 0.99.1. This appears only during the nessie startup log but works fine with the read/write operations later when pod is ready. Please provide insights on how to overcome this issue.

2024-10-30 14:12:22,387 INFO  [org.pro.qua.pro.sto.PersistProvider] (main) Using objects cache with 13694 MB, enabling distributed cache invalidations.
2024-10-30 14:12:22,571 INFO  [com.dat.oss.dri.int.cor.DefaultMavenCoordinates] (main) DataStax Java driver for Apache Cassandra(R) (com.datastax.oss:java-driver-core) version 4.15.0
2024-10-30 14:12:23,010 INFO  [com.dat.oss.dri.int.cor.tim.Clock] (vert.x-eventloop-thread-1) Using native clock for microsecond precision
2024-10-30 14:12:25,153 INFO  [org.pro.qua.pro.sto.PersistProvider] (main) Creating/opening version store CASSANDRA2 ...
2024-10-30 14:12:25,156 INFO  [org.pro.qua.pro.sto.PersistProvider] (main) Using CASSANDRA2 version store (keyspace: xxxx DDL timeout: PT5S DML timeout: PT3S)
2024-10-30 14:12:25,157 WARN  [org.pro.ser.con.ConfigChecks] (main) Both authentication (AuthN) and authorization (AuthZ) are disabled, all requests to Nessie will be permitted. This means: everybody with access to Nessie can read, write and change everything. Recommended action: Enable AuthN & AuthZ, see https://projectnessie.org/nessie-latest/configuration/
2024-10-30 14:12:25,167 INFO  [org.pro.ser.cat.sec.SecretsProducers] (main) No external secrets manager has been configured, secrets are retrieved only from the Quarkus configuration.
2024-10-30 14:12:25,477 INFO  [org.pro.nes.net.AddressResolver] (main) Using nameserver kube-dns.kube-system.svc.cluster.local with search list [nessie.svc.cluster.local, svc.cluster.local, cluster.local, reddog.microsoft.com]
2024-10-30 14:12:25,523 INFO  [org.pro.ser.dis.CacheInvalidationSender] (main) Sending remote cache invalidations to service name(s) [nessie-mgmt]
2024-10-30 14:12:25,619 INFO  [io.quarkus] (main) Nessie 0.99.1-SNAPSHOT on JVM (powered by Quarkus 3.15.1) started in 7.642s. Listening on: http://0.0.0.0:19120. Management interface listening on http://0.0.0.0:9000.
2024-10-30 14:12:25,620 INFO  [io.quarkus] (main) Profile prod activated.
2024-10-30 14:12:25,620 INFO  [io.quarkus] (main) Installed features: [agroal, amazon-dynamodb, amazon-secretsmanager, azure-keyvault-secret, cassandra-client, cdi, google-cloud-bigtable, google-cloud-secret-manager, hibernate-validator, jdbc-h2, jdbc-mariadb, jdbc-postgresql, logging-sentry, micrometer, mongodb-client, narayana-jta, oidc, opentelemetry, reactive-routes, rest, rest-jackson, security, security-properties-file, smallrye-context-propagation, smallrye-health, smallrye-openapi, vault, vertx]
2024-10-30 14:12:27,674 ERROR [io.sma.health] (executor-thread-3) SRHCK01000: Error processing Health Checks: jakarta.enterprise.context.ContextNotActiveException: RequestScoped context was not active when trying to obtain a bean instance for a client proxy of PRODUCER_METHOD bean [class=org.projectnessie.server.catalog.ConfigProducers, id=xxxxxxxxxxxxxxx]
- you can activate the request context for a specific method using the @ActivateRequestContext interceptor binding
    at io.quarkus.arc.impl.ClientProxies.notActive(ClientProxies.java:70)
    at io.quarkus.arc.impl.ClientProxies.getSingleContextDelegate(ClientProxies.java:30)
    at org.projectnessie.catalog.service.config.ConfigProducers_ProducerMethod_lakehouseConfig_iH0TIfcbanSdrPA9ouxuYH35T1A_ClientProxy.arc$delegate(Unknown Source)
    at org.projectnessie.catalog.service.config.ConfigProducers_ProducerMethod_lakehouseConfig_iH0TIfcbanSdrPA9ouxuYH35T1A_ClientProxy.catalog(Unknown Source)

Health check log provided below:

2024-10-30 14:12:27,683 INFO  [io.sma.health] (vert.x-eventloop-thread-7) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Database connections health check","status":"UP"},{"name":"MongoDB connection health check","status":"UP"},{"name":"org.projectnessie.server.catalog.ObjectStoresHealthCheck_ClientProxy","status":"DOWN","data":{"exceptionClass":"jakarta.enterprise.context.ContextNotActiveException","exceptionMessage":"RequestScoped context was not active when trying to obtain a bean instance for a client proxy of PRODUCER_METHOD bean [class=org.projectnessie.server.catalog.ConfigProducers, id=xxxxxx]\n\t- you can activate the request context for a specific method using the @ActivateRequestContext interceptor binding","rootCause":"RequestScoped context was not active when trying to obtain a bean instance for a client proxy of PRODUCER_METHOD bean [class=org.projectnessie.server.catalog.ConfigProducers, id=xxxx]\n\t- you can activate the request context for a specific method using the @ActivateRequestContext interceptor binding"}}]}
snazy commented 3 weeks ago

@venkata2395 I fear we need a full reproducer for this one. We cannot reproduce it.

adutra commented 1 week ago

Another occurrence, but this time when calling ObjectIO.ping():

2024-11-13 10:04:31,661 ERROR [org.pro.ser.cat.ObjectStoresHealthCheck] (executor-thread-1) Failed to ping warehouse 'nessie-playground', error ID f814a805-3e4d-4101-8e9f-b339d684d7a2: jakarta.enterprise.context.ContextNotActiveException: RequestScoped context was not active when trying to obtain a bean instance for a client proxy of PRODUCER_METHOD bean [class=org.projectnessie.server.catalog.CatalogProducers, id=IoYBgW3_kRGTU-USRbZWANKDXPA]
    - you can activate the request context for a specific method using the @ActivateRequestContext interceptor binding
    at io.quarkus.arc.impl.ClientProxies.notActive(ClientProxies.java:70)
    at io.quarkus.arc.impl.ClientProxies.getSingleContextDelegate(ClientProxies.java:30)
    at org.projectnessie.catalog.files.api.CatalogProducers_ProducerMethod_objectIO_Ed95GkFIhtTZERiabosXvHelB7Q_ClientProxy.arc$delegate(Unknown Source)
    at org.projectnessie.catalog.files.api.CatalogProducers_ProducerMethod_objectIO_Ed95GkFIhtTZERiabosXvHelB7Q_ClientProxy.ping(Unknown Source)
    at org.projectnessie.server.catalog.ObjectStoresHealthCheck.call(ObjectStoresHealthCheck.java:61)

This is strange because it means that the request scope was active when calling LakehouseConfig.catalog() but not when calling ObjectIO.ping().

adutra commented 1 week ago

I wonder if ObjectStoresHealthCheck shouldn't be annotated with @RequestScoped instead of @ApplicationScoped.

dimas-b commented 1 week ago

Currently, ObjectStoresHealthCheck is @ApplicationScoped, but its config object are proxies that implement the @RequestScoped logic. When the proxies are accessed within a request context they return correct data.

That is how it normally works. So, my impression is that it's a Quarkus bug rather than application code bug.

That said, changing the top-level obj to @RequestScope might help as a "workaround", but I wonder whether it might be preferable to upgrade Quarkus first.