quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0

Quarkus Scheduler stops without trace #41240

Open nicklasweasel opened 2 weeks ago

nicklasweasel commented 2 weeks ago

Describe the bug

This is a bit vague, but I have a Quarkus Scheduler task running as a cron job every 10 minutes. It can run for days or weeks and then suddenly stop, leaving no trace in the logs. How can I debug this to find out whether it is some sort of resource starvation?

Expected behavior

The scheduler keeps running (or fails with exceptions)

Actual behavior

The scheduler stops without trace

How to Reproduce?

No response

Output of uname -a or ver

No response

Output of java -version

OpenJDK 64-Bit Server VM (build 21.0.1+12-29, mixed mode, sharing)

Quarkus version or git rev

3.6.3

Build tool (i.e. output of mvnw --version or gradlew --version)

No response

Additional information

No response

quarkus-bot[bot] commented 2 weeks ago

/cc @brunobat (opentelemetry,tracing), @manovotn (scheduler), @mkouba (scheduler), @radcortez (opentelemetry,tracing)

mkouba commented 2 weeks ago

Do you use the quarkus-scheduler or the quarkus-quartz extension?

In any case, I would start with a thread dump, taken e.g. with VisualVM. quarkus-scheduler uses a single thread to check all the triggers and to dispatch the scheduled methods that are due: blocking scheduled methods run on the default blocking executor, non-blocking scheduled methods run on the event loop, and blocking scheduled methods annotated with @RunOnVirtualThread run on a new virtual thread.
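To make the resource-starvation question concrete, here is a toy model built only on JDK executors. It is not Quarkus code: the single-thread worker pool, the 50 ms tick and the latch that is never released are assumptions chosen to force the failure. One thread ticks like a trigger checker and hands each run off to a worker pool; once a run blocks forever (think of a remote call without a timeout), the checker keeps ticking but no further run ever starts. In a thread dump, this pattern shows up as the worker thread parked inside the job code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class StarvationModel {

    /** Runs the model for durationMillis and returns {checkerTicks, jobRunsStarted}. */
    public static int[] run(long durationMillis) throws InterruptedException {
        AtomicInteger ticks = new AtomicInteger();
        AtomicInteger runsStarted = new AtomicInteger();
        // Never counted down, so any job that awaits it blocks until interrupted.
        CountDownLatch neverReleased = new CountDownLatch(1);

        // Stand-in for a (deliberately tiny) blocking worker pool.
        ExecutorService workers = Executors.newSingleThreadExecutor();
        // Stand-in for the single thread that checks triggers.
        ScheduledExecutorService checker = Executors.newSingleThreadScheduledExecutor();

        checker.scheduleAtFixedRate(() -> {
            ticks.incrementAndGet();
            // Hand the job off to the worker pool; the checker itself never blocks.
            workers.submit(() -> {
                runsStarted.incrementAndGet();
                try {
                    neverReleased.await(); // e.g. a remote call with no timeout
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }, 0, 50, TimeUnit.MILLISECONDS);

        Thread.sleep(durationMillis);
        // Read the counters before shutting down so shutdown cannot affect them.
        int[] result = {ticks.get(), runsStarted.get()};
        checker.shutdownNow();
        workers.shutdownNow(); // interrupts the stuck job so the JVM can exit
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = run(500);
        // The tick count keeps growing while the started-run count stays at 1.
        System.out.println("checker ticks=" + r[0] + ", job runs started=" + r[1]);
    }
}
```

Note that every tick after the first only enqueues another run behind the stuck one, so nothing is logged and nothing fails: the job simply appears to stop.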

mkouba commented 2 weeks ago

If you enable TRACE logging for the scheduler:

quarkus.log.category."io.quarkus.scheduler".level=TRACE
quarkus.log.category."io.quarkus.scheduler".min-level=TRACE

You should see a "Check triggers" message on every check, plus a separate message for each trigger that fires:

2024-06-17 09:21:05,000 TRACE [io.qua.sch.run.SimpleScheduler] (pool-8-thread-1) Check triggers at 2024-06-17T09:21:05.000058709+02:00[Europe/Prague]
2024-06-17 09:20:36,002 TRACE [io.qua.sch.run.SimpleScheduler] (pool-8-thread-1) CronTrigger [id=1_org.acme.scheduler.CounterBean#cronJobWithExpressionInConfig, cron=*/5 * * * * ?, gracePeriod=PT1S, timeZone=null] fired

However, this is not very practical for a production environment.
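A lighter-weight option for production could be a heartbeat: the scheduled method records when it last ran, and an independent thread reports when that timestamp goes stale. The sketch below is plain Java; `JobWatchdog`, its methods and the thresholds are hypothetical names and values, not part of the Quarkus scheduler API.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical heartbeat check for a periodic job. */
public class JobWatchdog {
    private final Duration maxSilence;
    private final AtomicReference<Instant> lastRun;

    public JobWatchdog(Duration maxSilence, Instant startedAt) {
        this.maxSilence = maxSilence;
        this.lastRun = new AtomicReference<>(startedAt);
    }

    /** Call this at the start of every execution of the scheduled job. */
    public void recordRun(Instant now) {
        lastRun.set(now);
    }

    /** True if no run has been recorded for longer than maxSilence. */
    public boolean isStale(Instant now) {
        return Duration.between(lastRun.get(), now).compareTo(maxSilence) > 0;
    }

    /** Checks staleness periodically on a daemon thread and calls onStale on every stale check. */
    public ScheduledExecutorService start(Duration checkEvery, Supplier<Instant> clock, Runnable onStale) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "job-watchdog");
            t.setDaemon(true); // do not keep the JVM alive just for the watchdog
            return t;
        });
        long period = checkEvery.toMillis();
        ses.scheduleAtFixedRate(() -> {
            try {
                if (isStale(clock.get())) {
                    onStale.run();
                }
            } catch (RuntimeException e) {
                // An exception escaping here would suppress all future checks.
                e.printStackTrace();
            }
        }, period, period, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

For a job that fires every 10 minutes, `new JobWatchdog(Duration.ofMinutes(25), Instant.now())` started with `start(Duration.ofMinutes(1), Instant::now, ...)` would report once two consecutive runs have been missed; the scheduled method would call `recordRun(Instant.now())` first, and `onStale` could log a warning together with a thread dump. The try/catch matters because `ScheduledExecutorService.scheduleAtFixedRate` suppresses all subsequent executions once a task throws, which would make the watchdog itself stop silently.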

nicklasweasel commented 1 week ago

I use quarkus-scheduler.

And it's in production, running in an AWS Fargate task, so it would be a bit impractical.

mkouba commented 1 week ago

And it's in production, running in an AWS Fargate task, so it would be a bit impractical.

I see. In that case, a thread dump might be helpful.
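Since VisualVM cannot easily be attached to a Fargate task, one option is to have the application print its own thread dump to stdout, which ends up in CloudWatch Logs if the task uses the `awslogs` log driver (an assumption about this deployment). The sketch below uses only the JDK's `ThreadMXBean`; the `ThreadDumper` class name is hypothetical, and how the dump is triggered (an admin endpoint, a watchdog, a timer) is left open.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public final class ThreadDumper {
    private ThreadDumper() {}

    /** Returns a text dump of all live platform threads with their full stack traces. */
    public static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // Include monitor and synchronizer info only where the JVM supports it.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append("\" state=").append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" waiting on ").append(info.getLockName());
            }
            sb.append('\n');
            // Iterate frames explicitly: ThreadInfo.toString() stops after 8 frames.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

`ThreadMXBean` only reports platform threads, so the thread that checks the triggers and the blocking executor threads appear in this dump, but virtual threads used for @RunOnVirtualThread methods do not; on JDK 21 those can be captured with `jcmd <pid> Thread.dump_to_file`.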