Closed juanpablo-santos closed 1 year ago
(just to clarify, the database shows that the task is still running, so it doesn't seem to be related to the UI - as soon as we manually fix the appropriate rows, everything is fine again)
Hello @juanpablo-santos , Could you provide a sample app that exhibits this behavior? I could not reproduce this issue as you described with a sample app that contained a single job with 2 steps.
Hi @cppwfs ,
will work on a sample. Did you run the Spring Batch application inside a Docker container? I suspect the root cause is https://github.com/spring-projects/spring-batch/issues/4023#issuecomment-1525701487 (our Dockerfile calls the entrypoint using exec syntax, so SIGTERM signals should be propagated, although they don't seem to reach the shutdown hook).
thanks in advance
@juanpablo-santos I did create an image and deploy it to my kubernetes instance. I think Mahmoud brought up a good point. What is your entrypoint that you are using for your applications?
apologies, badly written - what I was trying to ask was whether the app was run on a platform other than the one hosting the SCDF server. I've stumbled upon some issues with this before, and wanted to rule that out.
As for the entrypoint, it is something like
ENTRYPOINT [ "./init.sh" ]
with init.sh being a script ending in something like
java -cp ${CLASSPATH} ${JAVA_OPTIONS} ${LOGBACK_PARAMS} ${SOME_OTHER_PARAMS} ${START_CLS} $@
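A side note on the $@ at the end of that command, as a minimal sketch rather than anything taken from the actual init.sh: an unquoted $@ is re-split on whitespace, while "$@" forwards each argument unchanged. The function names and the argument values below are made up for illustration, and count_args merely stands in for the java command.

```shell
#!/bin/sh
# count_args stands in for the java command: it reports how many arguments it got.
count_args() { echo "$#"; }

# Hypothetical wrappers mirroring the last line of an init.sh-style script.
forward_unquoted() { count_args $@; }    # arguments are word-split again
forward_quoted() { count_args "$@"; }    # arguments are forwarded as received

# Two arguments, the second one containing a space (illustrative values only).
unquoted_count=$(forward_unquoted --spring.cloud.task.executionid=1 "--job.name=nightly import")
quoted_count=$(forward_quoted --spring.cloud.task.executionid=1 "--job.name=nightly import")

echo "unquoted \$@ forwards $unquoted_count arguments"
echo "quoted \"\$@\" forwards $quoted_count arguments"
```

If none of the parameters passed to the task ever contain whitespace, both forms behave the same; quoting only matters once they do.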
Are there any exceptions in your logs when you run the app locally? Also look forward to the sample app. Thanks!
Hi,
No, locally all is running fine, the hook gets called, etc. I'll begin with the sample app most probably next Monday/Tuesday.
Thanks for your continued support and looking into this :-)
Hi @cppwfs ,
happy to say that we've pinpointed the issue, and it doesn't have anything to do with SCDF, but with how the stop signal works its way from Kubernetes down to the Java app. For reference:
- the Dockerfile ENTRYPOINT should use the exec form, that is, ENTRYPOINT ["/app/your-app", "arg1", "etc"] instead of ENTRYPOINT "/app/your-app arg1 etc"
- if you use an sh file to launch your Java app, you should do it using exec, that is, exec java -cp ... instead of java -cp ...
- exec makes the Java app become pid 1, so in order to avoid that you should use something like tini or pid1 to launch your application, so the entrypoint becomes something like ENTRYPOINT ["/sbin/tini", "-v", "--", "/app/entrypoint.sh", "arg1", "etc.", "$@"] (the $@ is important, as it passes the SCDF parameters to your application)

With all that in place, the SIGTERM signal ends up arriving at the application, our shutdownHook gets executed, etc. However, if using tini, this signaling won't stop the pod from dying after the usual 30 seconds, possibly leaving the application in RUNNING state if your graceful shutdown takes longer than that to finish; you'll have to use either pid1 instead of tini, which allows a timeout, or the new terminationGracePeriodSeconds parameter introduced in SCDF 2.10.3. We'll be going this way, so we're waiting on the 2.10.3 version of the helm charts to be released by the bitnami team.
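To illustrate the exec point, here is a minimal shell sketch. It is not taken from the setup described in this issue: nested sh -c calls stand in for the wrapper script and for the java command, and the helper same_pid is a made-up name. It only compares PIDs to show that a command started with exec replaces the wrapper process, which is why a signal sent to the wrapper's PID then reaches the application itself.

```shell
#!/bin/sh
# Stand-in for "a wrapper script launches java": the outer sh plays the
# wrapper, the inner sh plays the application. Each prints its own PID.

# With exec: the inner command replaces the wrapper, so it keeps the
# wrapper's PID and is the process that receives signals sent to that PID.
with_exec=$(sh -c 'echo $$; exec sh -c "echo \$\$"')

# Without exec: the wrapper forks the inner command and waits for it, so
# the inner command runs under a different PID. The trailing "true" keeps
# shells that exec the last command of -c implicitly from doing so here.
without_exec=$(sh -c 'echo $$; sh -c "echo \$\$"; true')

# Split the two captured lines into positional parameters and compare them.
same_pid() {
  set -- $1
  [ "$1" = "$2" ]
}

if same_pid "$with_exec"; then exec_result="same pid"; else exec_result="different pid"; fi
if same_pid "$without_exec"; then fork_result="same pid"; else fork_result="different pid"; fi

echo "with exec:    $exec_result"
echo "without exec: $fork_result"
```

In the second case the wrapper shell is the process that gets SIGTERM, and a plain sh does not pass it on to the child it is waiting for, which matches the shutdown hook never firing on the cluster.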
Nothing of the above is specific to SCDF, but it would be very useful to have a small section on the documentation referring them, although don't know where would be the best place to place it. In our case, this article was a life saver and allowed us to dig into the right direction.
Last but not least, thank you again for your continued support and for looking into this issue, I'll proceed with closing the issue.
I'm so glad y'all found the solution and thank you for sharing!
Description: We have some tasks running on a K8s cluster through SCDF 2.10.2. When we request to stop them, the tasks stop and the associated pods are removed, but they still show up as RUNNING on the SCDF dashboard. Our tasks are Spring Batch based and we've added a listener similar to the one depicted at https://stackoverflow.com/q/66110545. While the listener seems to work fine locally, it seems to be ignored when running on the cluster or, the other way round, the task still shows up as running although it has effectively been stopped.
Release versions:
Custom apps: We're using normal Spring Batch based tasks. We try to shut them down gracefully via a listener, as shown in https://stackoverflow.com/q/66110545, in order to avoid this issue, but haven't had any success.
Steps to reproduce:
Screenshots: N/A
Additional context: N/A