spring data flow server k8s lifecycle properties.

@eskuai commented on Tue Sep 03 2019

Problem description:
Allow deploying pods with lifecycle properties.

We are having trouble handling requests from the HTTP client side, especially with the bosh and gRPC protocols.

"Client-side" requesters to our http source app are reporting lost requests: many requests end up in an "undefined" status or are lost after our "ordered" shutdown, while the clients are connected to our http source app.

We have observed that the http-source app (which uses an Undertow server inside) does not shut down "safely". We have looked through the Undertow forums and cannot find any report of lost requests during shutdown. Everything we read says that Undertow's shutdown "waits" until all requests are processed and does not allow new connections, but our checks show that this is not always true.

We think that some requests are still being routed by k8s and are lost when they are delivered to our Undertow server inside a pod that is already shutting down.

We think our problem is how to prevent broken connections during pod shutdown.

Solution description: Searching for information, we found:

https://freecontent.manning.com/handling-client-requests-properly-with-kubernetes/

TL;DR: we need to run a preStop hook that "waits" for the asynchronous parts of the k8s stop process to finish.

We need to deploy pods with a k8s lifecycle exec hook:

lifecycle:
  preStop:
    exec:
      command:
      - sh
      - -c
      - "sleep 15"
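
To make the timing argument behind this hook concrete, here is a minimal sketch. It assumes the usual Kubernetes termination flow: endpoint removal is propagated asynchronously, SIGTERM is sent only after the preStop hook returns, and the hook's run time counts against terminationGracePeriodSeconds (30 seconds by default). It also assumes the server stops listening as soon as it receives SIGTERM. `PreStopTiming`, its methods, and the 5-second propagation delay are hypothetical names and numbers used only for illustration; they are not part of Kubernetes or SCDF.

```java
import java.time.Duration;

/** Hypothetical helper for reasoning about preStop timing; not part of Kubernetes or SCDF. */
final class PreStopTiming {
    /** Kubernetes default for terminationGracePeriodSeconds. */
    static final Duration DEFAULT_GRACE_PERIOD = Duration.ofSeconds(30);

    /**
     * Window in which traffic may still be routed to a pod whose server has
     * already stopped listening. propagationDelay is an assumed,
     * cluster-specific value (time until proxies stop routing to the pod).
     */
    static Duration exposureWindow(Duration propagationDelay, Duration preStopSleep) {
        return nonNegative(propagationDelay.minus(preStopSleep));
    }

    /** Approximate time left for the app to drain once the preStop hook returns. */
    static Duration drainTimeLeft(Duration gracePeriod, Duration preStopSleep) {
        return nonNegative(gracePeriod.minus(preStopSleep));
    }

    private static Duration nonNegative(Duration d) {
        return d.isNegative() ? Duration.ZERO : d;
    }

    public static void main(String[] args) {
        Duration sleep = Duration.ofSeconds(15);             // the "sleep 15" above
        Duration assumedPropagation = Duration.ofSeconds(5); // made-up number
        System.out.println("exposure window (s): "
                + exposureWindow(assumedPropagation, sleep).getSeconds());
        System.out.println("drain time left (s): "
                + drainTimeLeft(DEFAULT_GRACE_PERIOD, sleep).getSeconds());
    }
}
```

Under these assumptions, a 15-second sleep closes the exposure window but leaves only about 15 of the default 30 seconds for the app itself to drain, so a longer sleep may need a larger grace period.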

@chrisjs commented on Wed Sep 11 2019

not knowing your environment or having a way to reproduce what you're seeing, can't say if this is the "proper" fix or not. lifecycle events are a valid use case, though not currently available in the deployer. if implemented, it would be something the end user sets through typical properties, either per app or globally; we wouldn't apply it everywhere by default

@eskuai commented on Wed Sep 11 2019

Hi @chrisjs

Would it be possible for the deployer to apply the lifecycle options to pods, so that we can run tests and get more detailed info?

We are using a k8s 1.14.3 environment (we see the same behavior on 1.15.3) with multiple masters and worker nodes, Java 8, Docker 1.13.1, SCDF 2.2.0, UAA 4.30, a MariaDB cluster with a master and two nodes, ZooKeeper HA 3.4.10, and Kafka 2.2 with multiple nodes. Our apps do not use Tomcat inside; we are using Undertow 2.0.22.

Simplified idea of the process: we build a stream for multiple clients, which send massive numbers of HTTP requests (a very high processing rate):

--> source reads    (kafka persist)
--> processor filter  (kafka persist)
--> sink resend another system

Use case: 1) The stream is "stopped" or "undeployed". How safe is this process? To an external viewer, the system is "transactional":

All operations received by the source can and must be processed, either in this running stream or after the next start.
No request may be lost. No ordering is required.
The source app must not accept any new request during shutdown, and must finish processing the requests that are already running.
The HTTP client does not know that the system is shutting down, and it may be unable to connect to the app.
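
The third requirement (reject new requests, finish the running ones) can be sketched in plain Java as below. This is a generic illustration using only the JDK, not Undertow's or Spring Boot's API; `DrainGate` and its method names are hypothetical. It covers only the in-process half of the problem: it cannot help with requests that k8s still routes to the pod after the server has stopped listening.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical in-flight request gate; a sketch, not Undertow's implementation. */
final class DrainGate {
    private final Object lock = new Object();
    private boolean shuttingDown;
    private int inFlight;

    /** Returns false once shutdown has started; the caller should then reject the request. */
    boolean tryEnter() {
        synchronized (lock) {
            if (shuttingDown) {
                return false;
            }
            inFlight++;
            return true;
        }
    }

    /** Must be called exactly once for every successful tryEnter(). */
    void exit() {
        synchronized (lock) {
            inFlight--;
            if (inFlight == 0) {
                lock.notifyAll();
            }
        }
    }

    /** Stops admitting requests, then waits for in-flight ones. Returns true if fully drained. */
    boolean shutdownAndAwait(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (lock) {
            shuttingDown = true;
            while (inFlight > 0) {
                long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMillis <= 0) {
                    return false; // timed out with requests still running
                }
                try {
                    lock.wait(remainingMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        DrainGate gate = new DrainGate();
        gate.tryEnter(); // one request is in flight
        new Thread(() -> {
            try {
                Thread.sleep(100); // simulated request processing
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            gate.exit();
        }).start();
        boolean drained = gate.shutdownAndAwait(2000);
        System.out.println("drained: " + drained);
        System.out.println("new request admitted after shutdown: " + gate.tryEnter());
    }
}
```

A request handler would call `tryEnter()` first, answer with an error such as 503 when it returns false, and call `exit()` in a `finally` block once processing is done.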

2) k8s restarts an app from this stream (we don't know which one or why). The same requirements apply.

We have been reading about the Undertow shutdown process, and it seems that the Undertow server does not accept new requests during shutdown (we still have doubts about this), but for now we assume that it does the right thing.

On the other hand, the k8s process for restarting and shutting down pods needs some time for the lifecycle to finish, and it is an asynchronous process for the stream pods.

We simulated 100 HTTP clients sending 10k or more requests (the ab tool can simulate this; networking should not be the issue, since the environment uses 10GbE network cards) and captured the logs for the source app pod from Undertow and from the k8s audit log. Sometimes, at random, we observed that some requests are not processed and other requests end with response code 404... from where, Undertow!? It seems to happen while a stream is being "stopped".
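
One way to make such a test run easier to interpret is to reconcile what the clients sent with what came back. The sketch below is a hypothetical helper (`LoadTestTally` is not part of ab, SCDF, or Undertow) that assumes the number of requests sent and the list of response status codes have already been collected on the client side.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical tally of client-side load test outcomes. */
final class LoadTestTally {
    /**
     * Buckets the observed status codes and counts requests that never got a
     * response (sent minus answered), e.g. because the connection was broken.
     */
    static Map<String, Integer> tally(int requestsSent, int[] statusCodes) {
        int ok = 0;
        int notFound = 0;
        int other = 0;
        for (int code : statusCodes) {
            if (code >= 200 && code < 300) {
                ok++;
            } else if (code == 404) {
                notFound++;
            } else {
                other++;
            }
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        result.put("2xx", ok);
        result.put("404", notFound);
        result.put("other", other);
        result.put("noResponse", requestsSent - statusCodes.length);
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample: 7 requests sent, 5 responses received.
        int[] codes = {200, 200, 404, 200, 503};
        System.out.println(tally(7, codes));
    }
}
```

Comparing the "2xx" count with the number of messages that actually reached the Kafka topic would show whether requests are lost before or after the source app acknowledged them.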

Could it be that the k8s shutdown process breaks connections during pod shutdown without the source app's Undertow server being notified?

If I could deploy a lifecycle hook that writes logging info about the pod shutdown and the Undertow shutdown, I could check both of them...

@sabbyanandan commented on Wed Sep 11 2019

Hi, @eskuai. There are a lot of things going on here. Let me attempt to comment on the areas that are relevant to SCDF. Before I do that, though, let me try to understand your use-case first.

1) You have a stream made of source | processor | sink
2) You would want to deploy the same stream for multiple customers dynamically
3) You're wondering if (2) can be designed to stop and start for each customer instead of deploying a brand new stream for each of the customers

Did I get this right? If not, please clarify. I'd appreciate it if you can stick to just what is problematic in this use-case.

Now, onto your comment ..

> k8s restarts an app from this stream (we don't know which one or why)

I'm confused about how this is related to your use-case. I don't see any connection.

> We have been reading about the Undertow shutdown process, and it seems that the Undertow server does not accept new requests during shutdown (we still have doubts about this)

You're referring to Undertow related lifecycle attributes, and this has nothing to do with SCDF. If you use Tomcat or Undertow or Netty or something else, it is in your control anyway. We don't interact at that runtime level. SCDF simply deploys the stream/task apps for you, and that's all.

> We simulated 100 HTTP clients sending 10k or more requests (the ab tool can simulate this; networking should not be the issue, since the environment uses 10GbE network cards) and captured the logs for the source app pod from Undertow and from the k8s audit log. Sometimes, at random, we observed that some requests are not processed and other requests end with response code 404... from where, Undertow!?

It sounds like you're trying to do some load testing. It is unclear what "process" you are referring to here; it is also unclear what is failing with a 404.

> If I could deploy a lifecycle hook that writes logging info about the pod shutdown and the Undertow shutdown, I could check both of them...

Sorry, again, interacting directly with Undertow events is not directly in our control at SCDF. You may want to research about it from Spring Boot perspective.

Lastly, if you're going to report bugs or feature requests, without a clear description of what you're trying to accomplish, it is hard for us to assist. If you could describe a concise use-case, stick to just the problem in hand, and please if you could elaborate everything around the problem, we can review.

@eskuai commented on Mon Sep 16 2019

Hi @sabbyanandan

As a user, if I undeploy a stream that is running under high load on k8s with an http source, the shutdown process does not seem to be "transactional": some requests are lost.

We don't know whether the problem is in the source app shutdown (inside Undertow), in the k8s pod shutdown process, or in both. The fact is that some requests are lost, and that must not happen.

According to https://freecontent.manning.com/handling-client-requests-properly-with-kubernetes/ it may be the k8s shutdown process:

[image]

And more exactly, the kube-proxy actions:

[image]

But, as a user, may I request that the deployer apply lifecycle actions to pod deployments? Then I could check and be sure that pods running with

[image]

perform a complete, correct (as we need), "transactional" shutdown, without losing requests.

I think this scenario applies to all streams running on k8s...

@chrisjs commented on Mon Sep 16 2019

@eskuai as mentioned previously, the ability to configure lifecycle events on deployed applications is not currently available as a deployer option. if this is something you need right away, we welcome PRs. otherwise we will keep this issue open for possible future prioritization

spring-cloud / spring-cloud-deployer-kubernetes

spring data flow server k8s lifecycle properties. #324