ngulam-ai / Sherlock


App Engine - Response Latency peaks several times per day #1

Closed: ngulamai closed this issue 6 years ago

ngulamai commented 6 years ago

We noticed that response latency rises to 12 to 15 seconds, multiple times per day.

https://www.dropbox.com/s/7h2yn07sa4elvyh/Screenshot%202017-12-09%2015.26.52.png?dl=0

akolchin-MM commented 6 years ago

I have added <min-idle-instances>1</min-idle-instances> to appengine-web.xml.

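According to this article (https://medium.com/google-cloud/app-engine-resident-instances-and-the-startup-time-problem-8c6587040a80), it should reduce latency spikes to very low values, but it will increase the cost of the App Engine component.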
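For reference, here is a minimal sketch of where this element sits in appengine-web.xml, assuming the Java 8 standard environment with automatic scaling. The runtime and threadsafe values are placeholders and are not this project's actual settings. min-idle-instances belongs inside the <automatic-scaling> element, with no spaces around the value and a well-formed closing tag.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch only: runtime and threadsafe values are assumptions, not taken from this project. -->
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <runtime>java8</runtime>
  <threadsafe>true</threadsafe>
  <automatic-scaling>
    <!-- Keep one idle instance warm so request bursts do not wait on a cold start.
         Idle instances are billed, which is where the extra cost comes from. -->
    <min-idle-instances>1</min-idle-instances>
  </automatic-scaling>
</appengine-web-app>
```

The trade-off is direct: each resident idle instance adds to the bill even when there is no traffic. In exchange, the first requests of a burst go to an already-loaded JVM instead of triggering a new instance startup.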

I will review the results in a day or two.

https://cloud.google.com/appengine/docs/standard/java/config/appref

ngulamai commented 6 years ago

Hi, it seems that something changed a few hours ago, because latency is now at 10 seconds. [image: Inline image 1]


akolchin-MM commented 6 years ago

I am still monitoring and experimenting with this issue, but I am not yet ready to say anything definite.

It would be interesting to understand and resolve it, but do you think it is important?

Very few requests actually experience such latency, even while a spike is happening: the spikes appear in the 95th percentile, while the 50th percentile always stays below 500 ms.

Also, even a 15 s latency should not cause real harm, because requests to this URL are asynchronous and should not block their source.

But it is important for me to understand your view on the importance and urgency of this issue.

ngulamai commented 6 years ago

I agree that we can leave this for a later stage, as it has no meaningful impact other than delaying the hit's arrival in GBQ (Google BigQuery) by a few seconds. It may become critical in a few weeks, though, when the data scientists start deploying the machine learning workflows.


akolchin-MM commented 6 years ago

As far as I can see, I already fixed this issue earlier.