postmanlabs / postman-app-support

Postman monitors timing out when run on schedule #5529

Closed: VicKetchup closed this issue 5 years ago

VicKetchup commented 5 years ago

Describe the bug
This first happened on 12/11. Some of our scheduled monitors time out during the scheduled run. The console clearly shows that the run has finished, but the run takes just over 5 minutes and a timeout error is thrown:

Line | Time | Message
-- | -- | --
3436 | 9:05:45 | NODE MBaaS - DE (Test) finished
3437 | 9:05:45 | Error: callback timed out
3438 | 9:05:45 | Error: callback timed out

Running the monitor manually results in a successful run that takes ~2 minutes.

To Reproduce
Steps to reproduce the behavior:

  1. Create multiple monitors (5 in our case)
  2. Schedule them to run (9AM in our case)
  3. Sometimes the monitors time out because they reach the time limit

Expected behavior
The collection takes around 2 minutes when run in newman, in the Collection Runner, and even in the majority of monitor runs, so we expect every run to fit within the 5-minute limit.

godfrzero commented 5 years ago

@VicKetchup At what point is the message NODE MBaaS - DE (Test) finished logged? Are there any requests in the collection after this?

VicKetchup commented 5 years ago

@godfrzero Those are the last 4 lines of the log, all requests have finished at that point.

godfrzero commented 5 years ago

Could you shoot us an email at help@getpostman.com with the monitor ID? If you can share the collection as well, that'd speed things up too.

Satak commented 5 years ago

Can we change the 5-minute timeout somewhere? I have the same issue; my runs end with: Error: callback timed out

VicKetchup commented 5 years ago

Our monitors are now timing out before they finish :( Any progress on this issue?

godfrzero commented 5 years ago

Changing the 5-minute timeout is not on the roadmap right now. Also, I haven't found any indication that the timeouts are caused by any component of the monitoring platform.

@VicKetchup If each API call is expected to finish within a certain period (30 seconds max, perhaps?), can you set that as a request-level timeout config in the monitor? This will help isolate the requests which are actually causing the timeout.
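The isolation idea can be sketched offline: given per-request durations from one run, flag the requests that exceed a per-request budget and check whether the requests alone would overrun the run limit. This is a minimal sketch, not Postman's implementation; the `(name, seconds)` input format, the function names, and the request names are hypothetical, and the 30-second budget is only the example figure from the comment above.

```python
from typing import List, Tuple

RUN_LIMIT_S = 300.0      # the 5-minute monitor run limit discussed in this thread
REQUEST_LIMIT_S = 30.0   # hypothetical per-request budget ("30 seconds max, perhaps?")


def find_slow_requests(durations: List[Tuple[str, float]],
                       request_limit: float = REQUEST_LIMIT_S) -> List[str]:
    """Return the names of requests that took longer than the per-request limit."""
    return [name for name, secs in durations if secs > request_limit]


def exceeds_run_limit(durations: List[Tuple[str, float]],
                      run_limit: float = RUN_LIMIT_S) -> bool:
    """True if the summed request durations alone would overrun the run limit."""
    return sum(secs for _, secs in durations) > run_limit


# Hypothetical timings for one run, in seconds.
run = [("login", 1.2), ("get-swagger", 45.0), ("list-users", 2.5)]
print(find_slow_requests(run))   # ['get-swagger']
print(exceeds_run_limit(run))    # False
```

If no request is flagged and the summed durations stay well under the limit, the missing time is likely being spent outside the requests themselves (for example, in scripts).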

VicKetchup commented 5 years ago

@godfrzero I will double-check that this issue is still present, but if I remember correctly, the time the calls took wasn't the issue (we looked at the sum of response times). If anything, it seemed like a load issue on the monitor side when we kick off 5 at once. I will check this on Monday and provide any relevant details.

godfrzero commented 5 years ago

I'm fairly certain it's not a load issue for two reasons:

  1. The monitoring platform handles 10k to 20k+ concurrent runs at times, so 5 concurrent runs would not be a problem. Individual runs take place on isolated VMs within separate processes, so runs don't affect each other.
  2. Every similar case of timeouts reported in the past has been traced back either to a misbehaving API or to a legitimate run timeout, where the collection actually takes longer than 5 minutes to execute.

That being said, I'm not completely dismissing the possibility. @VicKetchup I'm closing this ticket for now, but if you still see this happening then I think it might be best if we got on a quick call so I can take a closer look at the problem.

VicKetchup commented 5 years ago

@godfrzero The exact issue hasn't occurred for a very long time. I went through all the available data on our monitors and couldn't find any runs that show it, so hopefully it's all good 👍 If the issue comes back, I'll make sure to have all the information on response times ready.

Update: I managed to find a run, looked at where the times went up in the logs, and I think I found the exact reason. We used to load our swaggers into collection variables (the full JSON file, which is quite big), and according to the logs, every load of the swagger took 9-20 seconds. We now host our swaggers and load them with a request, which seems to be much faster and explains why we no longer see the issue.
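Looking at where the times went up in the logs amounts to finding the largest gaps between consecutive timestamped entries. A minimal sketch, assuming the log has already been reduced to hypothetical `(time, message)` pairs in `H:MM:SS` form like the console excerpt earlier in this issue; the function name and log messages are illustrative, not Postman output.

```python
from datetime import datetime
from typing import List, Tuple


def largest_gaps(entries: List[Tuple[str, str]], top: int = 3) -> List[Tuple[float, str]]:
    """Return the `top` largest gaps (in seconds) between consecutive log entries,
    each paired with the message logged right after the gap.

    Assumes entries are in chronological order within a single day
    (a run that crosses midnight is not handled).
    """
    times = [datetime.strptime(ts, "%H:%M:%S") for ts, _ in entries]
    gaps = [
        ((times[i] - times[i - 1]).total_seconds(), entries[i][1])
        for i in range(1, len(entries))
    ]
    gaps.sort(key=lambda gap: gap[0], reverse=True)
    return gaps[:top]


# Hypothetical log excerpt: the swagger load stands out as a 17-second jump.
log = [
    ("9:03:40", "run started"),
    ("9:03:41", "GET /login"),
    ("9:03:58", "swagger JSON loaded into collection variable"),
    ("9:04:00", "GET /users"),
]
print(largest_gaps(log, top=1))  # [(17.0, 'swagger JSON loaded into collection variable')]
```

A jump like this does not show up in the sum of response times, which is consistent with the sum looking fine while the run still overran.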

godfrzero commented 5 years ago

Cool, thanks for letting us know. 👍

VicKetchup commented 5 years ago

@godfrzero We have recently added a lot more monitors, and now the collections are timing out again. Looking at the logs, it's not the response times, so I'd like to take you up on that call offer to try to get to the bottom of it.

godfrzero commented 5 years ago

Alright, could you let me know what your availability is via email at help@getpostman.com? Please mention the GH ticket number.