meraki / dashboard-api-python

Official Dashboard API library (SDK) for Python
MIT License
aiomeraki.networks.getNetworkEvents hangs #200

Closed jc516229 closed 1 year ago

jc516229 commented 1 year ago

If the total_pages="all" parameter is given, the call hangs if there are no matching events on the network.

If the parameter is not given, the call returns with no events.

This will hang if there are no events on the target network: aiomeraki.networks.getNetworkEvents(net['id'], productType="appliance", includedEventTypes = ["nbar_block", "cf_block", "sf_url_block", "sf_binary_block", "ids_start", "ids_update", "ids_error"], perPage=1000, total_pages="all")

This will return ok: aiomeraki.networks.getNetworkEvents(net['id'], productType="appliance", includedEventTypes = ["nbar_block", "cf_block", "sf_url_block", "sf_binary_block", "ids_start", "ids_update", "ids_error"], perPage=1000)

This may also affect other calls that use paging.

TKIPisalegacycipher commented 1 year ago

@coreGreenberet are you able to reproduce this as well? I suspect this is aio-specific.

jc516229 commented 1 year ago

> @coreGreenberet are you able to reproduce this as well? I suspect this is aio-specific.

Yes, 100% reproducible.

We manage several Meraki orgs, and it only happens on one. I was testing some reporting scripts, and on this org the script never completed.

Making the call via the Meraki developer page or curl works, so I started playing with the parameters to the library call and eventually found that if I left off the total_pages parameter it worked correctly. Changing perPage from minimum to maximum didn't make any difference.
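For a reporting script that must not block forever, one caller-side option is to bound the call with a timeout. This is only a sketch: `call_with_timeout` and `never_finishes` are hypothetical names used for illustration, and the stand-in coroutine simulates a paginated call that never returns rather than calling the Meraki library.

```python
import asyncio


async def call_with_timeout(coro, seconds):
    """Await `coro`, returning None if it does not finish within `seconds`."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for cancels the inner coroutine before raising.
        return None


async def never_finishes():
    """Stand-in for a paginated call that keeps following links forever."""
    while True:
        await asyncio.sleep(0.01)


result = asyncio.run(call_with_timeout(never_finishes(), 0.1))
print(result)
```

A timeout only keeps the script from hanging; it does not return whatever pages were fetched before the cutoff.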

TKIPisalegacycipher commented 1 year ago

Thanks for the additional context, @jc516229. I understand you can repro and am hoping @coreGreenberet can weigh in as well.

jc516229 commented 1 year ago

If I make the call on a network with no matching events via the developer page, the return data is...

{ "message": "No matching events found between Dec 27 19:13 and Jan 27 19:13.", "pageStartAt": "2022-12-27T18:13:25.000000Z", "pageEndAt": "2023-01-27T18:13:25.122836Z", "events": [] }

Presumably it is this combined with the total_pages="all" that causes the problem.

Whereas with some events, it is like this...

{ "message": null, "pageStartAt": "2023-01-03T01:18:53.486227Z", "pageEndAt": "2023-01-27T18:22:39.464783Z", "events": [ { "occurredAt": "2023-01-27T12:50:44.537164Z", "networkId": "xxxxxxx", ... "eventData": { "n": "680", "type": "bad_gateway_mode" } } ] }
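The difference between the two payloads can be checked directly. A minimal sketch, assuming the response has already been decoded to a dict shaped like the samples above; `is_empty_events_page` is a hypothetical helper name, not part of the library.

```python
import json


def is_empty_events_page(payload):
    """Return True when a getNetworkEvents-style payload carries no events.

    Hypothetical helper for illustration; not part of the meraki library.
    """
    # Treat a missing or null "events" key the same as an empty list.
    return not (payload.get("events") or [])


# First sample from this thread: no matching events.
empty = json.loads(
    '{"message": "No matching events found between Dec 27 19:13 and Jan 27 19:13.", '
    '"pageStartAt": "2022-12-27T18:13:25.000000Z", '
    '"pageEndAt": "2023-01-27T18:13:25.122836Z", "events": []}'
)
# Shape of the second sample, trimmed to a single field per event.
non_empty = {"message": None, "events": [{"occurredAt": "2023-01-27T12:50:44.537164Z"}]}

print(is_empty_events_page(empty))
print(is_empty_events_page(non_empty))
```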

coreGreenberet commented 1 year ago

I can reproduce it too, but I think it is an issue on the cloud side. The API is returning a "links" header with a valid URL, so since you are requesting "all" pages, the Python API will download it (even though it is empty). The new empty page also has a links header, so this goes on and on.

During my tests, after many attempts (109), the cloud answers without a link object and the script finishes.

@TKIPisalegacycipher could you check in the backend why we are receiving a links header when there isn't any result?

As a workaround, we could break the loop as soon as we stop receiving events, but I'm not sure if this will break anything.
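The workaround can be sketched outside the library as a generic link-following loop. This is not the code in rest_session.py: `collect_events`, `fetch_page`, and `fake_fetch` are hypothetical names, and the fake server below only simulates the behaviour described in this thread (an empty page that still advertises a 'next' link).

```python
def collect_events(fetch_page, first_url, stop_on_empty=True, max_requests=1000):
    """Follow 'next' links, optionally stopping at the first empty page.

    fetch_page(url) is a caller-supplied (hypothetical) function returning
    (payload_dict, links_dict), where links_dict maps rel names to URLs.
    """
    events = []
    url = first_url
    requests_made = 0
    while url is not None and requests_made < max_requests:
        payload, links = fetch_page(url)
        requests_made += 1
        page_events = payload.get("events") or []
        events.extend(page_events)
        if stop_on_empty and not page_events:
            # The workaround: an empty page ends pagination even if the
            # server still advertises another link.
            break
        url = links.get("next")
    return events, requests_made


def fake_fetch(url):
    """Simulated server: always an empty page plus a 'next' link."""
    return {"events": []}, {"next": url + "x"}


events, requests_made = collect_events(fake_fetch, "page")
print(events, requests_made)
```

With `stop_on_empty=False` the same loop only ends when `max_requests` is reached, which mirrors the hang. The open question raised above still applies: if an endpoint can return an empty page followed by non-empty ones, stopping early would truncate the results.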

TKIPisalegacycipher commented 1 year ago

Thanks @coreGreenberet and @jc516229. @jc516229 would you please open a case with Meraki support to report this issue? Specifically, the statement is to the effect of:

In cases where the getNetworkEvents API endpoint returns an empty list, it also provides a links header to additional pages, which is unexpected because there are no other pages of data to return. The expectation is that a links header is not returned (or is empty) when there are no additional pages of data.

Since this seems to be a server-side issue and not a library problem, I'll close this case. However, @jc516229 and @coreGreenberet, please feel free to submit a PR changing the behavior of the library to not follow links when the response is empty if you are interested in doing so.

jc516229 commented 1 year ago

OK, I opened a case with support; the case number is 09196795.

Thanks

jc516229 commented 1 year ago

Hi,

I did some more digging. If I put some print statements in the aio rest_session.py, the first return from the API doesn't have a 'next' link in it.

It calls _get_pages_legacy the first time.

This is the value of 'links' returned around line 404 of https://github.com/meraki/dashboard-api-python/blob/master/meraki/aio/rest_session.py

The 'direction' variable is 'next' at this point, print(links, file=sys.stderr) gives me...

<MultiDictProxy('first': <MultiDictProxy('rel': 'first', 'url': URL('https://n383.meraki.com/api/v1/organizations/649644246248194165/networks?perPage=1000&startingAfter=L_0'))>, 'last': <MultiDictProxy('rel': 'last', 'url': URL('https://n383.meraki.com/api/v1/organizations/649644246248194165/networks?endingBefore=N_a&perPage=1000'))>)>

and breaks out of the line 406 while statement.

With no data, and no next link, shouldn't it stop after this?

Instead it calls _get_pages_legacy again, with links returned...

<MultiDictProxy('first': <MultiDictProxy('rel': 'first', 'url': URL('https://n383.meraki.com/api/v1/organizations/649644246248194165/devices?perPage=1000&productTypes%5B%5D=appliance&startingAfter=0000-0000-0000'))>, 'last': <MultiDictProxy('rel': 'last', 'url': URL('https://n383.meraki.com/api/v1/organizations/649644246248194165/devices?endingBefore=ZZZZ-ZZZZ-ZZZZ&perPage=1000&productTypes%5B%5D=appliance'))>)>

again it breaks out of the while loop.

But then _get_pages_legacy is called again and does another get...

<MultiDictProxy('prev': <MultiDictProxy('rel': 'prev', 'url': URL('https://n383.meraki.com/api/v1/networks/L_681169443639799490/events?endingBefore=2023-01-02T13:40:14.000000Z&includedEventTypes%5B%5D=nbar_block&includedEventTypes%5B%5D=cf_block&includedEventTypes%5B%5D=sf_url_block&includedEventTypes%5B%5D=sf_binary_block&includedEventTypes%5B%5D=ids_start&includedEventTypes%5B%5D=ids_update&includedEventTypes%5B%5D=ids_error&perPage=1000&productType=appliance'))>, 'next': <MultiDictProxy('rel': 'next', 'url': URL('https://n383.meraki.com/api/v1/networks/L_681169443639799490/events?includedEventTypes%5B%5D=nbar_block&includedEventTypes%5B%5D=cf_block&includedEventTypes%5B%5D=sf_url_block&includedEventTypes%5B%5D=sf_binary_block&includedEventTypes%5B%5D=ids_start&includedEventTypes%5B%5D=ids_update&includedEventTypes%5B%5D=ids_error&perPage=1000&productType=appliance&startingAfter=2023-02-02T13:40:14.473981Z'))>)>

This time it doesn't break from the while loop: 'direction' switches to 'prev', it does another get inside the while loop, and it is now stuck there.

To me it looks like it failed to handle the response from the initial get correctly, and instead made further API calls eventually resulting in the loop.

coreGreenberet commented 1 year ago

I might need a second look at this. How are you calling _get_pages_legacy the second time? There isn't any recursion in the api, so the second call must come from your end. Also the URL Parameters got extended in your second call.

jc516229 commented 1 year ago

I'm not calling _get_pages_legacy myself, that's being done by the library when I call aiomeraki.networks.getNetworkEvents.

I just make this library call normally as far as I can see. When I added in some print statements to try and figure out exactly what was happening, I found this behaviour.

The library call I make...

response = await aiomeraki.networks.getNetworkEvents(net['id'], productType="appliance", includedEventTypes = ["nbar_block", "cf_block", "sf_url_block", "sf_binary_block", "ids_start", "ids_update", "ids_error"], perPage=1000, total_pages="all")

I've verified that I only make this call once.

jc516229 commented 1 year ago

Ok, right, just saw I'm confusing things with my own debug prints!

In https://github.com/meraki/dashboard-api-python/issues/200#issuecomment-1413819176 the first calls to _get_pages_legacy come from other API calls I make before the call to getNetworkEvents.

The one time that getNetworkEvents is called, the API return is...

'<MultiDictProxy('prev': <MultiDictProxy('rel': 'prev', 'url': URL('https://n383.meraki.com/api/v1/networks/L_681169443639799490/events?endingBefore=2023-01-06T12:55:57.000000Z&includedEventTypes%5B%5D=nbar_block&includedEventTypes%5B%5D=cf_block&includedEventTypes%5B%5D=sf_url_block&includedEventTypes%5B%5D=sf_binary_block&includedEventTypes%5B%5D=ids_start&includedEventTypes%5B%5D=ids_update&includedEventTypes%5B%5D=ids_error&perPage=1000&productType=appliance'))>, 'next': <MultiDictProxy('rel': 'next', 'url': URL('https://n383.meraki.com/api/v1/networks/L_681169443639799490/events?includedEventTypes%5B%5D=nbar_block&includedEventTypes%5B%5D=cf_block&includedEventTypes%5B%5D=sf_url_block&includedEventTypes%5B%5D=sf_binary_block&includedEventTypes%5B%5D=ids_start&includedEventTypes%5B%5D=ids_update&includedEventTypes%5B%5D=ids_error&perPage=1000&productType=appliance&startingAfter=2023-02-06T12:55:57.221504Z'))>)>'

This response has no events but still has the spurious 'next' link. The direction is also set to 'prev' (not sure how/where that happens).

Then it enters the while loop with direction 'prev' and gets stuck. On each loop it decrements total_pages, so the exit condition of while total_pages != 1: is never met when all pages are requested.

The ending_before date in the API response in this loop initially moves back in time, one month per loop, but once it reaches endingBefore=2019-06-17T07%3A00%3A00.000000Z it goes no further. All subsequent responses have this date; I'd guess this is the creation date of the organization.

So the failsafe test in the code...

if ending_before < "2014-01-01": break ...never gets triggered.

To confirm this, I changed the failsafe date to 2020-01-01 and it looped until that was reached then broke out of the loop.
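The stall described above (the endingBefore value no longer moving) suggests a guard that doesn't depend on a fixed date. A sketch under stated assumptions: `page_backwards`, `fetch_prev`, and `fake_prev` are hypothetical names, this is not the library's code, and the simulated server steps back one year per call for brevity instead of one month.

```python
def page_backwards(fetch_prev, start_cursor, max_requests=10_000):
    """Walk backwards in time until the cursor stops moving.

    fetch_prev(cursor) is a hypothetical callable returning the
    endingBefore value the server advertises for the previous page.
    """
    cursor = start_cursor
    seen = [cursor]
    for _ in range(max_requests):
        new_cursor = fetch_prev(cursor)
        if new_cursor >= cursor:
            # Same-format ISO 8601 UTC timestamps sort lexicographically,
            # so no backward progress means the server has hit its floor.
            break
        cursor = new_cursor
        seen.append(cursor)
    return seen


def fake_prev(cursor, floor="2019-06-17T07:00:00.000000Z"):
    """Simulated server: steps back one year per call, never below `floor`."""
    year = int(cursor[:4]) - 1
    candidate = f"{year:04d}{cursor[4:]}"
    return max(candidate, floor)


cursors = page_backwards(fake_prev, "2023-01-06T12:55:57.000000Z")
print(cursors)
```

The loop ends on the first request that returns a cursor it has already reached, whatever date the server's floor happens to be.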

To prevent the loop in the first place, if at line 402 I add this...

    elif (
        type(results) == dict
        and metadata["operation"] == "getNetworkEvents"
        and direction == "prev"
    ):
        if len(results["events"]) == 0:
            return results

Then it detects the initial empty events and returns ok.

I did a few quick tests on orgs with/without events, it seems ok.