Closed · jlixfeld closed this issue 4 years ago
There will be another problem with this configuration too - it won't commit on Junos because the maximum number of prefix-list entries is exceeded (85,325 entries). Probably one for another issue, but it will crop up as soon as the timeout is fixed!
I'm also seeing this problem with a large number (78) of smaller AS sets (I removed the very large ones - >2000 prefixes).
This issue is probably caused by the timeout of the WSGI process and not the code itself.
I mean, the code is responsible in the sense that it takes too long to execute from the WSGI process' point of view. To fix that, the code needs to be sped up, but since it just spawns a bgpq3 process and waits for its answer, I'm not sure how to proceed without some massive caching mechanism.
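The mechanism in question can be sketched as a plain subprocess call with a timeout (the helper name is illustrative, not Peering Manager's actual API; `bgpq3 -J -l <name> <as-set>` is a typical Junos-style invocation):

```python
import subprocess

def run_irr_query(cmd, timeout=30):
    """Run an external command (e.g. bgpq3) and return its stdout.

    If the child outlives `timeout` seconds, subprocess.TimeoutExpired
    is raised -- roughly what happens when the WSGI worker gives up.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    result.check_returncode()
    return result.stdout

# A real call might look like (hypothetical list name):
# run_irr_query(["bgpq3", "-J", "-l", "AS-EXAMPLE", "AS-EXAMPLE"], timeout=300)
```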
To summarize: whatever the length of the AS-SET, it's the bgpq3 execution time that will cause the issue.
A "fix" could be to increase the timeout of the WSGI process, so that it waits a bit longer for the Python code to respond before being killed.
Regarding the number of entries in a prefix-list on Junos, I guess this can be addressed in the template itself by controlling the for loop.
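Controlling the loop could look like the following sketch: pure Python standing in for what a Jinja2 `{% for chunk in prefixes | batch(n) %}` would do in a template, splitting one huge list into several smaller Junos prefix-lists (function names and the `max_entries` value are assumptions, not the actual platform limit):

```python
def chunked(seq, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def render_prefix_lists(name, prefixes, max_entries=65535):
    """Emit one Junos prefix-list per chunk so no single list
    grows past max_entries."""
    blocks = []
    for i, chunk in enumerate(chunked(prefixes, max_entries), start=1):
        body = "\n".join(f"    {p};" for p in chunk)
        blocks.append(f"prefix-list {name}-{i} {{\n{body}\n}}")
    return "\n".join(blocks)
```

The policy referencing the lists would then need to match each `{{ name }}-{{ i }}` variant, which is the price of staying under the per-list limit.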
I can confirm that setting:
timeout = 300
in gunicorn_config.py resolves the issue for me.
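For reference, the relevant gunicorn setting looks like this; note that a reverse proxy in front of gunicorn (the "502 Proxy Error" suggests one, e.g. Apache mod_proxy) may need its own timeout raised as well:

```python
# gunicorn_config.py (excerpt)
# Gunicorn kills a worker that is silent for `timeout` seconds;
# the default is 30, which a large bgpq3 run easily exceeds.
timeout = 300
```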
We are having the same issue too. That's why we currently update the prefix-lists referenced in our policies via an Ansible module.
Using the combination of the WSGI timeout increase, caching prefixes in the database to minimize network I/O, and using Redis as the caching mechanism should make this issue bearable.
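A minimal sketch of the caching idea (class and function names are assumptions, not Peering Manager's actual API; in production the dict would be replaced by Redis, e.g. a key set with a TTL via `SETEX`):

```python
import time

class TTLCache:
    """Tiny in-memory stand-in for a Redis cache with expiry."""

    def __init__(self, ttl=3600):
        self.ttl = ttl
        self._store = {}

    def get(self, key):
        hit = self._store.get(key)
        if hit is None:
            return None
        value, stamp = hit
        if time.monotonic() - stamp > self.ttl:
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic())

def cached_prefixes(cache, as_set, fetch):
    """Return cached prefixes for as_set, calling fetch() only on a miss.

    fetch would wrap the slow bgpq3 call, so repeated config renders
    within the TTL never touch the IRR at all.
    """
    value = cache.get(as_set)
    if value is None:
        value = fetch(as_set)
        cache.set(as_set, value)
    return value
```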
To summarize, to minimize the time it takes to generate a config and avoid errors, you can:
- increase the WSGI worker timeout (e.g. timeout = 300 in gunicorn_config.py),
- cache resolved prefixes in the database to minimize network I/O,
- use Redis as the caching mechanism.
Environment
Steps to Reproduce
Click "Configuration" on an IX that has a peer with a large AS-set (e.g. AS-HURRICANE), using peering-manager defaults and the simple template example.
Expected Behavior
Observed Behavior
After 30 seconds, a 502 Proxy Error is thrown:
Running bgpq3 from the command line takes 17 seconds and returns ~95,000 lines.