nginxinc / nginx-amplify-agent

NGINX Amplify Agent
BSD 2-Clause "Simplified" License
23 stars 9 forks source link

agent cant keep up with the workload #5

Open vietcgi opened 7 years ago

vietcgi commented 7 years ago

2017-03-22 17:39:26,985 [1279] nginx_alog 289c2dc71ed4088301ef95700346af281770ae903fcfc30541db6eae90e84842 processed 751703 lines from file:/var/log/nginx/access.log 2017-03-22 17:39:26,985 [1279] nginx_alog 289c2dc71ed4088301ef95700346af281770ae903fcfc30541db6eae90e84842 collect in 44.917

our log config:

log_format main_ext '$remote_addr - $remote_user [$time_local]"$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" rt=$request_time rt="$upstream_response_time"';

our agent is 100% cpu usage all time. I wonder if there is a way to limit the lines in nginx_alog.

Thanks

dedm commented 7 years ago

@vietcgi interesting, is it a high-load system? how many requests per second do you have? can you share more lines from the agent log?

vietcgi commented 7 years ago

Hi dedm,

these servers are not high load at all but they usually do 10 to 20k qps. I did tweak the logs a bit but the agent still can't keep up with the requests.

Here is the current log format.

log_format main_ext '$request_time $status $body_bytes_sent rt=$request_time rt=$upstream_response_time';

2017-03-23 13:01:44,697 [45758] nginx_metrics get http://127.0.0.1:80/nginx_status 200 10 133 0.004 2017-03-23 13:01:44,701 [45758] nginx_metrics 27dfd19218a2b8e4a26b8abcb9b270ffcc85f763f23a0ecadd080afbc1f0f0ac collect in 0.033 2017-03-23 13:01:44,703 [45758] supervisor system objects: ['f0635064c70242d6b71e824065384fa8ebeb025ba0a3003dcd98e0968ca4415d'] 2017-03-23 13:01:44,757 [45758] supervisor ps nginx output: ['22387 1 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf', '45590 22387 nginx: worker process', '45591 22387 nginx: worker process', '45592 22387 nginx: worker process', '45593 22387 nginx: worker process', '45594 22387 nginx: worker process', '45595 22387 nginx: worker process', '45596 22387 nginx: worker process', '45597 22387 nginx: worker process', '45598 22387 nginx: worker process', '45599 22387 nginx: worker process', '45600 22387 nginx: worker process', '45601 22387 nginx: worker process', '45602 22387 nginx: worker process', '45603 22387 nginx: worker process', '45604 22387 nginx: worker process', '45605 22387 nginx: worker process', '45606 22387 nginx: worker process', '45607 22387 nginx: worker process', '45608 22387 nginx: worker process', '45609 22387 nginx: worker process', '45610 22387 nginx: worker process', '45611 22387 nginx: worker process', '45612 22387 nginx: worker process', '45613 22387 nginx: worker process', '45614 22387 nginx: worker process', '45615 22387 nginx: worker process', '45616 22387 nginx: worker process', '45617 22387 nginx: worker process', '45618 22387 nginx: worker process', '45619 22387 nginx: worker process', '45620 22387 nginx: worker process', '45621 22387 nginx: worker process', ''] 2017-03-23 13:01:44,780 [45758] supervisor nginx objects: ['27dfd19218a2b8e4a26b8abcb9b270ffcc85f763f23a0ecadd080afbc1f0f0ac'] 2017-03-23 13:01:44,780 [45758] supervisor plus objects: [] 2017-03-23 13:02:12,540 [45758] nginx_alog 27dfd19218a2b8e4a26b8abcb9b270ffcc85f763f23a0ecadd080afbc1f0f0ac processed 635936 lines from file:/var/log/nginx/access.log 2017-03-23 13:02:12,540 [45758] nginx_alog 27dfd19218a2b8e4a26b8abcb9b270ffcc85f763f23a0ecadd080afbc1f0f0ac collect in 24.873 2017-03-23 13:02:12,594 [45758] nginx_elog 27dfd19218a2b8e4a26b8abcb9b270ffcc85f763f23a0ecadd080afbc1f0f0ac processed 1945 lines from file:/var/log/nginx/error.log 2017-03-23 13:02:12,594 [45758] nginx_elog 27dfd19218a2b8e4a26b8abcb9b270ffcc85f763f23a0ecadd080afbc1f0f0ac collect in 0.052 2017-03-23 13:02:12,594 [45758] nginx_alog 27dfd19218a2b8e4a26b8abcb9b270ffcc85f763f23a0ecadd080afbc1f0f0ac processed 0 lines from file:/dev/null 2017-03-23 13:02:12,594 [45758] nginx_alog 27dfd19218a2b8e4a26b8abcb9b270ffcc85f763f23a0ecadd080afbc1f0f0ac collect in 0.000 2017-03-23 13:02:12,594 [45758] supervisor system objects: ['f0635064c70242d6b71e824065384fa8ebeb025ba0a3003dcd98e0968ca4415d'] 2017-03-23 13:02:12,646 [45758] supervisor ps nginx output: ['22387 1 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf', '45590 22387 nginx: worker process', '45591 22387 nginx: worker process', '45592 22387 nginx: worker process', '45593 22387 nginx: worker process', '45594 22387 nginx: worker process', '45595 22387 nginx: worker process', '45596 22387 nginx: worker process', '45597 22387 nginx: worker process', '45598 22387 nginx: worker process', '45599 22387 nginx: worker process', '45600 22387 nginx: worker process', '45601 22387 nginx: worker process', '45602 22387 nginx: worker process', '45603 22387 nginx: worker process', '45604 22387 nginx: worker process', '45605 22387 nginx: worker process', '45606 22387 nginx: worker process', '45607 22387 nginx: worker process', '45608 22387 nginx: worker process', '45609 22387 nginx: worker process', '45610 22387 nginx: worker process', '45611 22387 nginx: worker process', '45612 22387 nginx: worker process', '45613 22387 nginx: worker process', '45614 22387 nginx: worker process', '45615 22387 nginx: worker process', '45616 22387 nginx: worker process', '45617 22387 nginx: worker process', '45618 22387 nginx: worker process', '45619 22387 nginx: worker process', '45620 22387 nginx: worker process', '45621 22387 nginx: worker process', ''] 2017-03-23 13:02:12,671 [45758] supervisor nginx objects: ['27dfd19218a2b8e4a26b8abcb9b270ffcc85f763f23a0ecadd080afbc1f0f0ac'] 2017-03-23 13:02:12,671 [45758] supervisor plus objects: [] 2017-03-23 13:02:12,700 [45758] sys_metrics f0635064c70242d6b71e824065384fa8ebeb025ba0a3003dcd98e0968ca4415d collect in 0.028 2017-03-23 13:02:12,701 [45758] nginx_config found 4 files for /etc/nginx/nginx.conf 2017-03-23 13:02:12,701 [45758] nginx_config found 3 directories for /etc/nginx/nginx.conf 2017-03-23 13:02:12,701 [45758] nginx_config 27dfd19218a2b8e4a26b8abcb9b270ffcc85f763f23a0ecadd080afbc1f0f0ac collect in 0.001 2017-03-23 13:02:12,719 [45758] nginx_metrics Active connections: 10499 server accepts handled requests 264853540 264853540 1551653225 Reading: 0 Writing: 668 Waiting: 9831

2017-03-23 12:50:42,041 [45758] nginx_metrics get http://127.0.0.1:80/nginx_status 200 10 133 0.003 2017-03-23 12:50:42,044 [45758] nginx_metrics 27dfd19218a2b8e4a26b8abcb9b270ffcc85f763f23a0ecadd080afbc1f0f0ac collect in 0.022 2017-03-23 12:50:46,997 [45758] supervisor system objects: ['f0635064c70242d6b71e824065384fa8ebeb025ba0a3003dcd98e0968ca4415d'] 2017-03-23 12:50:47,072 [45758] supervisor ps nginx output: ['22387 1 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf', '45590 22387 nginx: worker process', '45591 22387 nginx: worker process', '45592 22387 nginx: worker process', '45593 22387 nginx: worker process', '45594 22387 nginx: worker process', '45595 22387 nginx: worker process', '45596 22387 nginx: worker process', '45597 22387 nginx: worker process', '45598 22387 nginx: worker process', '45599 22387 nginx: worker process', '45600 22387 nginx: worker process', '45601 22387 nginx: worker process', '45602 22387 nginx: worker process', '45603 22387 nginx: worker process', '45604 22387 nginx: worker process', '45605 22387 nginx: worker process', '45606 22387 nginx: worker process', '45607 22387 nginx: worker process', '45608 22387 nginx: worker process', '45609 22387 nginx: worker process', '45610 22387 nginx: worker process', '45611 22387 nginx: worker process', '45612 22387 nginx: worker process', '45613 22387 nginx: worker process', '45614 22387 nginx: worker process', '45615 22387 nginx: worker process', '45616 22387 nginx: worker process', '45617 22387 nginx: worker process', '45618 22387 nginx: worker process', '45619 22387 nginx: worker process', '45620 22387 nginx: worker process', '45621 22387 nginx: worker process', ''] 2017-03-23 12:50:47,102 [45758] supervisor nginx objects: ['27dfd19218a2b8e4a26b8abcb9b270ffcc85f763f23a0ecadd080afbc1f0f0ac'] 2017-03-23 12:50:47,102 [45758] supervisor plus objects: [] 2017-03-23 12:51:53,060 [45758] nginx_alog 27dfd19218a2b8e4a26b8abcb9b270ffcc85f763f23a0ecadd080afbc1f0f0ac processed 1539963 lines from file:/var/log/nginx/access.log 2017-03-23 12:51:53,061 [45758] nginx_alog 27dfd19218a2b8e4a26b8abcb9b270ffcc85f763f23a0ecadd080afbc1f0f0ac collect in 61.213 2017-03-23 12:51:53,155 [45758] nginx_elog 27dfd19218a2b8e4a26b8abcb9b270ffcc85f763f23a0ecadd080afbc1f0f0ac processed 3462 lines from file:/var/log/nginx/error.log 2017-03-23 12:51:53,155 [45758] nginx_elog 27dfd19218a2b8e4a26b8abcb9b270ffcc85f763f23a0ecadd080afbc1f0f0ac collect in 0.093 2017-03-23 12:51:53,155 [45758] nginx_alog 27dfd19218a2b8e4a26b8abcb9b270ffcc85f763f23a0ecadd080afbc1f0f0ac processed 0 lines from file:/dev/null 2017-03-23 12:51:53,155 [45758] nginx_alog 27dfd19218a2b8e4a26b8abcb9b270ffcc85f763f23a0ecadd080afbc1f0f0ac collect in 0.000 2017-03-23 12:51:53,155 [45758] supervisor system objects: ['f0635064c70242d6b71e824065384fa8ebeb025ba0a3003dcd98e0968ca4415d'] 2017-03-23 12:51:53,201 [45758] supervisor ps nginx output: ['22387 1 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf', '45590 22387 nginx: worker process', '45591 22387 nginx: worker process', '45592 22387 nginx: worker process', '45593 22387 nginx: worker process', '45594 22387 nginx: worker process', '45595 22387 nginx: worker process', '45596 22387 nginx: worker process', '45597 22387 nginx: worker process', '45598 22387 nginx: worker process', '45599 22387 nginx: worker process', '45600 22387 nginx: worker process', '45601 22387 nginx: worker process', '45602 22387 nginx: worker process', '45603 22387 nginx: worker process', '45604 22387 nginx: worker process', '45605 22387 nginx: worker process', '45606 22387 nginx: worker process', '45607 22387 nginx: worker process', '45608 22387 nginx: worker process', '45609 22387 nginx: worker process', '45610 22387 nginx: worker process', '45611 22387 nginx: worker process', '45612 22387 nginx: worker process', '45613 22387 nginx: worker process', '45614 22387 nginx: worker process', '45615 22387 nginx: worker process', '45616 22387 nginx: worker process', '45617 22387 nginx: worker process', '45618 22387 nginx: worker process', '45619 22387 nginx: worker process', '45620 22387 nginx: worker process', '45621 22387 nginx: worker process', ''] 2017-03-23 12:51:53,234 [45758] supervisor nginx objects: ['27dfd19218a2b8e4a26b8abcb9b270ffcc85f763f23a0ecadd080afbc1f0f0ac'] 2017-03-23 12:51:53,234 [45758] supervisor plus objects: [] 2017-03-23 12:51:53,571 [45758] supervisor {u'config': {u'cloud': {u'push_interval': 60.0, u'talk_interval': 120.0, u'api_url': u'https://receiver.amplify.nginx.com:443/1.3', u'api_timeout': 10.0}, u'containers': {u'nginx': {u'max_test_duration': 30.0, u'parse_delay': 60.0, u'upload_config': True, u'upload_ssl': True, u'run_test': True, u'poll_intervals': {u'metrics': 20.0, u'configs': 20.0, u'meta': 60.0, u'logs': 10.0, u'discover': 10.0}}, u'plus': {u'poll_intervals': {u'metrics': 20.0, u'meta': 60.0, u'discover': 10.0}}, u'system': {u'poll_intervals': {u'metrics': 20.0, u'meta': 60.0, u'discover': 10.0}}}}, u'objects': [], u'messages': [], u'versions': {u'current': 0.41, u'old': 0.37, u'obsolete': 0.32}} 2017-03-23 12:51:53,571 [45758] supervisor post https://receiver.amplify.nginx.com:443/1.3/xxxx/agent/ 200 89 4 0.336 2017-03-23 12:51:53,603 [45758] sys_metrics f0635064c70242d6b71e824065384fa8ebeb025ba0a3003dcd98e0968ca4415d collect in 0.031 2017-03-23 12:51:53,604 [45758] nginx_config found 4 files for /etc/nginx/nginx.conf 2017-03-23 12:51:53,604 [45758] nginx_config found 3 directories for /etc/nginx/nginx.conf 2017-03-23 12:51:53,604 [45758] nginx_config 27dfd19218a2b8e4a26b8abcb9b270ffcc85f763f23a0ecadd080afbc1f0f0ac collect in 0.001 2017-03-23 12:51:53,627 [45758] nginx_metrics Active connections: 13270 server accepts handled requests 262804322 262804322 1542147912 Reading: 0 Writing: 1195 Waiting: 12075

dedm commented 7 years ago

Thanks, @vietcgi. Well, you have a high-load system and it looks like we should re-engineer our logic for logs parsing. Right now it's pure python and regexes and it can't be super fast, especially when an agent has to parse 1-2 mln lines...

So, again, thanks a lot for this case. We will let you know when we find a solution.

vietcgi commented 7 years ago

that makes sense. Let me know if you need to test the new solution.

Thanks