wazuh / wazuh-api

Wazuh - RESTful API
https://wazuh.com
GNU General Public License v2.0
69 stars 57 forks source link

Possible race condition with API log rotation and API calls #485

Open davidjiglesias opened 4 years ago

davidjiglesias commented 4 years ago

Hello team,

I have been studying a possible race condition occurring in the API when some API calls are made at the exact same time the API is rotating its log at midnight. When that happens, the API receives a SIGTERM and terminates.

Based on these logs provided by one of our users:

root@wazuh-manager:/var/ossec/logs# cat api.log
WazuhAPI 2020-05-06 00:00:00 foo: Rotated: /var/ossec/logs/api/2020/May/api-06-1.gz
WazuhAPI 2020-05-06 00:00:00 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 00:00:00 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 00:00:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 00:00:01 foo: Exiting... (SIGTERM)
WazuhAPI 2020-05-06 06:08:27 : Listening on: https://:::55000
WazuhAPI 2020-05-06 06:08:35 foo: [::ffff:192.168.0.4] GET /manager/info? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:36 foo: [::ffff:192.168.0.4] GET /version? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:45 foo: [::ffff:192.168.0.4] GET /manager/info? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/?limit=1&sort=-dateAdd&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/summary? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/groups? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/stats/distinct?fields=version&select=version - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/stats/distinct?fields=node_name&select=node_name - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/stats/distinct?fields=os.name%2Cos.platform%2Cos.version&select=os.name%2Cos.platform%2Cos.version - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:48 foo: [::ffff:192.168.0.4] GET /agents/?limit=500&offset=0 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:15:00 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:15:01 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:15:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:30:00 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:30:00 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:30:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:45:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:45:01 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:45:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:00:00 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:00:00 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 07:00:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:15:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:15:01 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 07:15:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:30:00 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:30:01 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 07:30:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.

We can see there are three different calls (GET /agents?limit=1, GET /cluster/status and GET /agents) scheduled by the Kibana cronjob every 15 minutes, and they are occurring at exact hours in this case (07:00, 07:15, 07:30, etc). They do not always occur at exact hours, it depends on first execution.

The API log rotation occurs exactly at 00:00 and will concur in this case with some of those calls.

We should further investigate the issue based on these premises.

Best regards,

David J. Iglesias