Closed juliamagan closed 2 years ago
While the issues were being created, it was possible to see that the same errors message were showing in different tests. But the method involved was different, so when starting to work, you should synchronize with those involved in the different issues that affect the same error message.
I add the list of error messages related to multiple issues.
Error messages | Issues |
---|---|
Could not find the remoted key loading log |
https://github.com/wazuh/wazuh-qa/issues/1572 - https://github.com/wazuh/wazuh-qa/issues/1578 - https://github.com/wazuh/wazuh-qa/issues/1576 - https://github.com/wazuh/wazuh-qa/issues/1574 |
General Research: Done. It allowed creating each fail contained on this issue.
I did a full run of the remoted tests on a branch that combines the changes on the https://github.com/wazuh/wazuh-qa/pull/1625 and those on the https://github.com/wazuh/wazuh-qa/pull/1624.
The progress went much further than it was before but there were failures in a test that in fact had no failure cases in https://github.com/wazuh/wazuh-qa/issues/1573. We are going to have to take a closer look at the tests test_remoted/test_configuration
as there seem to be some flaky tests as well. For example:
test_basic_configuration_connection_invalid_protocol
pass in my local environment but in issue https://github.com/wazuh/wazuh-qa/issues/1573 it is reported as failing.test_basic_configuration_connection_valid
pass in my local environment but in issue https://github.com/wazuh/wazuh-qa/issues/1573 it is reported as failing.test_basic_configuration_ipv6
fails in my local environment but the issue https://github.com/wazuh/wazuh-qa/issues/1573 does not report any failures.I performed full remoted
tests executions in order to determine the global status of the tests of this component including the fixes in #1624 and #1625. I got some intermittent failures that should be analyzed.
Test Executions | Date | By | Status |
---|---|---|---|
test_remoted-r1.log | 2021-07-22 | Diego | :yellow_circle: |
test_remoted-r2.log | 2021-07-22 | Diego | test_syslog_message.py::test_syslog_message :red_circle: |
test_remoted-r3.log | 2021-07-22 | Diego | test_agents_switching_protocols.py::test_agents_switching_protocols :red_circle: |
Used Wazuh-QA branch: 1570-test-socket-communication
Test results with the default settings in the ossec.conf
:
Test Executions | Date | By | Status |
---|---|---|---|
test_remoted_local_r1.log | 2021-07-27 | Miguel | :yellow_circle: |
test_remoted_local_r2.log | 2021-07-27 | Miguel | :red_circle: |
test_remoted_local_r3.log | 2021-07-27 | Miguel | :yellow_circle: |
The following errors happened in the second running:
FAILED test_remoted/test_agent_communication/test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[UDP_56000]
FAILED test_remoted/test_agent_communication/test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[TCP,UDP_56000]
The origin of both errors is the same:
logger.error(f"Results expected: {accum_results}")
> raise TimeoutError(error_message)
E TimeoutError: Could not find the expected event in queue socket
/usr/local/lib/python3.6/site-packages/wazuh_testing/tools/monitoring.py:456: TimeoutError
I performed full remoted
tests executions in order to determine the global status of the tests of this component including the fixes in #1662. I got some intermittent failures that should be analyzed.
Test Executions | Date | By | Status |
---|---|---|---|
test_remoted-r1.log | 2021-07-28 | Diego | test_agent_communication/test_multi_agent_protocols_communication.py :red_circle: |
test_remoted-r2.log | 2021-07-28 | Diego | :yellow_circle: |
test_remoted-r3.log | 2021-07-28 | Diego | :yellow_circle: |
After merging #1662 the fail test_agents_switching_protocols.py
reported by Diego was fixed.
I show the results:
Test Executions | Date | By | Status |
---|---|---|---|
ResultsAgentCommunication.log | 2021-07-28 | Seyla | 🟡 |
ResultsAgentCommunication2.log | 2021-07-28 | Seyla | 🟡 |
ResultsAgentCommunication3.log | 2021-07-28 | Seyla | 🟡 |
Used Wazuh-QA branch: 1530-full-yellow-remoted
Test results with the default settings in the ossec.conf
:
Test Executions | Date | By | Status |
---|---|---|---|
test_remoted_local_r1.log | 2021-07-28 | Miguel | :yellow_circle: |
test_remoted_local_r2.log | 2021-07-28 | Miguel | :yellow_circle: |
test_remoted_local_r3.log | 2021-07-28 | Miguel | :yellow_circle: |
Test Results:
Test Executions | Date | By | Status |
---|---|---|---|
FullYellowRemoted.log | 2021-07-28 | Seyla | 🔴 ( https://github.com/wazuh/wazuh-qa/issues/1573) |
FullYellowRemoted2.log | 2021-07-28 | Seyla | :yellow_circle: |
FullYellowRemoted3.log | 2021-07-28 | Seyla | 🔴 (https://github.com/wazuh/wazuh-qa/issues/1575) |
Note: Each test run alone works successfully and therefore is closed, but we can see that they failed in 2 random executions when I execute the Remoted cycle complete.
Used Wazuh-QA branch: 1571-fix-socket-error-in-ar-test
Test results with the default settings in the ossec.conf
:
Test Executions | Date | By | Status |
---|---|---|---|
test_remoted_local_r1.log | 2021-07-30 | Miguel | :red_circle: |
test_remoted_local_r2.log | 2021-07-30 | Miguel | :red_circle: |
test_remoted_local_r3.log | 2021-07-30 | Miguel | :red_circle: |
The same test has failed in all rounds:
=========================== short test summary info ============================
FAILED test_remoted/test_configuration/test_basic_configuration_connection_invalid_protocol.py::test_invalid_protocol[Testing,UDP_secure_1514]
FAILED test_remoted/test_configuration/test_basic_configuration_connection_invalid_protocol.py::test_invalid_protocol[TCP,Testing_secure_1514]
FAILED test_remoted/test_configuration/test_basic_configuration_connection_invalid_protocol.py::test_invalid_protocol[TCP,UDP,Testing_secure_1514]
FAILED test_remoted/test_configuration/test_basic_configuration_connection_invalid_protocol.py::test_invalid_protocol[Testing,UDP_syslog_514]
Therefore I have run it standalone using the same branch: | Test Executions | Date | By | Status |
---|---|---|---|---|
test_basic_configuration_connection_invalid_protocol.py_local_r1.log | 2021-07-30 | Miguel | :red_circle: | |
test_basic_configuration_connection_invalid_protocol.py_local_r2.log | 2021-07-30 | Miguel | :red_circle: | |
test_basic_configuration_connection_invalid_protocol.py_local_r3.log | 2021-07-30 | Miguel | :red_circle: |
After confirming that errors also occur I ran it again using the 1530-full-yellow-remoted branch: |
Test Executions | API log | Date | By | Status |
---|---|---|---|---|---|
test_basic_configuration_connection_invalid_protocol.py_local_r1.log | api_r1.log | 2021-07-30 | Miguel | :red_circle: | |
test_basic_configuration_connection_invalid_protocol.py_local_r2.log | api_r2.log | 2021-07-30 | Miguel | :red_circle: | |
test_basic_configuration_connection_invalid_protocol.py_local_r3.log | api_r3.log | 2021-07-30 | Miguel | :red_circle: |
The failure is repeated every time. Although this test appears in the issue https://github.com/wazuh/wazuh-qa/issues/1573 (case 1), the error detected is different:
def get_manager_configuration(section=None, field=None):
"""Get Wazuh manager configuration response from API using GET /manager/configuration
.
.
.
response = requests.get(api_query, headers=api_details['auth_headers'], verify=False)
> assert response.json()['error'] == 0, f"Wazuh API response status different from 0: {response.json()}"
E AssertionError: Wazuh API response status different from 0: {'title': 'Wazuh Internal Error', 'detail': 'Timeout executing API request', 'dapi_errors': {'node01': {'error': 'Timeout executing API request', 'logfile': 'WAZUH_HOME/logs/api.log'}}, 'error': 3021}
/usr/local/lib/python3.6/site-packages/wazuh_testing/api.py:128: AssertionError
Since the problem seems to be API related I attach these logs.
It is the last execution with all changes.
Test Executions | Date | By | Status |
---|---|---|---|
FullYellowRemoted.log | 2021-07-30 | Seyla | 🟡 |
FullYellowRemoted2.log | 2021-07-28 | Seyla | 🟡 |
FullYellowRemoted3.log | 2021-07-28 | Seyla | 🟡 |
Used Wazuh-QA branch: 1530-full-yellow-remoted
Test results with the default settings in the ossec.conf
:
Test Executions | Date | By | Status |
---|---|---|---|
test_remoted-full_r1.log | 2021-07-30 | Diego | :red_circle: |
test_remoted-full_r2.log | 2021-07-30 | Diego | :yellow_circle: |
test_remoted-full_r3.log | 2021-07-30 | Diego | :yellow_circle: |
The first round shows failures that should ve analyzed in the next tests:
FAILED wazuh-qa/tests/integration/test_remoted/test_agent_communication/test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[UDP_1514]
FAILED wazuh-qa/tests/integration/test_remoted/test_socket_communication/test_syslog_message.py::test_syslog_message[syslog_UDP_514-dummy_bad_formatted_pri_header]
I have launched three rounds of test_remoted/test_configuration/test_basic_configuration_connection_invalid_protocol.py
and everything seems to work properly.
CentOS 8 Manager | Reports | |
---|---|---|
R1 | 🟡 | Test results |
R2 | 🟡 | Test results |
R3 | 🟡 | Test results |
Used Wazuh-QA branch: 1530-full-yellow-remoted
Test Executions | By | Status |
---|---|---|
FullRemoted.log | Seyla | 🔴 |
FullRemoted2.log | Seyla | 🔴 |
FullRemoted3.log | Seyla | 🔴 |
Wazuh-QA branch: 1530-full-yellow-remoted
The test test_remoted/test_socket_communication/test_syslog_message.py
was run isolated on a clean environment many times, but all the tests were successful
Test Executions | Date | By | Status |
---|---|---|---|
test_syslog_message_1.log | 2021-08-02 | Matias | :yellow_circle: |
test_syslog_message_2.log | 2021-08-02 | Matias | :yellow_circle: |
test_syslog_message_3.log | 2021-08-02 | Matias | :yellow_circle: |
test_syslog_message_4.log | 2021-08-02 | Matias | :yellow_circle: |
test_syslog_message_5.log | 2021-08-02 | Matias | :yellow_circle: |
test_syslog_message_6.log | 2021-08-02 | Matias | :yellow_circle: |
test_syslog_message_7.log | 2021-08-02 | Matias | :yellow_circle: |
test_syslog_message_8.log | 2021-08-02 | Matias | :yellow_circle: |
test_syslog_message_9.log | 2021-08-02 | Matias | :yellow_circle: |
test_syslog_message_10.log | 2021-08-02 | Matias | :yellow_circle: |
These are the results of the tests with the whole test_remoted folder.
Test Executions | Date | By | Status |
---|---|---|---|
test_remoted_1.log | 2021-08-02 | Matias | :red_circle: |
test_remoted_2.log | 2021-08-02 | Matias | :red_circle: |
test_remoted_3.log | 2021-08-02 | Matias | :yellow_circle: |
test_remoted_4.log | 2021-08-02 | Matias | :yellow_circle: |
test_remoted_5.log | 2021-08-02 | Matias | :yellow_circle: |
test_remoted_6.log | 2021-08-02 | Matias | :yellow_circle: |
test_remoted_7.log | 2021-08-02 | Matias | :yellow_circle: |
test_remoted_8.log | 2021-08-02 | Matias | :red_circle: |
test_remoted_9.log | 2021-08-02 | Matias | :red_circle: |
test_remoted_10.log | 2021-08-02 | Matias | :yellow_circle: |
test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[UDP_1514]
test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[TCP_1514]
test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[TCP_1514]
test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[TCP,UDP_56000]
The remoted folder was tested again, but this time without the -x
parameter, so the test won't stop if it fails
Test Executions | Date | By | Status |
---|---|---|---|
test_remoted_rerun_1.log | 2021-08-02 | Matias | :red_circle: |
test_remoted_rerun_2.log | 2021-08-02 | Matias | :yellow_circle: |
test_remoted_rerun_3.log | 2021-08-02 | Matias | :yellow_circle: |
test_remoted_rerun_4.log | 2021-08-02 | Matias | :yellow_circle: |
test_remoted_rerun_5.log | 2021-08-02 | Matias | :yellow_circle: |
test_remoted_rerun_6.log | 2021-08-02 | Matias | :yellow_circle: |
test_remoted_rerun_7.log | 2021-08-02 | Matias | :yellow_circle: |
test_remoted_rerun_8.log | 2021-08-02 | Matias | :yellow_circle: |
test_remoted_rerun_9.log | 2021-08-02 | Matias | :yellow_circle: |
test_remoted_rerun_10.log | 2021-08-02 | Matias | :yellow_circle: |
test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[UDP_1514]
Used Wazuh-QA branch: 1530-full-yellow-remoted |
Test Executions | Status |
---|---|---|
R1 | 🔴 |
I spent some time analyzing the issues in test_multi_agent_protocols_communication.py
reported in the next comments:
In all the executions in which we observe failures, they were caused because the Man In The Middle didn't find the event that it was looking for.
Having said that, I analyzed the MITM implementation and how it is used in these tests. Essentially for these tests the MITM will monitor the UDP socket queue/sockets/queue
. The way in which it will monitor the queue will be:
queue
to queue.original
.queue
socket from the MITM.Considering the fact that analysisd
also reads from that queue, the tests also stop the execution of that daemon durin the test execucion.
From this analysis we can make the next conclusions:
queue
receives many events, it could run into the same issue tan the FileMonitor in with it doesn't find the event before achieving the timeout.We are analyzing the second item to see whether or not is the cause of the issue.
The intermittent failures of test_multi_agents_protocols_communication
may be related to a socket saturation, so the test is unable to read the required message.
Many different approaches are being carried out to read the queue
socket (strace, kernel probes, etc), so we can probe the message is present but the test failed.
Also, even more modules are being removed from the default configuration, to reduce the messages in the socket.
The whole test_remoted folder was run with this new_ossec.conf file. Basically, all the modules are disabled and all the localfile entries were removed. The purpose of this approach is to reduce the socket usage and verify the tests behavior in this situation. Here are the results
Test Executions | Date | By | Status |
---|---|---|---|
test_remoted_no_logcollector_1.log | 2021-08-03 | Matias | :yellow_circle: |
test_remoted_no_logcollector_2.log | 2021-08-03 | Matias | :yellow_circle: |
test_remoted_no_logcollector_3.log | 2021-08-03 | Matias | :yellow_circle: |
test_remoted_no_logcollector_4.log | 2021-08-03 | Matias | :yellow_circle: |
test_remoted_no_logcollector_5.log | 2021-08-03 | Matias | :yellow_circle: |
test_remoted_no_logcollector_6.log | 2021-08-03 | Matias | :yellow_circle: |
test_remoted_no_logcollector_7.log | 2021-08-03 | Matias | :yellow_circle: |
test_remoted_no_logcollector_8.log | 2021-08-03 | Matias | :yellow_circle: |
test_remoted_no_logcollector_9.log | 2021-08-03 | Matias | :yellow_circle: |
test_remoted_no_logcollector_10.log | 2021-08-03 | Matias | :yellow_circle: |
The tests results suggest a more consistent result, but more testing is needed to be sure.
Continuing with the analysis made in https://github.com/wazuh/wazuh-qa/issues/1530#issuecomment-891932778, I modified the Wazuh testing framework to print int the console the events received by the Man In The Middle from the queue
socket. The changes are implemented in the next branch:
https://github.com/wazuh/wazuh-qa/tree/1618-poc-mitm-monitor
After this, I performed multiples executions of the tests in order to reproduce the issue and confirm the thinking that we have related to the QueueMonitor not analyzing the events as fast as possible to avoid the timeout. We didn't manage to reproduce the issue so, the next step will be to run the tests in multiples environments in paralell.
These are the results of the whole remoted tests folder but disabling all the modules including logcollector. This is a re-run of the tests in this comment.
Test Executions | Date | By | Status |
---|---|---|---|
test_remoted_no_logcollector_rerun_1.log | 2021-08-04 | Matias | :yellow_circle: |
test_remoted_no_logcollector_rerun_2.log | 2021-08-04 | Matias | :yellow_circle: |
test_remoted_no_logcollector_rerun_3.log | 2021-08-04 | Matias | :yellow_circle: |
test_remoted_no_logcollector_rerun_4.log | 2021-08-04 | Matias | :yellow_circle: |
test_remoted_no_logcollector_rerun_5.log | 2021-08-04 | Matias | :red_circle: |
test_remoted_no_logcollector_rerun_6.log | 2021-08-04 | Matias | :yellow_circle: |
test_remoted_no_logcollector_rerun_7.log | 2021-08-04 | Matias | :yellow_circle: |
test_remoted_no_logcollector_rerun_8.log | 2021-08-04 | Matias | :yellow_circle: |
test_remoted_no_logcollector_rerun_9.log | 2021-08-04 | Matias | :yellow_circle: |
test_remoted_no_logcollector_rerun_10.log | 2021-08-04 | Matias | :yellow_circle: |
These results may suggest that the socket saturation is a possible cause, but still we can't be sure.
These tests were repeated 18 times, with the same configuration than before minus the authd daemon. It failed 2 times, here are the results: test_remoted_no_logcollector-auth.zip
We are not seeing an improvement in the overall results, we can discard the hypothesis of a socket saturation.
In the other hand, some extra debugging messages were added to the QA framework to analyze the messages that enter the queue and the messages that are read by the monitor.
The test test_multi_agent_protocols_communication.py
was run 252 times, and it failed 39 times.
Here we have the logs queue_test.zip. We can conclude that
These results may indicate an issue in the message inyector and not in the QueueMonitor, because the queue is clearly not flooded.
New debug messages were added to check if the sender threads were really sending the messages.
The test test_multi_agent_protocols_communication.py
was run 41 times, it failed 6 times. Here are the results
multi_agent_debug.zip.
We can conclude that every time the test fails, both agent's threads send the corresponding messages. But they get lost in the way and they don't get to the socket. These results discard a failure in the sending threads.
I performed full executions of the remoted tests with some messages printed in order to have more information in case of failures. Although the issue in the test_multi_agent_protocols_communication.py
was not reproduced, the executions showed issues in the test test_syslog_message.py
that should be analyzed.
Test Executions | Date | By | Status |
---|---|---|---|
remoted-full-r1.log | 2021-08-04 | Diego | :red_circle: |
remoted-full-r2.log | 2021-08-04 | Diego | :red_circle: |
remoted-full-r3.log | 2021-08-04 | Diego | :red_circle: |
remoted-full-r4.log | 2021-08-04 | Diego | :red_circle: |
Test Executions | Date | By | Status |
---|---|---|---|
FullRemoted.log | 2021-08-04 | Seyla | :red_circle: |
FullRemoted2.log | 2021-08-04 | Seyla | :red_circle: |
FullRemoted3.log | 2021-08-04 | Seyla | 🟡 |
Some debugging messages were added to remoted to verify the correct reception and sending of the test messages. Here are the results:
It was possible to see in the failed test that remoted receives both messages, but fails while trying to send one of them to the socket. That is the reason why the QA framework is unable to receive the message.
More tests are being carried out to determine if the ManInTheMiddle approach of the test is responsible for the socket communication error.
These are the messages generated by the test. The threads send the message, but only one of them gets in the queue
wazuh-qa/tests/integration/test_remoted/test_agent_communication/test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[UDP_1514] 2021-08-05T13:49:28.1628171368
2021/08/05 13:49:28 wazuh-remoted[547749] debug_op.c:70 at _log(): DEBUG: Logging module auto-initialized
2021/08/05 13:49:28 wazuh-remoted[547749] main.c:130 at main(): DEBUG: Wazuh home directory: /var/ossec
2021/08/05 13:49:28 wazuh-remoted[547749] main.c:148 at main(): DEBUG: This is not a worker
2021-08-05 13:49:59,001 - wazuh_testing - ERROR - Could not find the expected event in queue socket
2021-08-05 13:49:59,002 - wazuh_testing - ERROR - Results accumulated: 0
2021-08-05 13:49:59,002 - wazuh_testing - ERROR - Results expected: 1
Agent simulator sending msg: b'!11075!#AES:\x18^\xe2\xdee\xc9S\x9f\xe2F\xafp\x1b\x9dw\x8c+\xdf@\xc9>\x10iv\xd5\xa4\xffR2~\x9f\x18Y\\\xe3\xd6\xf4\x9b\xad \r\xbd-\xdcD-\xc1\x07/\xd7\x0eg\xeb\x1fJ\xfd\x80\x95\xbe\xcf\x0f\xe2h\xb5\xe6\xf7\x92R\xad\xca.\x0c\x10\xb1!\x98\xb6\r\xba\xce\x02\n\x81\xcf\xa39\x91\xac\xf8P\xc8\x96\xb9\x93l\xe4\xbe*t\xcbU\x04\\6\xb4\xa3F\x06G\x88\xd2^\xc7\x96i\x89K\x81\xf1\xd1\xc1caS\xad1\xe8\x7f'
Agent simulator sending msg: b"!11076!#AES:\xbc\xa0\xd0\xb3^\xeaZl\x10el\xa1-\x058\xf3\xd0\x9b-\xc7\x83\x9b\\\xc2\xf2GC_\xf6\xbaB1\xb6\xd0x~`\xc9\xcc\xea\xcb\xf1\x04\x85\xa9{<\xfa\xa5\xdb\xa3\xbb\xed\x05{\xde\x16\xdf\xe3D\x10\xcf\x89H\xc2\xa4\xb8\x1f\xce\n\t\x8abBUo\x9d\r'9K\x91\xcdp{&\xa3\xc7\xd3\xa6\xe5\xc6\x03\xdf@\xa3\xfaW\xc2\x0ft,\x89\xae^\xb8\xda\xee\x8d0\x89\x93\x06\xd4(3\xdc-\xddT\xd7\xabI\xbe\xaf\xc2'\x96"
Inserting item in queue: b'1:[11076] (6-2948d7f5-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 11076\x00'
Got msg from queue: b'1:[11076] (6-2948d7f5-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 11076\x00'
FAILED
Were we have the messages printed by remoted related to the agents 11075 and 11076
2021/08/05 13:49:38 wazuh-remoted[547749] secure.c:547 at HandleSecureMessage(): DEBUG: Remoted received: (1:/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 11075) ([11075] (5-6400c8ef-debian7) any)
2021/08/05 13:49:38 wazuh-remoted[547749] secure.c:547 at HandleSecureMessage(): DEBUG: Remoted received: (1:/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 11076) ([11076] (6-2948d7f5-debian7) any)
2021/08/05 13:49:38 wazuh-remoted[547749] os_net.c:506 at OS_SendUnix(): DEBUG: Sending event to queue: (1:[11076] (6-2948d7f5-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 11076) (113)
The message for the agent 11075 is never sent to the queue.
The tests were run again, but some extra debugging messages were added in mq_op.c, os_net.c and secure.c (
modified files.zip
).
The ossec.log
file was saved for every test.
It can be seen that the connection with the socket fails regularly, even in tests that pass. Here is an example: ossec_queue_2.log queue_test_2.log
The messages from the agent 15556 are received and sent right away
2021/08/05 23:45:11 wazuh-remoted[791292] secure.c:547 at HandleSecureMessage(): DEBUG: Remoted received: (1:/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 15556) ([15556] (2-6c8107e9-debian7) any)
2021/08/05 23:45:11 wazuh-remoted[791292] os_net.c:508 at OS_SendUnix(): DEBUG: Sending event to queue: (1:[15556] (2-6c8107e9-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 15556) (113)
2021/08/05 23:45:12 wazuh-remoted[791381] mq_op.c:47 at StartMQ(): DEBUG: Connected succesfully to 'queue/sockets/queue' after 0 attempts
But the agent 15555 needs a re-connection to queue to send it successfully
2021/08/05 23:45:11 wazuh-remoted[791292] secure.c:547 at HandleSecureMessage(): DEBUG: Remoted received: (1:/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 15555) ([15555] (1-39ae3654-debian7) any)
2021/08/05 23:45:11 wazuh-remoted[791292] os_net.c:504 at OS_SendUnix(): DEBUG: Failed sending event to queue (OS_SOCKTERR): (1:[15555] (1-39ae3654-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 15555) (113)
2021/08/05 23:45:11 wazuh-remoted[791292] mq_op.c:93 at SendMSG(): ERROR: socketerr (not available).
2021/08/05 23:45:11 wazuh-remoted[791292] secure.c:556 at HandleSecureMessage(): ERROR: (1210): Queue 'queue/sockets/queue' not accessible: 'Connection refused'
2021/08/05 23:45:11 wazuh-remoted[791292] mq_op.c:47 at StartMQ(): DEBUG: Connected succesfully to 'queue/sockets/queue' after 0 attempts
2021/08/05 23:45:11 wazuh-remoted[791292] secure.c:561 at HandleSecureMessage(): INFO: Successfully reconnected to 'queue/sockets/queue'
2021/08/05 23:45:11 wazuh-remoted[791292] os_net.c:508 at OS_SendUnix(): DEBUG: Sending event to queue: (1:[15555] (1-39ae3654-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 15555) (113)
This behavior is found in every test, and this explains why some messages were found duplicated: it seems that the messages arrives regardless the errors
wazuh-qa/tests/integration/test_remoted/test_agent_communication/test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[TCP_1514] 2021/08/05 23:45:01 wazuh-remoted[791292] debug_op.c:70 at _log(): DEBUG: Logging module auto-initialized
2021/08/05 23:45:01 wazuh-remoted[791292] main.c:130 at main(): DEBUG: Wazuh home directory: /var/ossec
2021/08/05 23:45:01 wazuh-remoted[791292] main.c:148 at main(): DEBUG: This is not a worker
Agent simulator sending msg: b'!15555!#AES:Z\x01\xaa\xda\xe1Y\x8bE\x0455\xa7y\x14\x87\xbeq\xdce\xffp\x9e\x7f\xc9\xcd\xb9<\x80\xb5\xbez\xc7\xcah\xf6\xcd\x04\xbc\xe6\xa8\x81\xf0\x9a\xec%Y\xfap\x9e\xf8\xee\x1ei\xbd\xfa L\x06Aqj;2l\x0ew*\xd4\x10\x1b/\xf4\xa5.\x8c6\xce\xdd"\x86@\x7fbZ]\x03\xac`\x9f\xd6\x08A\xf2j\x18\x9e\xbfS\xf8\x8f\x91\x9d\x07\\H\x06\xcf@\x8ak\t\xaa\xf17\xb1\xab\x86)]\xf13\xdd\xd8&\'|`\x15'
Inserting item in queue: b'1:[15555] (1-39ae3654-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 15555\x00'
Agent simulator sending msg: b'!15556!#AES:\xe7\xe0\xe2\t\xccN\xfdg\x81\x1e\xdd\xb4)|\x1eOn\x94_\x13\x1a\xde\xd8(hk\x14\xd0\xcd\xac#\xa2.>\xbai\x90\xef\xd3>\xa6\xb2\xe9l\x16\x88\x96\xabHP\xaa\xe0\x06\xaa&\x851\x99\x1d\x1e\xb90\xd8\xc0\xab\x1b\xd4\x1cs)\x11\x93\xd0\xea,\xccY\xac=\xe7n\xb6G\xafv\x90\xecM\xb5VO\x9e\x84\x19\x10#\x01\xbe\xe9>\xecx~\xbc\x9a\x1f\\\x18j\xae\xf6\xbda\\\x08\xe8\xda\xd7"$a\xa3=\x90\xa1\xe3\x00\x16'
Inserting item in queue: b'1:[15556] (2-6c8107e9-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 15556\x00'
Got msg from queue: b'1:[15555] (1-39ae3654-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 15555\x00'
Got msg from queue: b'1:[15555] (1-39ae3654-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 15555\x00'
Got msg from queue: b'1:[15556] (2-6c8107e9-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 15556\x00'
PASSED
Analyzing a failed case (ossec_queue_87.log, queue_test_87.log) we can see that the message from the agent 16583 is found but the one sent by 16584 is not
wazuh-qa/tests/integration/test_remoted/test_agent_communication/test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[TCP,UDP_1514] 2021-08-06T01:21:58.1628212918
2021/08/06 01:21:58 wazuh-remoted[841391] debug_op.c:70 at _log(): DEBUG: Logging module auto-initialized
2021/08/06 01:21:58 wazuh-remoted[841391] main.c:130 at main(): DEBUG: Wazuh home directory: /var/ossec
2021/08/06 01:21:58 wazuh-remoted[841391] main.c:148 at main(): DEBUG: This is not a worker
2021-08-06 01:22:38,684 - wazuh_testing - ERROR - Could not find the expected event in queue socket
2021-08-06 01:22:38,685 - wazuh_testing - ERROR - Results accumulated: 0
2021-08-06 01:22:38,685 - wazuh_testing - ERROR - Results expected: 1
Agent simulator sending msg: b'!16583!#AES:\x84\xfc\xfa\xa9Y\xf4\xa5EC\x80\x90U0\x05\xcaA\xb3\xafO\x96\xcd\xb9\xcf\xef\xc2\x998x7\x8d\x88\x97=\xc1\x04\xd8\xb2\x91C&\xd2\x87\x86c\xabOBH\xc1\x05\x18\x9f\x80\xaeq\xa5 \x0e\xd1XN\xd0\xf2^F\x07\x8d\xc7\xf7B*\xfc\\K\xc2f\xe7F\x17\xd1I\x9a\x12m\x04\x8c<\xea\xfa\x1f\xb4G\x18\x0c\xdc\xb2\rTZ\xfaL\xd0\xc7I\xdb\x83\'\x19\x9ek\x82~G\x80"\xb8\xdb#v\x19\xf3\x16^x)v\xaa\xbe'
Agent simulator sending msg: b'!16584!#AES:X\xd4\xb3\x1cA`\xd3t\xeb+\xeb\x9a\xee=!\xc0\x84\xaf\xd2xj!\xcaL\xbb\xf7\xc9G=\x84\x17\x016\x8f,\xd4\xe2]~1\xf1^\x9e \xd9\xab.\xd0C-\x834aF\xfa\x8b\xf3\xe8\xc5\x0fgk>\xea\x96\x1a\xb2A-\xc1\xc2*\x03\xfa\xa4\xdc\xac\x90U@m\x1f\xd2\xb4\xf1\xec\xdd\x16I\xca\x86%\x17\x0b\xcd,\x01\xe7\xa7#S\xd6\t\xd7\x136\x01\xe8\xb7Y~i\xb4\xc3\xb3d:\xd7GO\x00\xad=\xda>\xc8\x9c\x0e'
Inserting item in queue: b'1:[16583] (9-27f5b5fb-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 16583\x00'
Got msg from queue: b'1:[16583] (9-27f5b5fb-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 16583\x00'
Got msg from queue: b'1:[16583] (9-27f5b5fb-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 16583\x00'
FAILED
But the message to queue is sent by remoted
2021/08/06 01:22:08 wazuh-remoted[841391] secure.c:547 at HandleSecureMessage(): DEBUG: Remoted received: (1:/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 16583) ([16583] (9-27f5b5fb-debian7) any)
2021/08/06 01:22:08 wazuh-remoted[841391] os_net.c:504 at OS_SendUnix(): DEBUG: Failed sending event to queue (OS_SOCKTERR): (1:[16583] (9-27f5b5fb-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 16583) (113)
2021/08/06 01:22:08 wazuh-remoted[841391] mq_op.c:93 at SendMSG(): ERROR: socketerr (not available).
2021/08/06 01:22:08 wazuh-remoted[841391] secure.c:547 at HandleSecureMessage(): DEBUG: Remoted received: (1:/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 16584) ([16584] (10-5079b3c4-debian7) any)
2021/08/06 01:22:08 wazuh-remoted[841391] secure.c:556 at HandleSecureMessage(): ERROR: (1210): Queue 'queue/sockets/queue' not accessible: 'Connection refused'
2021/08/06 01:22:08 wazuh-remoted[841391] os_net.c:508 at OS_SendUnix(): DEBUG: Sending event to queue: (1:[16584] (10-5079b3c4-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 16584) (114)
2021/08/06 01:22:08 wazuh-remoted[841391] mq_op.c:47 at StartMQ(): DEBUG: Connected succesfully to 'queue/sockets/queue' after 0 attempts
2021/08/06 01:22:08 wazuh-remoted[841391] secure.c:561 at HandleSecureMessage(): INFO: Successfully reconnected to 'queue/sockets/queue'
2021/08/06 01:22:08 wazuh-remoted[841391] os_net.c:508 at OS_SendUnix(): DEBUG: Sending event to queue: (1:[16583] (9-27f5b5fb-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 16583) (113)
2021/08/06 01:22:39 wazuh-remoted[841560] mq_op.c:47 at StartMQ(): DEBUG: Connected succesfully to 'queue/sockets/queue' after 0 attempts
The queue interception is unreliable: sometimes the manager fails to send a message but it arrives anyway, and sometimes it successfully sends a message but it never arrives to the socket.
The best option is to change the test structure, and validate the message received by remoted with another method: the agent's status update, an alert generation, etc.
A new issue was created with enhancement proposal #1690.
The debugging process of the failed tests would be much faster if the required information were easily available.
Test Executions | Date | By | Status | Fails |
---|---|---|---|---|
ResultPedroRemoted1.2.log | 11-08-21 | Seyla | 🟡 | |
ResultPedroRemoted1.1.log | 11-08-21 | Seyla | 🟡 | |
ResultPedroRemoted.log | 11-08-21 | Seyla | 🟡 | |
ResultPedroRemoted2.log | 11-08-21 | Seyla | 🔴 | test_agent_communication/test_agent_version_shared_configuration_startup_message |
ResultPedroRemoted3.log | 11-08-21 | Seyla | 🔴 | test_agent_communication/test_request_agent_info.py (related to https://github.com/wazuh/wazuh-qa/issues/1698) test_agent_communication/test_agent_version_shared_configuration_startup_message |
ResultPedroRemoted4.log | 11-08-21 | Seyla | 🟡 | |
FullRemoted.log | 11-08-21 | Seyla | 🟢 | |
Remoted2.log | 11-08-21 | Seyla | 🔴 | test_socket_communication/test_syslog_message |
Remoted3.log | 11-08-21 | Seyla | 🔴 | test_agent_communication/test_agent_version_shared_configuration_startup_message |
Note: Comments deleted was added bellow in a common table.
Test Executions | Date | By | Status |
---|---|---|---|
ResultFullRemoted.log | 09-08-21 | Seyla | 🟡 |
ResultFullRemoted2.log | 09-08-21 | Seyla | 🟡 |
ResultFullRemoted3.log | 09-08-21 | Seyla | 🟡 |
ResultFullRemoted4.log | 09-08-21 | Diego | 🟡 |
ResultFullRemoted5.log | 09-08-21 | Diego | 🟡 |
ResultFullRemoted6.log | 09-08-21 | Diego | 🟡 |
ResultFullRemoted7.log | 09-08-21 | Matias | 🟡 |
ResultFullRemoted8.log | 09-08-21 | Matias | 🟡 |
ResultFullRemoted9.log | 09-08-21 | Matias | 🟡 |
ResultFullRemoted10.log | 10-08-21 | Seyla | 🟡 |
ResultFullRemoted11.log | 10-08-21 | Seyla | 🟡 |
ResultFullRemoted12.log | 10-08-21 | Seyla | 🟡 |
All the remoted tests were run after merging the fix in #1692. All of them passed.
Test Executions | Date | By | Status |
---|---|---|---|
test_remoted-r1.log | 09-08-21 | Diego | :yellow_circle: |
test_remoted-r2.log | 09-08-21 | Diego | :yellow_circle: |
test_remoted-r3.log | 09-08-21 | Diego | :yellow_circle: |
ResultFullRemoted.log | 09-08-21 | Seyla | 🟡 |
ResultFullRemoted2.log | 09-08-21 | Seyla | 🟡 |
ResultFullRemoted3.log | 09-08-21 | Seyla | 🔴 test_active_response_send_ar.py::test_active_response_ar_sending |
ResultFullRemoted4.log | 09-08-21 | Seyla | 🔴 test_multi_agent_status.py::test_protocols_communication test_basic_configuration_queue_size_valid.py::test_queue_size_valid |
ResultFullRemoted5.log | 09-08-21 | Seyla | 🟡 |
ResultFullRemoted7.log | 09-08-21 | Seyla | 🟡 |
ResultFullRemoted6.log | 09-08-21 | Seyla | 🔴 test_multi_agent_status.py::test_protocols_communication |
ResultFullRemoted8.log | 09-08-21 | Seyla | 🔴 test_active_response_send_ar.py::test_active_response_ar_sending |
ResultFullRemoted9.log | 09-08-21 | Seyla | 🔴 test_multi_agent_status.py::test_protocols_communication |
All the remoted tests were run after the merging the fix in #1692. All of them passed.
Test Executions | Date | By | Status |
---|---|---|---|
test_full_remoted_after_fix_1.log | 2021-08-06 | Matias | :yellow_circle: |
test_full_remoted_after_fix_2.log | 2021-08-06 | Matias | :yellow_circle: |
test_full_remoted_after_fix_3.log | 2021-08-06 | Matias | :yellow_circle: |
test_full_remoted_after_fix_4.log | 2021-08-06 | Matias | :yellow_circle: |
test_full_remoted_after_fix_5.log | 2021-08-06 | Matias | :yellow_circle: |
Description
This issue is part of the https://github.com/wazuh/wazuh-qa/issues/1516. After the research #1493, it has been detected fails and warnings in
remoted
tests for Wazuh manager. In order to sanitize current tests is necessary to fixremoted
tests.Test Execution - Results
Settings
Packages details
Environment
Setup Local_internal_opcion.conf
Development
This fix should be developed in a branch from 4.2, and it must be merged in the epic issue full-green branch
1516-4.2.0-full-green
In order to finish this issue the following tasks should be fulfilled:
remoted
integration tests. :1530-full-yellow-remoted
to1516-4.2.0-full-green