Closed spothound closed 4 years ago
Thanks @grg121 for taking this. Don't hesitate to ask the team anything that you might want, and please use this ticket for tracking purposes. :+1:
Ey, thanks @manuasir, all right, I'll keep you updated!
Working on this issue.
Branch: https://github.com/wazuh/wazuh-qa
<directories realtime="yes" check_all="yes" recursion_level="4">/opt/fim_testing</directories>
I'm running the test within the testing environment and I've detected a problem with the default value of the whodata_check
var on verify_alerts_json.py
and verify_alerts_elasticsearch.py
Fix: https://github.com/wazuh/wazuh-qa/commit/8f7e990f0e9be14054727eac9f772e315d4b26f9
The problem is that "False"
and "false"
default values are interpreted as True
in Python.
{
"root_folder": "{{ agents_fim_testing_path }}",
"recursion_level": 3,
"folder_length": 5,
"file_length": 5,
"file_size_specifications":[
{ "size": 10240, "amount": 10000},
{ "size": 524288, "amount": 1000},
{ "size": 1048576, "amount": 100},
{ "size": 10485760, "amount": 1}
]
}
172.16.0.111 : ok=20 changed=11 unreachable=0 failed=0 skipped=22 rescued=0 ignored=0
172.16.0.131 : ok=15 changed=8 unreachable=0 failed=0 skipped=27 rescued=0 ignored=0
172.16.0.141 : ok=15 changed=9 unreachable=0 failed=0 skipped=27 rescued=0 ignored=0
172.16.0.161 : ok=4 changed=0 unreachable=0 failed=0 skipped=38 rescued=0 ignored=0
All the expected alerts where generated.
Nevertheless, I wanted to know if the alerts where generated actually in realtime so I generated an alert manually by creating the following file:
touch /opt/fim_testing/realtime/$( date +"%T")
The alert generated was:
{
"timestamp": "2020-02-25T23:50:35.419+0000",
"rule": {
"level": 5,
"description": "File added to the system.",
"id": "554",
"firedtimes": 22204,
"mail": false,
"groups": [
"ossec",
"syscheck"
],
"pci_dss": [
"11.5"
],
"gpg13": [
"4.11"
],
"gdpr": [
"II_5.1.f"
],
"hipaa": [
"164.312.c.1",
"164.312.c.2"
],
"nist_800_53": [
"SI.7"
]
},
"agent": {
"id": "002",
"name": "Centos",
"ip": "10.0.2.15"
},
"manager": {
"name": "serv-test-manager-centos-1"
},
"id": "1582674635.481796020",
"full_log": "File '/opt/fim_testing/realtime/23:50:35' added\nMode: real-time\n",
"syscheck": {
"path": "/opt/fim_testing/realtime/23:50:35",
"size_after": "0",
"perm_after": "rw-r--r--",
"uid_after": "0",
"gid_after": "0",
"md5_after": "d41d8cd98f00b204e9800998ecf8427e",
"sha1_after": "da39a3ee5e6b4b0d3255bfef95601890afd80709",
"sha256_after": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"uname_after": "root",
"gname_after": "root",
"mtime_after": "2020-02-25T23:50:35",
"inode_after": 100993634,
"event": "added"
},
"decoder": {
"name": "syscheck_new_entry"
},
"location": "syscheck"
}
Here, mtime_after
coincides with the moment I generated the file (23:50:35
) and the alert timestamp (timestamp
) so it seems that the alert has been generated as soon as the file was added.
Nonetheless, when running the automatic test, the generates alerts have some delay between the moment the file was added/modified and the alert generation.
I decided to try calculating it for each alert generated during the test and plotting it getting the following graph:
*note: just modification alerts
The x-axis represents delay time in seconds between an event and the triggered alert. The y-axis represents the number of alerts with that delay.
I thought maybe the problem was related with syscheck synchronization time, as it was, by default, set to 10 seconds in the agent (notice that realtime is disabled while a syscheck scan is performed). So I increased the time between scans to 5 minutes and launch the second automatic test:
(same config)
{
"root_folder": "{{ agents_fim_testing_path }}",
"recursion_level": 3,
"folder_length": 5,
"file_length": 5,
"file_size_specifications":[
{ "size": 10240, "amount": 10000},
{ "size": 524288, "amount": 1000},
{ "size": 1048576, "amount": 100},
{ "size": 10485760, "amount": 1}
]
}
This time, Windows modified alers failed with 6582 missing alerts
```json
failed: [172.16.0.111] (item={'uid': 1000, 'woth': False, 'mtime': 1582677236.9333484, 'inode': 34315478, 'isgid': False, 'size': 390463, 'roth': True, 'isuid': False, 'isreg': True, 'pw_name': 'vagrant', 'gid': 100, 'ischr': False, 'wusr': True, 'xoth': False, 'rusr': True, 'nlink': 1, 'issock': False, 'rgrp': True, 'gr_name': 'users', 'path': '/opt/agents_files_output/modified/files_modified.txt-172.16.0.141', 'xusr': False, 'atime': 1582677237.841035, 'isdir': False, 'ctime': 1582677237.842035, 'isblk': False, 'xgrp': False, 'dev': 2049, 'wgrp': False, 'isfifo': False, 'mode': '0644', 'islnk': False}) => {"ansible_loop_var": "item", "changed": true, "item": {"atime": 1582677237.841035, "ctime": 1582677237.842035, "dev": 2049, "gid": 100, "gr_name": "users", "inode": 34315478, "isblk": false, "ischr": false, "isdir": false, "isfifo": false, "isgid": false, "islnk": false, "isreg": true, "issock": false, "isuid": false, "mode": "0644", "mtime": 1582677236.9333484, "nlink": 1, "path": "/opt/agents_files_output/modified/files_modified.txt-172.16.0.141", "pw_name": "vagrant", "rgrp": true, "roth": true, "rusr": true, "size": 390463, "uid": 1000, "wgrp": false, "woth": false, "wusr": true, "xgrp": false, "xoth": false, "xusr": false}, "msg": "non-zero return code", "rc": 1, "stderr": "Shared connection to 172.16.0.111 closed.\r\n", "stderr_lines": ["Shared connection to 172.16.0.111 closed."], "stdout": "2020-02-26 00:34:01,890 [INFO] alerts.json verification started\r\n2020-02-26 00:34:02,683 [INFO] Filelist related alerts aren't growing (0) ...\r\n2020-02-26 00:35:02,738 [INFO] Elapsed time: ~ 60 seconds \r\n\r\n2020-02-26 00:35:03,417 [INFO] Filelist related alerts aren't growing (1) ...\r\n2020-02-26 00:36:03,435 [INFO] Elapsed time: ~ 121 seconds \r\n\r\n2020-02-26 00:36:04,023 [INFO] Filelist related alerts aren't growing (2) ...\r\n2020-02-26 00:37:04,056 [INFO] Elapsed time: ~ 182 seconds \r\n\r\n2020-02-26 00:37:04,641 [INFO] Filelist related alerts aren't growing (3) ...\r\n2020-02-26 00:38:04,686 [INFO] Elapsed time: ~ 242 seconds \r\n\r\n2020-02-26 00:38:05,252 [INFO] Filelist related alerts aren't growing (4) ...\r\n2020-02-26 00:39:05,293 [INFO] Elapsed time: ~ 303 seconds \r\n\r\n2020-02-26 00:39:06,192 [WARNING] Verify alerts test - NOT OK. 6582 alerts are missing.\r\n\r\n2020-02-26 00:39:06,195 [WARNING] 6582 missing alerts.\r\n\r\n", "stdout_lines": ["2020-02-26 00:34:01,890 [INFO] alerts.json verification started", "2020-02-26 00:34:02,683 [INFO] Filelist related alerts aren't growing (0) ...", "2020-02-26 00:35:02,738 [INFO] Elapsed time: ~ 60 seconds ", "", "2020-02-26 00:35:03,417 [INFO] Filelist related alerts aren't growing (1) ...", "2020-02-26 00:36:03,435 [INFO] Elapsed time: ~ 121 seconds ", "", "2020-02-26 00:36:04,023 [INFO] Filelist related alerts aren't growing (2) ...", "2020-02-26 00:37:04,056 [INFO] Elapsed time: ~ 182 seconds ", "", "2020-02-26 00:37:04,641 [INFO] Filelist related alerts aren't growing (3) ...", "2020-02-26 00:38:04,686 [INFO] Elapsed time: ~ 242 seconds ", "", "2020-02-26 00:38:05,252 [INFO] Filelist related alerts aren't growing (4) ...", "2020-02-26 00:39:05,293 [INFO] Elapsed time: ~ 303 seconds ", "", "2020-02-26 00:39:06,192 [WARNING] Verify alerts test - NOT OK. 6582 alerts are missing.", "", "2020-02-26 00:39:06,195 [WARNING] 6582 missing alerts.", ""]}
I generated another graph:
PLAY RECAP ***********************************************************************************************************
172.16.0.111 : ok=20 changed=11 unreachable=0 failed=0 skipped=22 rescued=0 ignored=0
172.16.0.131 : ok=15 changed=8 unreachable=0 failed=0 skipped=27 rescued=0 ignored=0
172.16.0.141 : ok=15 changed=8 unreachable=0 failed=0 skipped=27 rescued=0 ignored=0
172.16.0.161 : ok=4 changed=0 unreachable=0 failed=0 skipped=38 rescued=0 ignored=0
All the alerts generated.
Graphics:
Comparing Windows and Linux agent I get to the following conclusions:
mtime_after
and alert timestamp
.(same config)
{
"root_folder": "{{ agents_fim_testing_path }}",
"recursion_level": 8,
"folder_length": 10,
"file_length": 10,
"file_size_specifications":[
{ "size": 10240, "amount": 50000},
{ "size": 524288, "amount": 100},
{ "size": 1048576, "amount": 100},
{ "size": 10485760, "amount": 1}
]
}
With this config, the test fails by the Verify alerts test - NOT OK. 460 alerts are missing.
Nevertheless, if I check the manager it seems that these alerts are still being generated... so I would like to investigate if the test is ending correctly or it would require more time.
As we've noticed, the way this test is performed, it generates some delay in the alert generation due to the overhead. If the test seems to fail when that overhead is big... maybe we need to ensure there are alerts that are being dropped and is not simply a lack of testing time.
failed: [172.16.0.111] (item={'uid': 1000, 'woth': False, 'mtime': 1582681957.5618412, 'inode': 67640362, 'isgid': False, 'size': 2797442, 'roth': True, 'isuid': False, 'isreg': True, 'pw_name': 'vagrant', 'gid': 100, 'ischr': False, 'wusr': True, 'xoth': False, 'rusr': True, 'nlink': 1, 'issock': False, 'rgrp': True, 'gr_name': 'users', 'path': '/opt/agents_files_output/added/files_generated.txt-172.16.0.131', 'xusr': False, 'atime': 1582682225.8554828, 'isdir': False, 'ctime': 1582682225.888483, 'isblk': False, 'xgrp': False, 'dev': 2049, 'wgrp': False, 'isfifo': False, 'mode': '0644', 'islnk': False}) => {"ansible_loop_var": "item", "changed": true, "item": {"atime": 1582682225.8554828, "ctime": 1582682225.888483, "dev": 2049, "gid": 100, "gr_name": "users", "inode": 67640362, "isblk": false, "ischr": false, "isdir": false, "isfifo": false, "isgid": false, "islnk": false, "isreg": true, "issock": false, "isuid": false, "mode": "0644", "mtime": 1582681957.5618412, "nlink": 1, "path": "/opt/agents_files_output/added/files_generated.txt-172.16.0.131", "pw_name": "vagrant", "rgrp": true, "roth": true, "rusr": true, "size": 2797442, "uid": 1000, "wgrp": false, "woth": false, "wusr": true, "xgrp": false, "xoth": false, "xusr": false}, "msg": "non-zero return code", "rc": 1, "stderr": "Shared connection to 172.16.0.111 closed.\r\n", "stderr_lines": ["Shared connection to 172.16.0.111 closed."], "stdout": "2020-02-26 01:57:06,987 [INFO] alerts.json verification started\r\n2020-02-26 01:57:07,058 [INFO] Filelist related alerts aren't growing (0) ...\r\n2020-02-26 01:58:07,113 [INFO] Elapsed time: ~ 60 seconds \r\n\r\n2020-02-26 01:58:07,202 [INFO] Filelist related alerts aren't growing (1) ...\r\n2020-02-26 01:59:07,219 [INFO] Elapsed time: ~ 120 seconds \r\n\r\n2020-02-26 01:59:07,310 [INFO] Filelist related alerts aren't growing (2) ...\r\n2020-02-26 02:00:07,340 [INFO] Elapsed time: ~ 180 seconds \r\n\r\n2020-02-26 02:00:07,400 [INFO] Filelist related alerts aren't growing (3) ...\r\n2020-02-26 02:01:07,456 [INFO] Elapsed time: ~ 240 seconds \r\n\r\n2020-02-26 02:02:07,727 [INFO] Elapsed time: ~ 300 seconds \r\n\r\n2020-02-26 02:03:08,875 [INFO] Elapsed time: ~ 361 seconds \r\n\r\n2020-02-26 02:04:11,003 [INFO] Elapsed time: ~ 424 seconds \r\n\r\n2020-02-26 02:05:13,966 [INFO] Elapsed time: ~ 486 seconds \r\n\r\n2020-02-26 02:06:18,214 [INFO] Elapsed time: ~ 551 seconds \r\n\r\n2020-02-26 02:06:22,383 [INFO] Filelist related alerts aren't growing (0) ...\r\n2020-02-26 02:07:22,457 [INFO] Elapsed time: ~ 615 seconds \r\n\r\n2020-02-26 02:07:26,357 [INFO] Filelist related alerts aren't growing (1) ...\r\n2020-02-26 02:08:26,377 [INFO] Elapsed time: ~ 679 seconds \r\n\r\n2020-02-26 02:08:30,602 [INFO] Filelist related alerts aren't growing (2) ...\r\n2020-02-26 02:09:30,668 [INFO] Elapsed time: ~ 743 seconds \r\n\r\n2020-02-26 02:09:35,510 [INFO] Filelist related alerts aren't growing (3) ...\r\n2020-02-26 02:10:35,554 [INFO] Elapsed time: ~ 808 seconds \r\n\r\n2020-02-26 02:10:44,293 [INFO] Filelist related alerts aren't growing (4) ...\r\n2020-02-26 02:11:44,322 [INFO] Elapsed time: ~ 877 seconds \r\n\r\n2020-02-26 02:11:53,649 [WARNING] Verify alerts test - NOT OK. 460 alerts are missing.\r\n\r\n2020-02-26 02:11:53,682 [WARNING] 460 missing alerts.\r\n\r\n", "stdout_lines": ["2020-02-26 01:57:06,987 [INFO] alerts.json verification started", "2020-02-26 01:57:07,058 [INFO] Filelist related alerts aren't growing (0) ...", "2020-02-26 01:58:07,113 [INFO] Elapsed time: ~ 60 seconds ", "", "2020-02-26 01:58:07,202 [INFO] Filelist related alerts aren't growing (1) ...", "2020-02-26 01:59:07,219 [INFO] Elapsed time: ~ 120 seconds ", "", "2020-02-26 01:59:07,310 [INFO] Filelist related alerts aren't growing (2) ...", "2020-02-26 02:00:07,340 [INFO] Elapsed time: ~ 180 seconds ", "", "2020-02-26 02:00:07,400 [INFO] Filelist related alerts aren't growing (3) ...", "2020-02-26 02:01:07,456 [INFO] Elapsed time: ~ 240 seconds ", "", "2020-02-26 02:02:07,727 [INFO] Elapsed time: ~ 300 seconds ", "", "2020-02-26 02:03:08,875 [INFO] Elapsed time: ~ 361 seconds ", "", "2020-02-26 02:04:11,003 [INFO] Elapsed time: ~ 424 seconds ", "", "2020-02-26 02:05:13,966 [INFO] Elapsed time: ~ 486 seconds ", "", "2020-02-26 02:06:18,214 [INFO] Elapsed time: ~ 551 seconds ", "", "2020-02-26 02:06:22,383 [INFO] Filelist related alerts aren't growing (0) ...", "2020-02-26 02:07:22,457 [INFO] Elapsed time: ~ 615 seconds ", "", "2020-02-26 02:07:26,357 [INFO] Filelist related alerts aren't growing (1) ...", "2020-02-26 02:08:26,377 [INFO] Elapsed time: ~ 679 seconds ", "", "2020-02-26 02:08:30,602 [INFO] Filelist related alerts aren't growing (2) ...", "2020-02-26 02:09:30,668 [INFO] Elapsed time: ~ 743 seconds ", "", "2020-02-26 02:09:35,510 [INFO] Filelist related alerts aren't growing (3) ...", "2020-02-26 02:10:35,554 [INFO] Elapsed time: ~ 808 seconds ", "", "2020-02-26 02:10:44,293 [INFO] Filelist related alerts aren't growing (4) ...", "2020-02-26 02:11:44,322 [INFO] Elapsed time: ~ 877 seconds ", "", "2020-02-26 02:11:53,649 [WARNING] Verify alerts test - NOT OK. 460 alerts are missing.", "", "2020-02-26 02:11:53,682 [WARNING] 460 missing alerts.", ""]}
I would like to investigate if these missing alerts are actually missing or will be generated in the following syscheck simple scan (maybe realtime dropped them but FIM will actually detect these changes).
I will continue with that tomorrow.
Best regards,
Hello team,
I've been talking with Alberto Rodriguez and Jose M about this scenario, we've concluded we will need an additional scenario to "Check overlap of scheduled syscheck scan and realtime scan".
The scenario developed for this issue will be only for realtime (with a long scan frequency, so there are no overlaps) and we want to ensure that files start being generated after the first scan finish when the agent is restarted.
Also, I want to make sure that alerts marked as dropped are actually dropped by increasing the timeout of the test and maybe the replays of the alert checking.
So, to finish this issue, today I will:
Hello team, here the script used to generate these graphics of time delay: realtime_delay.zip
(same config)
{
"root_folder": "{{ agents_fim_testing_path }}",
"recursion_level": 3,
"folder_length": 5,
"file_length": 5,
"file_size_specifications":[
{ "size": 10240, "amount": 10000},
{ "size": 524288, "amount": 1000},
{ "size": 1048576, "amount": 100},
{ "size": 10485760, "amount": 1}
]
}
This test have not passed with the new changes.
3 Failed replays.
A small number of files for realtime
{
"root_folder": "{{ agents_fim_testing_path }}",
"recursion_level": 3,
"folder_length": 5,
"file_length": 5,
"file_size_specifications":[
{ "size": 10240, "amount": 1000},
{ "size": 524288, "amount": 100},
{ "size": 1048576, "amount": 10},
{ "size": 10485760, "amount": 1}
]
}
Fails on windows checking delete alerts
2020-02-27 00:24:57,767 [WARNING] 1110 missing alerts.
Add and modification seems to be right.
A small number of files for realtime
{
"root_folder": "{{ agents_fim_testing_path }}",
"recursion_level": 3,
"folder_length": 5,
"file_length": 5,
"file_size_specifications":[
{ "size": 10240, "amount": 100},
{ "size": 524288, "amount": 100},
{ "size": 1048576, "amount": 10},
{ "size": 10485760, "amount": 1}
]
}
:ok:
PLAY RECAP ***********************************************************************************************************
172.16.0.111 : ok=20 changed=11 unreachable=0 failed=0 skipped=26 rescued=0 ignored=0
172.16.0.131 : ok=17 changed=9 unreachable=0 failed=0 skipped=29 rescued=0 ignored=0
172.16.0.141 : ok=17 changed=9 unreachable=0 failed=0 skipped=29 rescued=0 ignored=0
172.16.0.161 : ok=4 changed=0 unreachable=0 failed=0 skipped=42 rescued=0 ignored=0
With a small number of files everything seems to work (all the alerts are correctly generated). Nevertheless, it seems that the delay in windows is higher than Linux. We'll keep checking that.
A small number of files for realtime
{
"root_folder": "{{ agents_fim_testing_path }}",
"recursion_level": 3,
"folder_length": 5,
"file_length": 5,
"file_size_specifications":[
{ "size": 10240, "amount": 10000},
{ "size": 524288, "amount": 1000},
{ "size": 1048576, "amount": 100},
{ "size": 10485760, "amount": 1}
]
}
I've detected this test fills the realtime queue, the following message appear in the ossec.log:
WARNING: Real-time inotify kernel queue is full. Some events may be lost. Next scheduled scan will recover lost data.
New aproach: do not use a "infinite" frequency for the scan, instead use 5 mintues, so realtime works for 5 minutes and then a schecheduled scan fit the "lost" alerts.
We have decided the following for this scenario:
After making these changes:
elastic_verification_extra_arguments: "-r 10"
alert_verification_extra_arguments: "-r 10"
create_script_extra_params: "-b 500 -w 5"
delete_script_extra_params: "-b 500 -w 5"
modify_script_extra_params: "-b 500 -w 5"
clean_logs: true
{
"root_folder": "{{ agents_fim_testing_path }}",
"recursion_level": 3,
"folder_length": 5,
"file_length": 5,
"file_size_specifications":[
{ "size": 10240, "amount": 10000},
{ "size": 524288, "amount": 1000},
{ "size": 1048576, "amount": 100},
{ "size": 10485760, "amount": 1}
]
}
172.16.0.111 : ok=18 changed=9 unreachable=0 failed=1 skipped=26 rescued=0 ignored=0
172.16.0.131 : ok=17 changed=9 unreachable=0 failed=0 skipped=29 rescued=0 ignored=0
172.16.0.141 : ok=17 changed=9 unreachable=0 failed=0 skipped=29 rescued=0 ignored=0
172.16.0.161 : ok=4 changed=0 unreachable=0 failed=0 skipped=42 rescued=0 ignored=0
Failed only in the delete step... Windows queue overflow!
Hello team,
This issue is related to #493. It will contain the advances related to the creation of the scenario 0202
Objectives
Implement scenario 202.
Tasks
ossec.conf
fileslaunch_scenario.yml
alerts.json
.Elasticsearch
.Regards, Francisco Navarro.