Closed gdiazlo closed 2 years ago
I tried to replicate the reported problem that it states the plugin platform crashed due to the modulesd
Wazuh module not working and it causes some API request fails for the statistics
job of Wazuh app.
@asteriscos could replicate the problem and we were doing peer to locate and analyze the unhandled promise.
The problem was a supposed logic to build a chain of promises and resolves it with an await
. The resolution of the chain of promises was in a try/catch
block and it is expected the catch
block captures if some promise of the chain was rejected. This is not working as expected due to the method to build the chain of promises was correct.
The try/catch
block: https://github.com/wazuh/wazuh-kibana-app/blob/4.3-7.10/server/start/cron-scheduler/scheduler-job.ts#L31-L46
try {
const hosts = await this.getApiObjects();
const data = await hosts.reduce(async (acc: Promise<object[]>, host) => {
const { status } = configuredJobs({ host, jobName: this.jobName })[this.jobName];
if (!status) return acc;
const response = await this.getResponses(host);
const accResolve = await Promise.resolve(acc)
return [
...accResolve,
...response,
];
}, Promise.resolve([]));
!!data.length && await this.saveDocument.save(data, index);
} catch (error) {
ErrorHandler(error, this.logger);
}
The supposed chain of promises was generated with Array.prototype.reduce
, but the reduce function should return directly a promise to the chain of promises is built correctly and can resolve the chain with the await
. Instead of some async actions are done and these are not managed causes an error that the catch
could capture.
@asteriscos suggested replacing the chain of promises with running them in parallel because: + it is faster than resolving each one synchronously. + we don't see the requirement to run them synchronously. - it increases the number of requests in an interval of time to the API, affecting to the API request limit.
After forcing an invalid state in a custom decoder, I was able to replicate the issue. We traced the exception to promises rejected when trying to execute cron-jobs.
After analyzing the error handling of reducing the promises, we decided a simultaneous execution of the promises would be faster and more stable. In case one of the promises fails it would not affect the others, thus we took a Promise.all
approach to solve the issue.
I was researching the configuration of NodeJS with the --unhandled-rejections=warn
flag in the node.options
file that is interpreted by the bin/opensearch-dashboards
executable script for the Wazuh dashboard package. The flag is added to the NODE_OPTIONS
variable that NodeJS
should take into account.
I edited the server-side code of the Wazuh plugin to cause an unhandled promise and restart the server.
// Added to the `start` function
// /usr/share/wazuh-dashboard/plugins/wazuh/server/plugin.js
// Reject the promise in 5 seconds
(async () => new Promise((res,rej) => setTimeout(rej,5000)))()
I tested without and with the --unhandled-rejections=warn
flag and the server crashed anyways due to the unhanded promise.
Taking into account the log of the unhandled rejection:
Apr 19 10:06:43 localhost.localdomain opensearch-dashboards[8420]: Unhandled Promise rejection detected:
Apr 19 10:06:43 localhost.localdomain opensearch-dashboards[8420]: undefined
Apr 19 10:06:43 localhost.localdomain opensearch-dashboards[8420]: Terminating process...
I searched it in the OpenSearch Dashboards production code and I found a file where it defines the logging of the unhandled promise and exits the process. https://github.com/opensearch-project/OpenSearch-Dashboards/blob/1.2.0/src/setup_node_env/exit_on_warning.js#L61. There is a handler for unhandledRejection
event that contains the logic to exist the node app when one is caught. There is another process handler that could exist the node app. Both process handlers are added when the variable process.noProcessWarnings
is not true
. https://github.com/opensearch-project/OpenSearch-Dashboards/blob/1.2.0/src/setup_node_env/exit_on_warning.js#L33-L63
The --no-warnings
flag should set the process.noProcessWarnings
variable to true
and the mentioned code block should be ignored. Despite the --no-warnings
flag is included in the NODE_OPTIONS
environment variable, by some reason I don't know, NodeJS
doesn't set the process.noProcessWarnings
to true
and as a result, the process handlers are added. I am not sure if the other NodeJS options are not applicating as expected.
The OpenSearch Dashboards app is initialized with:
NODE_OPTIONS="--no-warnings --max-http-header-size=65536 $OSD_NODE_OPTS $NODE_OPTIONS" NODE_ENV=production exec "${NODE}" "${DIR}/src/cli/dist" ${@}
I don't know the reason, but in my testings, if the --no-warnings
NodeJS option is passed as an option to the binary, this seems to set the process.noProcessWarnigns
and as a result ignore to set the 2 process handlers (warning
and unhandledRejection
). Something like:
NODE_OPTIONS="--no-warnings --max-http-header-size=65536 $OSD_NODE_OPTS $NODE_OPTIONS" NODE_ENV=production exec "${NODE}" "--no-warnings" "${DIR}/src/cli/dist" ${@}
bin/opensearch-dashboards
script:
#!/bin/sh
SCRIPT=$0
while [ -h "$SCRIPT" ] ; do ls=$(ls -ld "$SCRIPT")
link=$(expr "$ls" : '.-> (.)$') if expr "$link" : '/.*' > /dev/null; then SCRIPT="$link" else SCRIPT=$(dirname "$SCRIPT")/"$link" fi done
DIR="$(dirname "${SCRIPT}")/.." CONFIG_DIR=${OSD_PATH_CONF:-"/etc/wazuh-dashboard"}
if [ -x "${DIR}/node/bin/node" ]; then NODE="${DIR}/node/bin/node" else NODE="$(which node)" fi
if [ ! -x "$NODE" ]; then echo "unable to find usable node.js executable." exit 1 fi
if [ -f "${CONFIG_DIR}/node.options" ]; then OSD_NODE_OPTS="$(grep -v ^# < ${CONFIG_DIR}/node.options | xargs)" fi
NODE_OPTIONS="--no-warnings --max-http-header-size=65536 $OSD_NODE_OPTS $NODE_OPTIONS" NODE_ENV=production exec "${NODE}" "${DIR}/src/cli/dist" ${@}
## Other tests:
### Wazuh dashboard Docker image :heavy_check_mark:
I added the same unhandled promise to the Wazuh server-side code and the server doesn't crash due to it. I checked the value of `process.noProcessWarnings` and it is set to `true` due to the `--no-warnings` option is evaluated correctly.
- Log of the unhandled rejection
wz_indexer-wz_cluster-agent-wazuh-dashboard-1 | {"type":"log","@timestamp":"2022-04-19T10:51:04Z","tags":["warning","environment"],"pid":9,"message":"Detected an unhandled Promise rejection.\nundefined"}
- `bin/opensearch-dashboards` script:
```sh
#!/bin/sh
SCRIPT=$0
# SCRIPT may be an arbitrarily deep series of symlinks. Loop until we have the concrete path.
while [ -h "$SCRIPT" ] ; do
ls=$(ls -ld "$SCRIPT")
# Drop everything prior to ->
link=$(expr "$ls" : '.*-> \(.*\)$')
if expr "$link" : '/.*' > /dev/null; then
SCRIPT="$link"
else
SCRIPT=$(dirname "$SCRIPT")/"$link"
fi
done
DIR="$(dirname "${SCRIPT}")/.."
CONFIG_DIR=${OSD_PATH_CONF:-"$DIR/config"}
if [ -x "${DIR}/node/bin/node" ]; then
NODE="${DIR}/node/bin/node"
else
NODE="$(which node)"
fi
if [ ! -x "$NODE" ]; then
echo "unable to find usable node.js executable."
exit 1
fi
if [ -f "${CONFIG_DIR}/node.options" ]; then
OSD_NODE_OPTS="$(grep -v ^# < ${CONFIG_DIR}/node.options | xargs)"
fi
NODE_OPTIONS="--no-warnings --max-http-header-size=65536 $OSD_NODE_OPTS $NODE_OPTIONS" NODE_ENV=production exec "${NODE}" "${DIR}/src/cli/dist" ${@}
I added the same unhandled promise to the Wazuh server-side code and the server doesn't crash. I checked the value of process.noProcessWarnings
and it is set to true
due to the --no-warnings
option is being evaluated correctly.
bin/kibana
script:
#!/bin/sh
SCRIPT=$0
while [ -h "$SCRIPT" ] ; do ls=$(ls -ld "$SCRIPT")
link=$(expr "$ls" : '.-> (.)$') if expr "$link" : '/.*' > /dev/null; then SCRIPT="$link" else SCRIPT=$(dirname "$SCRIPT")/"$link" fi done
DIR="$(dirname "${SCRIPT}")/.." CONFIG_DIR=${KBN_PATH_CONF:-"$DIR/config"} NODE="${DIR}/node/bin/node" test -x "$NODE" if [ ! -x "$NODE" ]; then echo "unable to find usable node.js executable." exit 1 fi
if [ -f "${CONFIG_DIR}/node.options" ]; then KBN_NODE_OPTS="$(grep -v ^# < ${CONFIG_DIR}/node.options | xargs)" fi
NODE_OPTIONS="--no-warnings --max-http-header-size=65536 $KBN_NODE_OPTS $NODE_OPTIONS" NODE_ENV=production exec "${NODE}" "${DIR}/src/cli/dist" ${@}
# Conclustion
It seems there is some problem in the package or environment of the Wazuh dashboard, that causes the options set in the `NODE_OPTIONS` environment variable are not evaluated as expected.
I was trying with a Kibana in production installed in a VM using the all-in-one deployment script and if provokes an unhandled promise, the process exists.
Apr 19 13:11:04 localhost.localdomain kibana[6475]: Unhandled Promise rejection detected:
Apr 19 13:11:04 localhost.localdomain kibana[6475]: undefined
Apr 19 13:11:04 localhost.localdomain kibana[6475]: Terminating process...
But the server restarts due to the Restart
setting in the service definition file:
# Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# or in the "license" file accompanying this file. This file is distributed
# on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
# express or implied. See the License for the specific language governing
# permissions and limitations under the License.
# Description:
# Default kibana.service file
[Unit]
Description=Kibana
[Service]
Type=simple
User=kibana
Group=kibana
# Load env vars from /etc/default/ and /etc/sysconfig/ if they exist.
# Prefixing the path with '-' makes it try to load, but if the file doesn't
# exist, it continues onward.
EnvironmentFile=-/etc/default/kibana
EnvironmentFile=-/etc/sysconfig/kibana
ExecStart=/usr/share/kibana/bin/kibana "-c /etc/kibana/kibana.yml"
Restart=always
WorkingDirectory=/
[Install]
WantedBy=multi-user.target
The service definition file for Wazuh dashboard has no the Restart
setting. We could be interested in to adding it.
Apparently, the problem is that despite the environment variables being loaded, node js does not recognize them, the only way I found to make it work at the moment is:
NODE_OPTIONS="--no-warnings --max-http-header-size=65536 $OSD_NODE_OPTS $NODE_OPTIONS"
NODE_ENV=production "${NODE}" ${NODE_OPTIONS} "${DIR}/src/cli/dist" ${@}
This is the log with this change:
-- The start-up result is done.
Apr 26 18:53:32 centos7-1 opensearch-dashboards[30397]: { NODE_ENV: 'production',
Apr 26 18:53:32 centos7-1 opensearch-dashboards[30397]: KILL_ON_STOP_TIMEOUT: '0',
Apr 26 18:53:32 centos7-1 opensearch-dashboards[30397]: chroot: '/',
Apr 26 18:53:32 centos7-1 opensearch-dashboards[30397]: SHELL: '/sbin/nologin',
Apr 26 18:53:32 centos7-1 opensearch-dashboards[30397]: nice: '',
Apr 26 18:53:32 centos7-1 opensearch-dashboards[30397]: user: 'wazuh-dashboard',
Apr 26 18:53:32 centos7-1 opensearch-dashboards[30397]: USER: 'wazuh-dashboard',
Apr 26 18:53:32 centos7-1 opensearch-dashboards[30397]: PATH: '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin',
Apr 26 18:53:32 centos7-1 opensearch-dashboards[30397]: PWD: '/usr/share/wazuh-dashboard',
Apr 26 18:53:32 centos7-1 opensearch-dashboards[30397]: LANG: 'en_US.UTF-8',
Apr 26 18:53:32 centos7-1 opensearch-dashboards[30397]: SHLVL: '1',
Apr 26 18:53:32 centos7-1 opensearch-dashboards[30397]: HOME: '/usr/share/wazuh-dashboard/',
Apr 26 18:53:32 centos7-1 opensearch-dashboards[30397]: group: 'wazuh-dashboard',
Apr 26 18:53:32 centos7-1 opensearch-dashboards[30397]: LOGNAME: 'wazuh-dashboard',
Apr 26 18:53:32 centos7-1 opensearch-dashboards[30397]: chdir: '/',
Apr 26 18:53:32 centos7-1 opensearch-dashboards[30397]: _: '/usr/share/wazuh-dashboard/bin/../node/bin/node' }
Apr 26 18:53:32 centos7-1 opensearch-dashboards[30397]: true
This is the log without this change:
Apr 26 18:49:53 centos7-1 opensearch-dashboards[30338]: { NODE_ENV: 'production',
Apr 26 18:49:53 centos7-1 opensearch-dashboards[30338]: NODE_OPTIONS:
Apr 26 18:49:53 centos7-1 opensearch-dashboards[30338]: '--no-warnings --max-http-header-size=65536 --unhandled-rejections=warn ',
Apr 26 18:49:53 centos7-1 opensearch-dashboards[30338]: KILL_ON_STOP_TIMEOUT: '0',
Apr 26 18:49:53 centos7-1 opensearch-dashboards[30338]: chroot: '/',
Apr 26 18:49:53 centos7-1 opensearch-dashboards[30338]: SHELL: '/sbin/nologin',
Apr 26 18:49:53 centos7-1 opensearch-dashboards[30338]: nice: '',
Apr 26 18:49:53 centos7-1 opensearch-dashboards[30338]: user: 'wazuh-dashboard',
Apr 26 18:49:53 centos7-1 opensearch-dashboards[30338]: USER: 'wazuh-dashboard',
Apr 26 18:49:53 centos7-1 opensearch-dashboards[30338]: PATH: '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin',
Apr 26 18:49:53 centos7-1 opensearch-dashboards[30338]: PWD: '/usr/share/wazuh-dashboard',
Apr 26 18:49:53 centos7-1 opensearch-dashboards[30338]: LANG: 'en_US.UTF-8',
Apr 26 18:49:53 centos7-1 opensearch-dashboards[30338]: SHLVL: '1',
Apr 26 18:49:53 centos7-1 opensearch-dashboards[30338]: HOME: '/usr/share/wazuh-dashboard/',
Apr 26 18:49:53 centos7-1 opensearch-dashboards[30338]: group: 'wazuh-dashboard',
Apr 26 18:49:53 centos7-1 opensearch-dashboards[30338]: LOGNAME: 'wazuh-dashboard',
Apr 26 18:49:53 centos7-1 opensearch-dashboards[30338]: chdir: '/',
Apr 26 18:49:53 centos7-1 opensearch-dashboards[30338]: _: '/usr/share/wazuh-dashboard/bin/../node/bin/node' }
Apr 26 18:49:53 centos7-1 opensearch-dashboards[30338]: undefined
Description
We have discovered thanks to https://github.com/elastic/kibana/issues/118536 that we have an unhandled promise when a request to the Wazuh API fails because of a connection error. This can make the process crash depending on the configuration of the node.config as stated here https://github.com/elastic/kibana/blob/main/config/node.options#L9
That configuration is disabled by default in opensearch fork.
Steps to reproduce
Expected Result
Actual Result
We need to fix all unhandled promises we find.