Endless loop in ServerMode

DarkC35 commented 2 years ago

Hi, i tried to run the rep-benchmark.sh (that uses cli-benchmark.sh internally) and saw that some configs in the cli script needed to be changed to run it on my two VMs (client, server). Therefore I changed them to use the correct PATH values but still ended up with some questions as the shell scripts seem to be rather outdated:

The DB_DATA_PATH does not exists in the config.properties and Config.java files anymore.
Some grep calls need to be updated as they confict with ANOTHER_* values (the two i found were grep "DB_SWITCH" that should use grep "^DB_SWITCH" and grep "HOST" to grep "^HOST" to only return the non-another values).
The ServerMode.java run method doesn't terminate when a log_stop_flag is found since #167. Therfore I think this mode ends up in an endless loop since no other break condition exists. (And since the cli script starts the server mode as a background task it's not obvious that some processes are still running.)

A quick search in the code reveals that log_stop_flag and DB_DATA_PATH are never used in the Java code. (But are used in most of the shell scripts.)

I wanted to ask if this is a bug or if there is an un-documented way to run the benchmark in a client-server setup (where the server only has the database and serverMode running and the client the actual benchmark)?

SpriCoder commented 2 years ago

Oh, I get it! It a bug because we remove many config parameters in the refactor of benchmark but not update some shell scripts. There are surely no way to stop serveMode except killing benchmark, so would have any suggestion about when to stop benchmark serveMode? As for your question, I think you can firstly start database, then start benchmark in serverMode in the server of database. And then start benchmark to write or query data into database. Finally, you can kill benchmark in serverMode.

DarkC35 commented 2 years ago

I'm no expert but i would suggest to refactor the shell scripts to remove all un-used code/configs and look if anything else needs to be updated after the "new" refactor with the multiple maven modules. As for the rep-benchmark.sh problem: Currently this will start a ServerMode process for each line in the routine file (so 10 test lines will result in 10 repetitve ServerMode process I guess since they are never terminated). To solve this two possible solutions come into my mind:

Revert to the old log_stop_flag break condition (probably not the best way since it involves files and requires a configurable path for this file...)
Return the PID of the started ServerMode in the ssh call and kill it afterwards. This way the lines 49 and 73 in cli-benchmark.sh would look something like this: PID=$(ssh $SERVER_HOST "sh $REMOTE_BENCHMARK_HOME/ser-benchmark.sh > /dev/null 2>&1 & echo \$!") and this PID can be used to stop the process in lines 66 and 92 with something like: ssh $SERVER_HOST "kill $PID"

Another small suggestion for the cli-benchmark.sh file would be to remove the ssh $HOST_NAME@127.0.0.1 part in the lines 39, 41, and 42 as I don't see the reason to execute this 3 calls with ssh on localhost and in my case those did not work since my ssh port is not the default 22.

SpriCoder commented 2 years ago

Ok, Thanks for your suggestion! we will fix this bug later.

DarkC35 commented 2 years ago

Thank you! Please let me know if I can assist you somehow.

SpriCoder commented 2 years ago

Oh, thanks! In my plan, I will firstly check all scripts. After checking, I will comment the usage of each script and then try to fix this problem(maybe use your second suggestion). But before all of that, please let me know whether you are pleasure to fix this problem?

DarkC35 commented 2 years ago

I'm not sure if I'm the right person to refactor all of the scripts as I still don't understand the purpose of some of the scripts (e.g. ser_cli-benchmark, benchmark_consumer, ...), some of the configs like DB_DATA_PATH and clientSystemInfo.properties, the special condition for IoTDB in cli-benchmark.sh, and if the ANOTHER_HOST needs to be considered for system monitoring as well.

SpriCoder commented 2 years ago

Ok, to be honest, I am not very familiar to these scripts. So I will first check these scripts and then decide to how to do.

SpriCoder commented 2 years ago

I have already checked all the script. The features of scripts contains group test, server mode start and their combination. So we will check whether to maintain these scripts.

DarkC35 commented 2 years ago

Just tried to tidy up the script with the suggested PID variant (consider it a WIP as it does not work for the IoT special case):

if [ -z "${BENCHMARK_HOME}" ]; then
  export BENCHMARK_HOME="$(cd "`dirname "$0"`"/.; pwd)"
fi
#IMPORTANT: to use this script, make sure you have password-free ssh access to 127.0.0.1 and the server of database service
# if not you need to use the following command:
# ssh-keygen -t rsa
# ssh-copy-id [hostname]@[IP]

#configure the related path and host name
IOTDB_HOME=/home/liurui/github/iotdb/iotdb/bin
REMOTE_BENCHMARK_HOME=/home/admin/git/iotdb-benchmark/influxdb/target/influxdb-0.0.1
USER_NAME=admin
SSH_PORT=22

#extract parameters from config.properties
IP=$(grep "^HOST" $BENCHMARK_HOME/conf/config.properties)
BENCHMARK_WORK_MODE=$(grep "BENCHMARK_WORK_MODE" $BENCHMARK_HOME/conf/config.properties)
DB=$(grep "^DB_SWITCH" $BENCHMARK_HOME/conf/config.properties)
SERVER_HOST=$USER_NAME@${IP#*=}

echo Testing ${DB#*=} ...

#If IP is localhost not trigger off server mode
if [ "${IP#*=}" != "127.0.0.1" ]; then
    ssh -p $SSH_PORT $SERVER_HOST "rm $REMOTE_BENCHMARK_HOME/conf/config.properties"
    scp -P $SSH_PORT $BENCHMARK_HOME/conf/config.properties $SERVER_HOST:$REMOTE_BENCHMARK_HOME/conf
    echo Start server system information recording ...
    SERVER_PID=$(ssh -p $SSH_PORT $SERVER_HOST "sh $REMOTE_BENCHMARK_HOME/ser-benchmark.sh > /dev/null 2>&1 & echo \$!")
    echo ServerMode started on $IP with PID: $SERVER_PID
fi

# TODO:
# if [ "${DB#*=}" = "IoTDB" -a "${BENCHMARK_WORK_MODE#*=}" = "insertTestWithDefaultPath" ]; then
#     COMMIT_ID=$(ssh -p $SSH_PORT $SERVER_HOST "cd $DB_DATA_PATH;git rev-parse HEAD")
#     sed -i "s/^VERSION.*$/VERSION=${COMMIT_ID}/g" $BENCHMARK_HOME/conf/config.properties
#     echo "initial database in server..."
#     ssh -p $SSH_PORT $SERVER_HOST "cd $DB_DATA_PATH;rm -rf data;sh $IOTDB_HOME/stop-server.sh;sleep 2"
#     ssh -p $SSH_PORT $SERVER_HOST "cd $DB_DATA_PATH;sh $IOTDB_HOME/start-server.sh > /dev/null 2>&1 &"
#     echo 'wait a few seconds for lauching IoTDB...'
#     sleep 20
# fi
echo '------Client Test Begin Time------'
date
cd bin
sh startup.sh -cf ../conf/config.properties
wait
if [ "${IP#*=}" != "127.0.0.1" ]; then
    echo Stop server system information recording ...
    ssh -p $SSH_PORT $SERVER_HOST "kill $SERVER_PID"
fi

echo '------Client Test Complete Time------'
date

I removed the if condition for IS_SSH_CHANGE_PORT and always use the -p option for ssh/scp and default the value to 22 to remove the nearly duplicated code.

This seemed to work when I tested it but it resulted in a new problem: The ser-benchmark.sh starts a new sub-process startup.sh that will not be terminated with kill $SERVER_PID. To remove this problem I had to use exec in the startup.sh (line 18) like this: exec $BENCHMARK_HOME/bin/startup.sh -cf $BENCHMARK_HOME/conf/config.properties.

With this small changes I managed to run it with server monitoring.

So we will check whether to maintain these scripts.

I think the rep-benchmark.sh (and therefore the used cli-benchmark and ser-benchmark) is important to run and evaluate the benchmark as a new user since it comes with the routine file that allows to easily run it on databases with different configurations.

SpriCoder commented 2 years ago

Ok, I get it! It's a good idea. I will take your suggestion into consideration. If I create a pull request, I will linked this issue to let you know.

SpriCoder commented 2 years ago

I fixed this problem on #267 , I removed ser-benchmark.sh, cli-benchmark.sh, ser_cli-benchmark.sh and dae-benchmark.sh, because from my perspective, there is no need to use these shell script in community. Server Mode of Benchmark is a mode that monitors the status of the machine and benchmark has its statistics module in it's core so there is no need to use Server Mode for new user of Benchmark. As for new user, I think you can first start iotdb and then use benchmark.sh to start benchmark. Besides, I updated rep-benchmark.sh and give a simple example of routine file which allow user to use benchmark in batch mode. That's my work in this PR, Looking forward to your review and suggestion!

DarkC35 commented 2 years ago

Thank you for your work and time! I looked at the PR and think the removal of all the other scripts feels a bit "harsh". Before commenting ~20 times "do not remove benchmark.bat" in each assembly file I'm going to leave my two comments here:

I guess the benchmark.bat file was mistakenly removed from the assembly files (as the Windows support feature is still rather new), it seems that the file block was missing in some assemblies like iginx. So I think the bat file should not be removed?
The readme needs to be changed since it mentions cli-benchmark.sh and dae-benchmark.sh in section 9.2. The code block in 9.2. that calls the ./dae-benchmark.sh could be replaced with the content of the deleted file ./rep-benchmark.sh > /dev/null 2>&1 & to show the inteded way how to run it as a daemon.

Finally I want to clarify that for a new user that wants to use the benchmark with an external server/database with server monitoring you suggest that the user should come up with an own script for this specific setting? In my use case I do already have some scripts which start/stop the databases via docker on the server so I would have to add a way to start/stop the ServerMode in these scripts too. I already got this somehow working with the TSDBs I want to test (IoTDB, InfluxDB1, InfluxDB2, TimescaleDB) with 40 lines routine files for each DB. Only the mapping of the ServerMode results to the individual benchmark "run" for a better evaluation is a bit tricky since it requires to map the recordTimes(stored as string) of each "run" to the id(stored as bigint) of the ServerMode results. Just curious since this seems like a process that could be automated in your team internally too to test new features/bug-fixes easily on all your supported TSDBs.

SpriCoder commented 2 years ago

Thanks for your review and suggestion! There are some mistakes in my PR and I will fix it. Besides, I understand your use case. In my team, we use zabbix or other external tools to monitor more specific. And I know why you need to start ServerMode at the same time in your use case, I think it's a good idea to provide scripts which can start a benchmark to test and a benchmark to monitor(both for single use and batch use), so I will add these scripts based on all the discussion above.

SpriCoder commented 2 years ago

I have fixed this problem on #267 , add cli-benchmark(which to init iotdb, test benchmark, monitor benchmark) and ser-benchmark(which to start monitor benchmark), PTAL! Looking forward to your suggestion.

thulab / iot-benchmark

Endless loop in ServerMode #261