scylladb / scylla-ccm

Cassandra Cluster Manager, modified for Scylla
Apache License 2.0
20 stars 64 forks source link

"waiting for scylla process to be running" log spam #465

Closed kbr-scylla closed 1 year ago

kbr-scylla commented 1 year ago

Huge spam in the logs when I run dtest locally:

15:50:51,771 ccm             DEBUG cluster.py     :709  | node3: Starting scylla-jmx: args=['/home/kbraun/.dtest/dtest-8v67tmp7/test/node3/bin/symlinks/scylla-jmx', '-Dapiaddress=127.0.41.3', '-Djavax.management.builder.initial=com.scylladb.jmx.utils.API
Builder', '-Djava.rmi.server.hostname=127.0.41.3', '-Dcom.sun.management.jmxremote', '-Dcom.sun.management.jmxremote.host=127.0.41.3', '-Dcom.sun.management.jmxremote.port=7199', '-Dcom.sun.management.jmxremote.rmi.port=7199', '-Dcom.sun.management.jmxre
mote.local.only=false', '-Xmx256m', '-XX:+UseSerialGC', '-Dcom.sun.management.jmxremote.authenticate=false', '-Dcom.sun.management.jmxremote.ssl=false', '-jar', '/home/kbraun/.dtest/dtest-8v67tmp7/test/node3/bin/scylla-jmx-1.0.jar']
Waiting for scylla process to be running (0.000007 secs)                                                                                                                                                                                                      
Waiting for scylla process to be running (0.010086 secs)                                                                                                                                                                                                      
Waiting for scylla process to be running (0.020234 secs)                                                                                                                                                                                                      
Waiting for scylla process to be running (0.030353 secs)                                                                       
Waiting for scylla process to be running (0.040477 secs)                                                                       
Waiting for scylla process to be running (0.050609 secs)                                                                                                                                                                                                      
Waiting for scylla process to be running (0.060743 secs)                                                                                                                                                                                                      
Waiting for scylla process to be running (0.070882 secs)                                                                                                                                                                                                      
Waiting for scylla process to be running (0.081018 secs)                                                                       
Waiting for scylla process to be running (0.091158 secs)                                                                                                                                                                                                      
Waiting for scylla process to be running (0.101272 secs)                                                                                                                                                                                                      
Waiting for scylla process to be running (0.111388 secs)                                                                                                                                                                                                      
Waiting for scylla process to be running (0.121505 secs)       
Waiting for scylla process to be running (0.131624 secs)       
Waiting for scylla process to be running (0.141749 secs)       
...
Waiting for scylla process to be running (9.469546 secs)
Waiting for scylla process to be running (9.479740 secs)
Waiting for scylla process to be running (9.489927 secs)
Waiting for scylla process to be running (9.500126 secs)
Waiting for scylla process to be running (9.510278 secs)
Waiting for scylla process to be running (9.520452 secs)
Waiting for scylla process to be running (9.530635 secs)
Waiting for scylla process to be running (9.540848 secs)
Waiting for scylla process to be running (9.551036 secs)
Waiting for scylla process to be running (9.561151 secs)
Waiting for scylla process to be running (9.571293 secs)
Waiting for scylla process to be running (9.581511 secs)
Waiting for scylla process to be running (9.591712 secs)
Waiting for scylla process to be running (9.601909 secs)
Waiting for scylla process to be running (9.612158 secs)
Waiting for scylla process to be running (9.622300 secs)
Waiting for scylla process to be running (9.632500 secs)
Waiting for scylla process to be running (9.642696 secs)
Waiting for scylla process to be running (9.652897 secs)
Waiting for scylla process to be running (9.663088 secs)
Waiting for scylla process to be running (9.673236 secs)
Waiting for scylla process to be running (9.683401 secs)
Waiting for scylla process to be running (9.693611 secs)
Waiting for scylla process to be running (9.703797 secs)
Waiting for scylla process to be running (9.713984 secs)
Waiting for scylla process to be running (9.724197 secs)
Waiting for scylla process to be running (9.734364 secs)
Waiting for scylla process to be running (9.744531 secs)
Waiting for scylla process to be running (9.754738 secs)
Waiting for scylla process to be running (9.764983 secs)
Waiting for scylla process to be running (9.775162 secs)
Waiting for scylla process to be running (9.785307 secs)
Waiting for scylla process to be running (9.795505 secs)
Waiting for scylla process to be running (9.805701 secs)
Waiting for scylla process to be running (9.815892 secs)
Waiting for scylla process to be running (9.826102 secs)
Waiting for scylla process to be running (9.836244 secs)
Waiting for scylla process to be running (9.846428 secs)
Waiting for scylla process to be running (9.856627 secs)
Waiting for scylla process to be running (9.866797 secs)
Waiting for scylla process to be running (9.876968 secs)
Waiting for scylla process to be running (9.887133 secs)
Waiting for scylla process to be running (9.897270 secs)
Waiting for scylla process to be running (9.907439 secs)
Waiting for scylla process to be running (9.917604 secs)
Waiting for scylla process to be running (9.927780 secs)
Waiting for scylla process to be running (9.937997 secs)
Waiting for scylla process to be running (9.948132 secs)
Waiting for scylla process to be running (9.958233 secs)
Waiting for scylla process to be running (9.968340 secs)
Waiting for scylla process to be running (9.978563 secs)
Waiting for scylla process to be running (9.988726 secs)
Waiting for scylla process to be running (9.998947 secs)

is it really necessary to print this every 10 milliseconds? I think this should be rate limited to ~once every 5 seconds or even less frequently, and maybe not printed in the main console log.

mykaul commented 1 year ago

@nyh - could this be due to https://github.com/scylladb/scylladb/commit/0e602159b9324997ca876f61b98a9276cebe0e1e ?

kbr-scylla commented 1 year ago

https://github.com/scylladb/scylladb/commit/0e602159b9324997ca876f61b98a9276cebe0e1e could only improve startup times.

The log message in CCM itself was introduced recently. commit 0184a10183567dac1bb5b8b16b2bb941f22fb560

Author: Israel Fruchter <fruch@scylladb.com>
Date:   Thu May 25 12:12:01 2023 +0300

    ccmlib/scylla_node: wait for scylla process to be running

    seem like there was a call to `node.is_running()` that
    wasn't checking the output, and assuming it would update the pid
    information, and wasn't always the case.

    this was causing cases that the next node would identifed
    as running and not considered as part of the `wait_other_notice`
    check.

we just need to polish it a bit (add rate-limiting etc.)

nyh commented 1 year ago

@nyh - could this be due to scylladb/scylladb@0e60215 ?

I don't understand how - look at that code, it doesn't print anything between each iteration.

It seems more reasonable that we have some recently-changed code in dtest or ccm that prints this silly message. Try grepping this string in ccm or dtest.

nyh commented 1 year ago

The log message in CCM itself was introduced recently. commit 0184a10

Right, this makes more sense.

we just need to polish it a bit (add rate-limiting etc.)

Or just print the message once (or once when starting to wait plus when the wait is done). Printing such a message every 0.1 seconds or every 1 seconds is both silly.

fruch commented 1 year ago

wait_for, shouldn't be printing that excessively, or printing to stdout at all