Closed tylarb closed 3 years ago
Line 42332 goes after 42330 and belongs to the cqlsh
check.
After this the redis cli check is started. Here we have slightly mixed lines due to simultaneous output to log from two different threads.
But it looks slightly strange if we are talking about a universe with redis API disabled - as we have a direct check for such setting:
if c.enable_yedis:
coordinator.add_check(checker, "check_redis_cli")
Need more logs - in logs above we have a full list of health-check parameters passed to the script.
Additionally in logs near the exception usually we can find a string caused the problem (incorrect answer on our cqlsh
command).
Longer logs:
1 2021-01-29 11:36:54,107 [INFO] from com.yugabyte.yw.commissioner.HealthChecker in application-akka.actor.default-dispatcher-2343 - Doing health check for universe: -Primary
2 2021-01-29 11:36:54,109 [INFO] from com.yugabyte.yw.common.DevopsBase in application-akka.actor.default-dispatcher-2343 - Command to run: [bin/py_wrapper bin/cluster_health.py --cluster_payload [{"identityFile":"/opt/yugabyte/data/keys/f3304b49-c15e- 4b2b-af1e-7535901124e8/on-premises-dc-key.pem","sshPort":22,"namespaceToConfig":{},"masterNodes":{"[IP].16.38":"yb-dev--Primary-n4","[IP].16.78":"yb-dev--Primary-n3","[IP].16.15":"yb-dev--Primary-n2"},"tserverNodes":{"[IP].16.80":"yb-dev--Primary-n10 ","[IP].16.81":"yb-dev--Primary-n12","[IP].16.38":"yb-dev--Primary-n4","[IP].16.37":"yb-dev--Primary-n1","[IP].16.77":"yb-dev--Primary-n5","[IP].16.78":"yb-dev--Primary-n3","[IP].16.79":"yb-dev--Primary-n9","[IP].16.14":"yb-dev--Primary-n6","[IP].16. 2":"yb-dev--Primary-n7","[IP].16.43":"yb-dev--Primary-n11","[IP].16.15":"yb-dev--Primary-n2","[IP].16.42":"yb-dev--Primary-n8"},"ybSoftwareVersion":"2.4.0.0-b60","enableTlsClient":true,"enableYSQL":false,"ysqlPort":5433,"ycqlPort":9042,"enableYEDIS": true,"redisPort":6379}] --universe_name -Primary --customer_tag [yw-dev][dev] --destination yugabyteoperations@ --start_time_ms 1611933640267 --send_status --report_only_errors]
3 2021-01-29 11:36:56,397 [ERROR] from com.yugabyte.yw.commissioner.HealthChecker in application-akka.actor.default-dispatcher-2343 - Health check script got error: code (1)
4 2021-01-29 11:36:54,234 INFO: Script started at 2021-01-29 08:36:54
5 2021-01-29 11:36:54,236 INFO: Checking uptime for [IP].16.42 process yb-tserver
6 2021-01-29 11:36:54,238 INFO: Checking cqlsh works for node [IP].16.42
7 2021-01-29 11:36:54,238 INFO: Checking for fatals on node [IP].16.42
8 2021-01-29 11:36:54,238 INFO: Checking redis cli works for node [IP].16.42
9 2021-01-29 11:36:54,240 INFO: Checking disk utilization on node [IP].16.42
10 2021-01-29 11:36:54,240 INFO: Checking for core files on node [IP].16.42
11 2021-01-29 11:36:54,241 INFO: Checking for open file descriptors on node [IP].16.42
12 2021-01-29 11:36:54,244 INFO: Checking uptime for [IP].16.78 process yb-master
13 2021-01-29 11:36:54,246 INFO: Checking for fatals on node [IP].16.78
14 2021-01-29 11:36:54,247 INFO: Checking uptime for [IP].16.78 process yb-tserver
15 2021-01-29 11:36:55,249 INFO: Checking for fatals on node [IP].16.78
16 2021-01-29 11:36:55,250 INFO: Checking cqlsh works for node [IP].16.78
17 2021-01-29 11:36:55,250 INFO: Checking redis cli works for node [IP].16.78
18 2021-01-29 11:36:55,251 INFO: Checking disk utilization on node [IP].16.78
19 2021-01-29 11:36:55,252 INFO: Checking for core files on node [IP].16.78
20 2021-01-29 11:36:55,253 INFO: Checking for open file descriptors on node [IP].16.78
21 2021-01-29 11:36:55,253 INFO: Checking uptime for [IP].16.79 process yb-tserver
22 2021-01-29 11:36:55,253 INFO: Checking for fatals on node [IP].16.79
23 2021-01-29 11:36:55,254 INFO: Checking cqlsh works for node [IP].16.79
24 2021-01-29 11:36:55,259 INFO: Checking redis cli works for node [IP].16.79
25 2021-01-29 11:36:56,258 INFO: Checking disk utilization on node [IP].16.79
26 2021-01-29 11:36:56,258 INFO: Checking for core files on node [IP].16.79
27 2021-01-29 11:36:56,260 INFO: Checking for open file descriptors on node [IP].16.79
28 2021-01-29 11:36:56,263 INFO: Checking uptime for [IP].16.81 process yb-tserver
29 Traceback (most recent call last):
30 File "bin/cluster_health.py", line 865, in <module>
31 main()
32 File "bin/cluster_health.py", line 833, in main
33 entries = coordinator.run()
34 File "bin/cluster_health.py", line 706, in run
35 check.entry = check.result.get() if check.result else None
36 File "/usr/lib64/python2.7/multiprocessing/pool.py", line 554, in get
37 raise self._value
38 ValueError: invalid literal for int() with base 10: 'handl'
39 2021-01-29 11:36:56,264 INFO: Checking for fatals on node [IP].16.81
40 2021-01-29 11:36:56,264 INFO: Checking cqlsh works for node [IP].16.81
41 2021-01-29 11:36:56,397 [INFO] from com.yugabyte.yw.commissioner.HealthChecker in application-akka.actor.default-dispatcher-2343 - Doing health check for universe: -Secondary
As the customer doesn't have corresponding emails (with health-check errors), check the health-checks status in YW UI. The interesting moments are: 2021-01-29 11:31:56,260 and 2021-01-29 11:36:56,263. The first goal for us is to understand which one health-check subtask gives us the error.
Closed as a duplicate of #7402.
This issue occurs on a universe with redis API is not enabled.
Platform is version 2.4.0