scylladb / scylla-jmx

Scylla JMX proxy
GNU Affero General Public License v3.0

dtests sometimes fail with unable to connect to scylla-jmx #98

Open bhalevy opened 4 years ago

bhalevy commented 4 years ago

See https://github.com/scylladb/scylla-ccm/issues/223#issuecomment-595040717

Still seeing this, e.g. https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/1678/testReport/junit/bootstrap_test/TestBootstrap/start_stop_test/ (Scylla version 359b32fb63e2c5f88ff855e535b647984e2fe623)

Traceback (most recent call last):
  File "/usr/lib64/python3.7/unittest/case.py", line 60, in testPartExecutor
    yield
  File "/usr/lib64/python3.7/unittest/case.py", line 645, in run
    testMethod()
  File "/jenkins/workspace/scylla-master/next/scylla-dtest/bootstrap_test.py", line 53, in start_stop_test
    cluster.start(wait_for_binary_proto=True, wait_other_notice=True)
  File "/jenkins/workspace/scylla-master/next/scylla-ccm/ccmlib/scylla_cluster.py", line 137, in start
    started = self.start_nodes(**args)
  File "/jenkins/workspace/scylla-master/next/scylla-ccm/ccmlib/scylla_cluster.py", line 109, in start_nodes
    profile_options=profile_options, no_wait=no_wait)
  File "/jenkins/workspace/scylla-master/next/scylla-ccm/ccmlib/scylla_node.py", line 516, in start
    raise NodeError(e_msg, scylla_process)
ccmlib.node.NodeError: Error starting node node1: unable to connect to scylla-jmx port 127.0.89.1:7189

https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/1678/artifact/logs-release.2/dtest.log indicates that 2 processes were killed. Since the test starts only 1 node, these should be scylla and scylla-jmx:

2020-03-03 15:44:01,849 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - cluster ccm directory: /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-ny5nvwt0
2020-03-03 15:44:01,850 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - Starting Scylla cluster from directory /jenkins/workspace/scylla-master/next/scylla-dtest/../scylla/build/release/
2020-03-03 15:44:01,853 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - Allocated cluster ID 89: /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-ny5nvwt0
2020-03-03 15:44:01,860 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - configuring skip_wait_for_gossip_to_settle=0 for single_node test
2020-03-03 15:44:01,861 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - populating cluster with one node
2020-03-03 15:44:15,809 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - starting cluster
2020-03-03 15:44:45,900 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - Test failed with errors: [(<bootstrap_test.TestBootstrap testMethod=start_stop_test>, (<class 'ccmlib.node.NodeError'>, NodeError('Error starting node node1: unable to connect to scylla-jmx port 127.0.89.1:7189'), <traceback object at 0x7f208c536690>))]
2020-03-03 15:44:45,905 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - removing ccm cluster test at: /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-ny5nvwt0
2020-03-03 15:44:46,981 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - proc 182 killed - cluster 127.0.89.
2020-03-03 15:44:46,982 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - proc 184 killed - cluster 127.0.89.
2020-03-03 15:44:46,982 169     dtest                          DEBUG    | bootstrap_test.py:TestBootstrap.start_stop_test - Freeing cluster ID 89: link /jenkins/workspace/scylla-master/next/scylla/.dtest/89

So it seems like the scylla-jmx process is up but unresponsive.
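One way to tell "process is up but the port is not accepting connections" apart from "process is gone" is to check both separately. The following is only a minimal sketch of that idea, not the check that scylla-ccm performs; the helper names `process_alive` and `wait_for_port` and the timeout values are assumptions made for illustration.

```python
import os
import socket
import time


def process_alive(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connect to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # refused or timed out, retry until the deadline
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

If `process_alive(pid)` is true while `wait_for_port("127.0.89.1", 7189)` returns false, the process exists but never started listening, which matches the observation above.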

bhalevy commented 4 years ago

As I wrote in https://github.com/scylladb/scylla-ccm/issues/223#issuecomment-624079777, I saw this today:

https://jenkins.scylladb.com/view/master/job/scylla-master/job/byo/job/dtest-byo/144/artifact/logs-release.2/1588687609026_materialized_views_test.TestMaterializedViews.add_dc_during_mv_insert_test/node1_jmx.log

Using config file: /jenkins/workspace/scylla-master/byo/dtest-byo/scylla/.dtest/dtest-3ngmni08/test/node1/conf/scylla.yaml
library initialization failed - unable to allocate file descriptor table - out of memory

penberg commented 4 years ago

@bhalevy The "unable to allocate file descriptor table" error is an artifact of the node running out of memory. You ran the test on thor, so it's unfortunately a pretty common scenario...
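Since the JMX log already shows why startup failed, a test harness could surface that line in the error instead of only reporting the connection failure. The sketch below is an assumption-laden illustration: the function name `find_startup_failures` is hypothetical, and the patterns are taken only from the single log line quoted in this thread, so the list is not exhaustive.

```python
import re

# Patterns drawn from the node1_jmx.log line quoted above (illustrative only).
FATAL_PATTERNS = [
    re.compile(r"library initialization failed"),
    re.compile(r"out of memory", re.IGNORECASE),
]


def find_startup_failures(log_text):
    """Return the log lines that match any known fatal startup pattern."""
    return [
        line
        for line in log_text.splitlines()
        if any(p.search(line) for p in FATAL_PATTERNS)
    ]
```

Appending the returned lines to the "unable to connect to scylla-jmx port" message would make an out-of-memory host distinguishable from a genuine scylla-jmx hang.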