scylladb / scylla-jmx

Scylla JMX proxy
GNU Affero General Public License v3.0

scylla-jmx.service still fails during artifact tests after #206 fix - java-select fail to parse java version output #212

Closed: temichus closed this issue 1 year ago

temichus commented 1 year ago

scylla-jmx.service still fails during artifact tests after the #206 fix, with the same error message:

2023-04-25 05:36:57.472: (TestFrameworkEvent Severity.ERROR) period_type=one-time event_id=86afdf2c-607b-4467-8334-0613a0f2e28e, source=ArtifactsTest.SetUp()
exception=Encountered a bad command exit code!
Command: '/usr/bin/nodetool  status '
Exit code: 1
Stdout:
Stderr:
nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused (Connection refused)'.

From the events log:

2023-04-25 05:27:21.937: (InfoEvent Severity.NORMAL) period_type=not-set event_id=bf11c34e-c95e-4964-9390-a7e667e79799: message=TEST_START test_id=dc7ec884-8eef-427e-88e0-fc5d10d5798c
2023-04-25 05:32:14.021: (ScyllaYamlUpdateEvent Severity.NORMAL) period_type=one-time event_id=d48d849b-4390-44da-8d22-cdeda3b50bc7: message=ScyllaYaml has been changed on node: artifacts-rocky8-jenkins-db-node-dc7ec884-0-1. Diff: --- 
+++ 
@@ -1,28 +1,35 @@
+alternator_enforce_authorization: false
 api_address: 127.0.0.1
 api_doc_dir: /opt/scylladb/api/api-doc/
 api_port: 10000
 api_ui_dir: /opt/scylladb/swagger-ui/dist/
+auto_bootstrap: true
 batch_size_fail_threshold_in_kb: 1024
 batch_size_warn_threshold_in_kb: 128
 cas_contention_timeout_in_ms: 1000
+cluster_name: artifacts-rocky8-jenkins-db-cluster-dc7ec884
 commitlog_segment_size_in_mb: 32
 commitlog_sync: periodic
 commitlog_sync_period_in_ms: 10000
 commitlog_total_space_in_mb: -1
 consistent_cluster_management: true
+enable_ipv6_dns_lookup: false
 endpoint_snitch: org.apache.cassandra.locator.SimpleSnitch
+experimental: true
 force_schema_commit_log: true
-listen_address: localhost
+hinted_handoff_enabled: true
+listen_address: 10.142.0.121
 murmur3_partitioner_ignore_msb_bits: 12
 native_shard_aware_transport_port: 19042
 native_transport_port: 9042
 num_tokens: 256
 partitioner: org.apache.cassandra.dht.Murmur3Partitioner
+prometheus_address: 0.0.0.0
 read_request_timeout_in_ms: 5000
-rpc_address: localhost
+rpc_address: 10.142.0.121
 rpc_port: 9160
 seed_provider:
 - class_name: org.apache.cassandra.locator.SimpleSeedProvider
   parameters:
-  - seeds: 127.0.0.1
+  - seeds: 10.142.0.121
 write_request_timeout_in_ms: 2000
2023-04-25 05:34:51.797: (ScyllaServerStatusEvent Severity.NORMAL) period_type=begin event_id=2dd846e0-c1f9-4a4e-9076-03948c3a01cb node=artifacts-rocky8-jenkins-db-node-dc7ec884-0-1
2023-04-25 05:34:51.883 <2023-04-25 05:34:51.482>: (DatabaseLogEvent Severity.WARNING) period_type=one-time event_id=20614a9e-2c79-42ec-96d2-b1111fd00228: type=WARNING regex=(^WARNING|!\s*?WARNING).*\[shard.*\] line_number=31 node=artifacts-rocky8-jenkins-db-node-dc7ec884-0-1
2023-04-25T05:34:51.482 artifacts-rocky8-jenkins-db-node-dc7ec884-0-1 !WARNING | scylla[66824]:  [shard 0] seastar - Unable to set SCHED_FIFO scheduling policy for timer thread; latency impact possible. Try adding CAP_SYS_NICE
2023-04-25 05:34:52.447 <2023-04-25 05:34:52.099>: (DatabaseLogEvent Severity.WARNING) period_type=one-time event_id=20614a9e-2c79-42ec-96d2-b1111fd00228: type=WARNING regex=(^WARNING|!\s*?WARNING).*\[shard.*\] line_number=486 node=artifacts-rocky8-jenkins-db-node-dc7ec884-0-1
2023-04-25T05:34:52.099 artifacts-rocky8-jenkins-db-node-dc7ec884-0-1 !WARNING | scylla[66824]:  [shard 0] gossip - All nodes={} are down for get_endpoint_states verb. Skip ShadowRound.
2023-04-25 05:36:36.424: (ClusterHealthValidatorEvent Severity.WARNING) period_type=one-time event_id=c26613d5-1309-472d-bf91-1b8964822cd0: type=NodeStatus node=artifacts-rocky8-jenkins-db-node-dc7ec884-0-1 message=Unable to get nodetool status from `artifacts-rocky8-jenkins-db-node-dc7ec884-0-1': error=<UnexpectedExit: cmd='/usr/bin/nodetool  status ' exited=1>
2023-04-25 05:36:57.472: (TestFrameworkEvent Severity.ERROR) period_type=one-time event_id=86afdf2c-607b-4467-8334-0613a0f2e28e, source=ArtifactsTest.SetUp()
exception=Encountered a bad command exit code!

Command: '/usr/bin/nodetool  status '

Exit code: 1

Stdout:

Stderr:

nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused (Connection refused)'.
2023-04-25 05:36:57.524: (InfoEvent Severity.NORMAL) period_type=not-set event_id=519ea97e-5011-444d-a87a-3ef645bceb5f: message=TEST_END

Job URLs:
https://jenkins.scylladb.com/job/scylla-master/job/artifacts-offline-install/job/artifacts-rocky8-test/198/
https://jenkins.scylladb.com/job/scylla-master/job/artifacts/job/artifacts-rocky8-test/213/
https://jenkins.scylladb.com/job/scylla-master/job/artifacts-offline-install/job/artifacts-oel81-test/187/
https://jenkins.scylladb.com/job/scylla-master/job/artifacts-offline-install/job/artifacts-oel76-test/186/

temichus commented 1 year ago

cc @fruch

mykaul commented 1 year ago

Did the JMX process crash perhaps?

temichus commented 1 year ago

Did the JMX process crash perhaps?

Probably:

● scylla-jmx.service - Scylla JMX
   Loaded: loaded (/usr/lib/systemd/system/scylla-jmx.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2023-04-25 05:34:52 UTC; 2min 46s ago
 Main PID: 66834 (code=exited, status=1/FAILURE)

Apr 25 05:34:52 artifacts-rocky8-jenkins-db-node-dc7ec884-0-1 systemd[1]: Started Scylla JMX.
Apr 25 05:34:52 artifacts-rocky8-jenkins-db-node-dc7ec884-0-1 systemd[1]: scylla-jmx.service: Main process exited, code=exited, status=1/FAILURE
Apr 25 05:34:52 artifacts-rocky8-jenkins-db-node-dc7ec884-0-1 systemd[1]: scylla-jmx.service: Failed with result 'exit-code'.

temichus commented 1 year ago

https://jenkins.scylladb.com/job/scylla-master/job/artifacts-offline-install/job/artifacts-rocky8-nonroot-test/182/ has no logs, but I believe it hit the same issue:

RetryError[Wait for: jmx_up: timeout - 200 seconds - expired]
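
The `jmx_up` wait presumably keeps retrying until the JMX port that nodetool talks to (127.0.0.1:7199 in the errors above) accepts connections. A minimal bash sketch of such a poll, assuming bash was built with `/dev/tcp` redirection support; `wait_for_port` is a hypothetical helper and not SCT's actual `jmx_up` implementation:

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll HOST:PORT until a TCP connection succeeds
# or TIMEOUT seconds elapse. Uses bash's /dev/tcp pseudo-device.
wait_for_port() {
    local host=$1 port=$2 timeout=$3
    local deadline=$((SECONDS + timeout))
    while (( SECONDS < deadline )); do
        # Open fd 3 in a subshell so the connection is closed again on exit;
        # a refused connection makes the redirection (and the subshell) fail.
        if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
            return 0
        fi
        sleep 1
    done
    # Say why we gave up instead of failing silently.
    echo "wait_for_port: ${host}:${port} not reachable after ${timeout}s" >&2
    return 1
}

# Example: wait up to 200 seconds for the scylla-jmx proxy.
# wait_for_port 127.0.0.1 7199 200
```

When scylla-jmx.service has already exited, as in the status output above, nothing ever listens on 7199, so a poll like this can only run into its timeout.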
DoronArazii commented 1 year ago

@tchaikov can you please have a look?

tchaikov commented 1 year ago

Sure, will take a look early tomorrow.

tchaikov commented 1 year ago

Quoting artifacts-rocky8-test/scylla-cluster-tests/unit_tests/test_data/system.log from one of the artifact tarballs collected by Jenkins, where the build ID was 0a6bcf20fedb57959f501fc3caba2c4e61eacbce:

[10.0.73.70] [stdout] Apr 02 11:24:16 notice | scylla[124]: scylla-server.service: control process exited, code=exited status=1
[10.0.73.70] [stdout] Apr 02 11:24:16 err    | scylla[124]: Failed to start Scylla Server.
[10.0.73.70] [stdout] Apr 02 11:24:16 warning| scylla[124]: Dependency failed for Scylla JMX.
[10.0.73.70] [stdout] Apr 02 11:24:16 notice | scylla[124]: Job scylla-jmx.service/start failed with result 'dependency'.
[10.0.73.70] [stdout] Apr 02 11:24:16 notice | scylla[124]: Unit scylla-server.service entered failed state.

The scylla-jmx service unit failed to start because it depends on scylla-server.service; see https://github.com/scylladb/scylla-jmx/blob/5f988945ee4747b4f9ab980f20af636b33166760/dist/common/systemd/scylla-jmx.service#L3-L4

In the very same system.log, I found the last words from ScyllaDB:

[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: Segmentation fault on shard 11.
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: Backtrace:
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x00000000006c5af2
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x00000000005d41ac
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x00000000005d4455
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x00000000005d44a3
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: /lib64/libpthread.so.0+0x000000000000f5cf
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x0000000001c98946
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x0000000001cd0761
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x00000000013d58d2
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x00000000005d8a2b
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x00000000005b318b
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x00000000005b0314
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x000000000068795e
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x000000000068c1fa
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x000000000077535d

which decodes to:

[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: Backtrace:[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]:
[Backtrace #0]
?? ??:0
?? ??:0
?? ??:0
?? ??:0
__pthread_cond_timedwait at :?
void seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>::for_each_fragment<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>::skip(unsigned long)::{lambda(auto:1)#1}>(unsigned long, seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>::skip(unsigned long)::{lambda(auto:1)#1}&&) at ././seastar/include/seastar/core/simple-stream.hh:390
 (inlined by) seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>::skip(unsigned long) at ././seastar/include/seastar/core/simple-stream.hh:414
 (inlined by) operator()<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> > at ./build/release/gen/idl/mutation.dist.impl.hh:386
 (inlined by) decltype(auto) seastar::with_serialized_stream<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>, ser::serializer<ser::live_cell_view>::skip<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> >(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&)::{lambda(auto:1&)#1}, void, void>(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&, ser::serializer<ser::live_cell_view>::skip<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> >(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&)::{lambda(auto:1&)#1}&&) at ././seastar/include/seastar/core/simple-stream.hh:638
 (inlined by) void ser::serializer<ser::live_cell_view>::skip<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> >(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&) at ./build/release/gen/idl/mutation.dist.impl.hh:385
 (inlined by) operator()<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> > at ./build/release/gen/idl/mutation.dist.impl.hh:375
 (inlined by) decltype(auto) seastar::with_serialized_stream<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>, ser::serializer<ser::live_cell_view>::read<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> >(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&)::{lambda(auto:1&)#1}, void, void>(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&, ser::serializer<ser::live_cell_view>::read<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> >(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&)::{lambda(auto:1&)#1}&&) at ././seastar/include/seastar/core/simple-stream.hh:638
 (inlined by) ser::live_cell_view ser::serializer<ser::live_cell_view>::read<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> >(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&) at ./build/release/gen/idl/mutation.dist.impl.hh:372
 (inlined by) auto ser::deserialize<ser::live_cell_view, seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> >(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&, boost::type<ser::live_cell_view>) at ././serializer.hh:261
 (inlined by) operator()<const seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> > at ./build/release/gen/idl/mutation.dist.impl.hh:702
 (inlined by) decltype(auto) seastar::memory_input_stream<bytes_ostream::fragment_iterator>::with_stream<ser::expiring_cell_view::c() const::{lambda(auto:1&)#1}>(ser::expiring_cell_view::c() const::{lambda(auto:1&)#1}&&) const at ././seastar/include/seastar/core/simple-stream.hh:486
 (inlined by) decltype(auto) seastar::with_serialized_stream<seastar::memory_input_stream<bytes_ostream::fragment_iterator> const, ser::expiring_cell_view::c() const::{lambda(auto:1&)#1}, void>(seastar::memory_input_stream<bytes_ostream::fragment_iterator> const&, ser::expiring_cell_view::c() const::{lambda(auto:1&)#1}&&) at ././seastar/include/seastar/core/simple-stream.hh:631
 (inlined by) ser::expiring_cell_view::c() const at ./build/release/gen/idl/mutation.dist.impl.hh:696
 (inlined by) operator() at ./mutation/mutation_partition_view.cc:52
 (inlined by) _ZN5boost6detail7variant14invoke_visitorIKZN12_GLOBAL__N_116read_atomic_cellERK13abstract_typeNS_7variantIN3ser14live_cell_viewEJNS8_18expiring_cell_viewENS8_14dead_cell_viewENS8_17counter_cell_viewENS8_20unknown_variant_typeEEEEN7seastar10bool_classIN11atomic_cell21collection_member_tagEEEE19atomic_cell_visitorLb0EE14internal_visitIRSA_EENS_12disable_if_cIXaaLb0Esr7is_sameIT_SQ_EE5valueESH_E4typeEOSQ_i at /usr/include/boost/variant/variant.hpp:1028
 (inlined by) boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>::result_type boost::detail::variant::visitation_impl_invoke_impl<boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>, void*, ser::expiring_cell_view>(int, boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>&, void*, ser::expiring_cell_view*, mpl_::bool_<true>) at /usr/include/boost/variant/detail/visitation_impl.hpp:117
 (inlined by) boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>::result_type boost::detail::variant::visitation_impl_invoke<boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>, void*, ser::expiring_cell_view, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>::has_fallback_type_>(int, boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>&, void*, ser::expiring_cell_view*, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>::has_fallback_type_, int) at /usr/include/boost/variant/detail/visitation_impl.hpp:157
 (inlined by) boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>::result_type boost::detail::variant::visitation_impl<mpl_::int_<0>, boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<5l>, ser::live_cell_view, boost::mpl::l_item<mpl_::long_<4l>, ser::expiring_cell_view, boost::mpl::l_item<mpl_::long_<3l>, ser::dead_cell_view, boost::mpl::l_item<mpl_::long_<2l>, ser::counter_cell_view, boost::mpl::l_item<mpl_::long_<1l>, ser::unknown_variant_type, boost::mpl::l_end> > > > > >, boost::mpl::l_iter<boost::mpl::l_end> >, boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>, void*, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>::has_fallback_type_>(int, int, boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>&, void*, mpl_::bool_<false>, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>::has_fallback_type_, mpl_::int_<0>*, boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<5l>, ser::live_cell_view, 
boost::mpl::l_item<mpl_::long_<4l>, ser::expiring_cell_view, boost::mpl::l_item<mpl_::long_<3l>, ser::dead_cell_view, boost::mpl::l_item<mpl_::long_<2l>, ser::counter_cell_view, boost::mpl::l_item<mpl_::long_<1l>, ser::unknown_variant_type, boost::mpl::l_end> > > > > >, boost::mpl::l_iter<boost::mpl::l_end> >*) at /usr/include/boost/variant/detail/visitation_impl.hpp:238
 (inlined by) boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>::result_type boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>::internal_apply_visitor_impl<boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>, void*>(int, int, boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>&, void*) at /usr/include/boost/variant/variant.hpp:2337
 (inlined by) boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>::result_type boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>::internal_apply_visitor<boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false> >(boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>&) at /usr/include/boost/variant/variant.hpp:2349
 (inlined by) (anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const::result_type boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>::apply_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const>((anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const&) & at /usr/include/boost/variant/variant.hpp:2393
void allocation_strategy::destroy<partition_version>(partition_version*) at ././utils/allocation_strategy.hh:168
 (inlined by) remove_or_mark_as_unique_owner(partition_version*, mutation_cleaner*) at ./mutation/partition_version.cc:25
 (inlined by) operator() at ./mutation/partition_version.cc:161
 (inlined by) decltype(auto) with_allocator<partition_snapshot::~partition_snapshot()::$_10>(allocation_strategy&, partition_snapshot::~partition_snapshot()::$_10&&) at ././utils/allocation_strategy.hh:313
 (inlined by) ~partition_snapshot at ./mutation/partition_version.cc:154
 (inlined by) seastar::internal::lw_shared_ptr_accessors_esft<partition_snapshot>::dispose(partition_snapshot*) at ././seastar/include/seastar/core/shared_ptr.hh:205
 (inlined by) seastar::internal::lw_shared_ptr_accessors_esft<partition_snapshot>::dispose(seastar::lw_shared_ptr_counter_base*) at ././seastar/include/seastar/core/shared_ptr.hh:202
 (inlined by) ~lw_shared_ptr at ././seastar/include/seastar/core/shared_ptr.hh:317
 (inlined by) ~partition_snapshot_ptr at ./mutation/partition_version.cc:675
seastar::internal::future_base::move_it(seastar::internal::future_base&&, seastar::future_state_base*) at ././seastar/include/seastar/core/future.hh:1090
 (inlined by) future_base at ././seastar/include/seastar/core/future.hh:1099
 (inlined by) future at ././seastar/include/seastar/core/future.hh:1305
 (inlined by) _Head_base at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/tuple:196
 (inlined by) _Tuple_impl at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/tuple:456
 (inlined by) _Tuple_impl at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/tuple:301
 (inlined by) tuple at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/tuple:1090
 (inlined by) ~when_all_state at ././seastar/include/seastar/core/when_all.hh:153
 (inlined by) ~when_all_state at ././seastar/include/seastar/core/when_all.hh:152
?? ??:0
tchaikov commented 1 year ago

filed https://github.com/scylladb/scylladb/issues/13700

mykaul commented 1 year ago

@tchaikov - are these issues fallout from the upgrade of the Java version? We'll need to revert or fix the Java code, I'm afraid.

tchaikov commented 1 year ago

@tchaikov - are these issues fallout from the upgrade of the Java version? We'll need to revert or fix the Java code, I'm afraid.

Hi @mykaul, I don't know, as I don't have any proof that they are. Please see the analysis at https://github.com/scylladb/scylla-jmx/issues/212#issuecomment-1526979460. ScyllaDB crashed before the scylla-jmx exporter tried to start, so I think these two things are correlated, but I am afraid correlation does not imply causation.

mykaul commented 1 year ago

@avikivity - who should look at the above crash?

fruch commented 1 year ago

@tchaikov

I don't think this issue is related to any Scylla crash; I'm not sure where you got this crash information from.

It seems the code in /opt/scylladb/jmx/select-java isn't correctly picking the Java on those setups,

since the output of /usr/bin/java -version 2>&1 is:

Picked up JAVA_TOOL_OPTIONS:
openjdk version "11.0.19" 2023-04-18 LTS
OpenJDK Runtime Environment (Red_Hat-11.0.19.0.7-1.el9_1) (build 11.0.19+7-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-11.0.19.0.7-1.el9_1) (build 11.0.19+7-LTS, mixed mode, sharing)

and the logic in:

function select_java_others() {
    local javaver
    javaver=$(/usr/bin/java -version 2>&1|head -n1|cut -f 3 -d " ")

    if [[ "$javaver" =~ "^\"1.8.0" ]] || [[ "$javaver" =~ "^\"11.0." ]]; then
        echo /usr/bin/java
    fi
}

kind of breaks, and no Java is selected:

May 14 08:07:44 artifacts-rocky9-jenkins-db-node-81ce9ada-0-1 scylla-jmx[69027]: +++ head -n1
May 14 08:07:44 artifacts-rocky9-jenkins-db-node-81ce9ada-0-1 scylla-jmx[69028]: +++ cut -f 3 -d ' '
May 14 08:07:44 artifacts-rocky9-jenkins-db-node-81ce9ada-0-1 scylla-jmx[69024]: ++ javaver=JAVA_TOOL_OPTIONS:
May 14 08:07:44 artifacts-rocky9-jenkins-db-node-81ce9ada-0-1 scylla-jmx[69024]: ++ [[ JAVA_TOOL_OPTIONS: =~ \^"1\.8\.0 ]]
May 14 08:07:44 artifacts-rocky9-jenkins-db-node-81ce9ada-0-1 scylla-jmx[69024]: ++ [[ JAVA_TOOL_OPTIONS: =~ \^"11\.0\. ]]
May 14 08:07:44 artifacts-rocky9-jenkins-db-node-81ce9ada-0-1 scylla-jmx[69001]: + java=
May 14 08:07:44 artifacts-rocky9-jenkins-db-node-81ce9ada-0-1 scylla-jmx[69001]: + '[' -z '' ']'
May 14 08:07:44 artifacts-rocky9-jenkins-db-node-81ce9ada-0-1 scylla-jmx[69001]: + exit 1
May 14 08:07:44 artifacts-rocky9-jenkins-db-node-81ce9ada-0-1 systemd[1]: scylla-jmx.service: Main process exited, code=exited, status=1/FAILURE
May 14 08:07:44 artifacts-rocky9-jenkins-db-node-81ce9ada-0-1 systemd[1]: scylla-jmx.service: Failed with result 'exit-code'.

and select-java fails silently, i.e. nothing in the log explains why it failed.
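
The failure can be reproduced without a JVM by feeding the `java -version` output quoted above through the same `head -n1 | cut -f 3 -d " "` pipeline. Below is a minimal bash sketch of a more defensive parser; it is not the fix that landed in scylla-jmx, and `java_version_from_output`, `is_supported_java_version` and `select_java_sketch` are hypothetical names. It skips `Picked up ...` banner lines, takes the quoted token from the `version "..."` line, keeps the regex in a variable so that `^` acts as an anchor (bash matches a quoted right-hand side of `=~` literally, which is consistent with the `\^"1\.8\.0` form in the trace), and reports on stderr why nothing was selected.

```shell
#!/usr/bin/env bash
# java -version output as quoted above (line breaks restored).
sample_output='Picked up JAVA_TOOL_OPTIONS:
openjdk version "11.0.19" 2023-04-18 LTS
OpenJDK Runtime Environment (Red_Hat-11.0.19.0.7-1.el9_1) (build 11.0.19+7-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-11.0.19.0.7-1.el9_1) (build 11.0.19+7-LTS, mixed mode, sharing)'

# What the existing pipeline extracts: field 3 of the *first* line,
# i.e. "JAVA_TOOL_OPTIONS:", so neither version check can ever match.
old_javaver=$(printf '%s\n' "$sample_output" | head -n1 | cut -f 3 -d " ")

# Hypothetical: read `java -version` output on stdin, print e.g. 11.0.19.
java_version_from_output() {
    # Drop JVM banner lines such as "Picked up JAVA_TOOL_OPTIONS: ...",
    # then print the quoted token of the first 'version "..."' line.
    grep -v '^Picked up ' | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n1
}

# Hypothetical: succeed for Java 8 (1.8.0*) or Java 11 (11.0.*).
is_supported_java_version() {
    local ver=$1
    # Patterns live in variables and are expanded unquoted, so ^ anchors
    # the match and \. matches a literal dot.
    local re8='^1\.8\.0' re11='^11\.0\.'
    [[ $ver =~ $re8 || $ver =~ $re11 ]]
}

# Hypothetical replacement for select_java_others: print the Java binary
# to use, or explain on stderr why none was selected.
select_java_sketch() {
    local java_bin=${1:-/usr/bin/java} ver
    ver=$("$java_bin" -version 2>&1 | java_version_from_output)
    if is_supported_java_version "$ver"; then
        echo "$java_bin"
    else
        echo "select-java: unsupported or unparsable Java version '${ver}' from ${java_bin}" >&2
        return 1
    fi
}
```

Parsing the `version "..."` line rather than a fixed line and field position is what makes the check indifferent to extra banner lines that `JAVA_TOOL_OPTIONS` (or other JVM environment variables) may prepend.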

fruch commented 1 year ago

@temichus @mykaul, can one of you change the title to "java-select fail to parse java version output"?

fruch commented 1 year ago

@tchaikov we should also consider checking ID_LIKE, because I can confirm that after adding rocky to java-select it worked as expected.

See /etc/os-release from one of the failing jobs:

NAME="Rocky Linux"
VERSION="9.1 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"

You can see examples from all the distros we support here: https://github.com/scylladb/scylla-cluster-tests/blob/08f927dd885ce1fc5ad7b712138146d4434d1451/unit_tests/test_utils_distro.py#L21
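
Matching on `ID_LIKE` as well as `ID` lets derivatives such as Rocky Linux (`ID="rocky"`, `ID_LIKE="rhel centos fedora"`) fall into the RHEL family without being listed one by one. A minimal bash sketch, assuming the script reads an os-release file; `is_rhel_like` is a hypothetical helper, and the `rhel|centos|fedora` family list is an assumption rather than what select-java actually checks. The file is sourced in a subshell so its variables do not leak into the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the os-release file describes a
# RHEL-family distribution, judged by ID or any word in ID_LIKE.
is_rhel_like() {
    local os_release=${1:-/etc/os-release}
    (
        # Start clean so values from the environment cannot leak in.
        unset ID ID_LIKE
        # os-release is a shell-compatible KEY=VALUE file.
        . "$os_release" || exit 1
        # ID_LIKE is space-separated, so rely on word splitting here.
        for id in $ID $ID_LIKE; do
            case $id in
                rhel|centos|fedora) exit 0 ;;
            esac
        done
        exit 1
    )
}

# Example: is_rhel_like && echo "RHEL family"
```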

tchaikov commented 1 year ago

@fruch Hi Israel, thank you very much for pointing out the select-java issue and for the suggestion. I am creating a pull request to address all of the issues noted here.

But the segfault in Scylla is still a mystery. I captured a snapshot of test_data.zip and noted down the steps to get it in https://github.com/scylladb/scylladb/issues/13700.

avikivity commented 1 year ago

@avikivity - who should look at the above crash?

The decode is bogus, so I can't triage it.