segfault when degraded cluster restores

cc32d9 commented 1 year ago

I will collect more data when my current test finishes, but this is what I observed with 2.16.2b:

A cluster of 4 machines, running the latest 5.2 release candidate. The keyspace has replication factor 3. The writer is pushing about 30k inserts per second, with the consistency level set to QUORUM.

While the client was stopped, I stopped one of the servers. Then I started the client (all 4 servers are configured as cluster contact points). The client complained a bit about failed connections, but kept chugging along, since we still had enough replicas for the quorum.

Then I started the server that had been stopped, and the client segfaulted as soon as the server started accepting connections.

cc32d9 commented 1 year ago

The writer in question: https://github.com/EOSChronicleProject/chronos/blob/main/writer/exp_chronos_plugin.cpp

    cass_cluster_set_local_port_range(cluster, 49152, 65535);
    cass_cluster_set_core_connections_per_host(cluster, scylla_conn_per_host);
    cass_cluster_set_request_timeout(cluster, 100000);
    cass_cluster_set_num_threads_io(cluster, scylla_io_threads);
    cass_cluster_set_queue_size_io(cluster, 1048576);

scylla_conn_per_host is set to 1 and scylla_io_threads to 4.

cc32d9 commented 1 year ago

[48556796.397656] chronos-writer[1981192]: segfault at 8 ip 00007f60800c5af8 sp 00007f5c777fa9d0 error 4 in libscylla-cpp-driver.so.2.16.2-b[7f607ffd2000+293000]
[48556796.397677] Code: 01 00 00 4c 8d 2c d0 0f 85 4e 02 00 00 49 8b 6d 08 4d 8b 75 00 49 39 ee 0f 84 ac 00 00 00 4d 89 f4 49 83 c6 08 4c 39 f5 74 48 <49> 8b 3e e8 00 54 08 00 84 c0 75 eb 49 8b 3c 24 e8 f3 53 08 00 84

cc32d9 commented 1 year ago

@jul-stas ^^

scylladb / cpp-driver

segfault when degraded cluster restores #78