uber / cadence

Cadence is a distributed, scalable, durable, and highly available orchestration engine to execute asynchronous long-running business logic in a scalable and resilient way.
https://cadenceworkflow.io
MIT License

protocol error when connecting to cassandra equivalent (scylladb) #3314

Open listaction opened 4 years ago

listaction commented 4 years ago

Describe the bug
I get the following error when I run docker-compose up from the docker directory with one modification:
Connection error: ('Unable to connect to any servers', {'172.18.0.2:9042': ProtocolError("cql_version '3.4.4' is not supported by remote (w/ native protocol). Supported versions: [u'3.3.1']",)})

To Reproduce
cd docker

- image: cassandra:3.11
+ image: scylladb/scylla

Is the issue reproducible? yes

This also happens with scylladb/scylla:4.0.1-0.20200603.4f4845c94c9.
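The error message itself names both the CQL version the client requested and the versions the server accepts. A minimal sketch, assuming the message keeps the format shown above, that pulls both out so the mismatch is explicit. The function names are illustrative and not part of any driver.

```python
import re

def requested_cql_version(error_text):
    """Return the CQL version the client asked for, or None if not found."""
    match = re.search(r"cql_version '([^']+)' is not supported", error_text)
    return match.group(1) if match else None

def supported_cql_versions(error_text):
    """Return the versions listed after 'Supported versions:' in the error."""
    match = re.search(r"Supported versions: \[(.*?)\]", error_text)
    if match is None:
        return []
    # Entries appear as u'3.3.1' (Python 2 repr) or '3.3.1'.
    return re.findall(r"u?'([^']+)'", match.group(1))

# Error text copied from the report above.
ERROR = (
    "ProtocolError(\"cql_version '3.4.4' is not supported by remote "
    "(w/ native protocol). Supported versions: [u'3.3.1']\",)"
)

requested = requested_cql_version(ERROR)
supported = supported_cql_versions(ERROR)
print("requested:", requested)
print("supported:", supported)
print("compatible:", requested in supported)
```

If the requested version is not in the supported list, one option is to pin a supported one explicitly, for example through the --cqlversion option of cqlsh.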

rmenn commented 4 years ago

I tried this with scylla-4.1.0. I commented out --cqlversion=3.4.4 in the start script and built a custom container, but the errors do not stop there. Startup now fails with:

{  
   "level":"fatal",
   "ts":"2020-07-06T11:57:55.359Z",
   "msg":"Fail to start history service ",
   "error":"failed to check and create queue metadata entry: failed to insert initial queue metadata record: gocql: not enough columns to scan into: have 1 want 4, Type: -1",
   "logging-call-at":"server.go:229",
   "stacktrace":"github.com/uber/cadence/common/log/loggerimpl.(*loggerImpl).Fatal\n\t/cadence/common/log/loggerimpl/logger.go:140\ngithub.com/uber/cadence/cmd/server/cadence.(*server).startService\n\t/cadence/cmd/server/cadence/server.go:229\ngithub.com/uber/cadence/cmd/server/cadence.(*server).Start\n\t/cadence/cmd/server/cadence/server.go:82\ngithub.com/uber/cadence/cmd/server/cadence.startHandler\n\t/cadence/cmd/server/cadence/cadence.go:80\ngithub.com/uber/cadence/cmd/server/cadence.BuildCLI.func1\n\t/cadence/cmd/server/cadence/cadence.go:200\ngithub.com/urfave/cli.HandleAction\n\t/go/pkg/mod/github.com/urfave/cli@v1.20.0/app.go:492\ngithub.com/urfave/cli.Command.Run\n\t/go/pkg/mod/github.com/urfave/cli@v1.20.0/command.go:210\ngithub.com/urfave/cli.(*App).Run\n\t/go/pkg/mod/github.com/urfave/cli@v1.20.0/app.go:255\nmain.main\n\t/cadence/cmd/server/main.go:34\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"
}
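The stack trace in this log entry is a single escaped string, which makes it hard to read. A small sketch, assuming the entry is valid JSON and that the trace alternates function lines with tab-indented file:line lines as it does above. The helper name summarize_fatal is hypothetical and the sample is shortened to the first two frames of the entry above.

```python
import json

def summarize_fatal(log_line, max_frames=5):
    """Return the message, error text, and first stack frames of a JSON log entry."""
    entry = json.loads(log_line)
    lines = entry.get("stacktrace", "").split("\n")
    # Pair each function line with the file:line that follows it.
    frames = [
        (func, location.strip())
        for func, location in zip(lines[0::2], lines[1::2])
    ]
    return {
        "msg": entry.get("msg", "").strip(),
        "error": entry.get("error", ""),
        "frames": frames[:max_frames],
    }

# Shortened copy of the entry above (first two frames only).
SAMPLE = (
    r'{"level":"fatal","msg":"Fail to start history service ",'
    r'"logging-call-at":"server.go:229",'
    r'"stacktrace":"github.com/uber/cadence/common/log/loggerimpl.(*loggerImpl).Fatal'
    r'\n\t/cadence/common/log/loggerimpl/logger.go:140'
    r'\ngithub.com/uber/cadence/cmd/server/cadence.(*server).startService'
    r'\n\t/cadence/cmd/server/cadence/server.go:229"}'
)

summary = summarize_fatal(SAMPLE)
print(summary["msg"])
for func, location in summary["frames"]:
    print(location, "in", func)
```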
meiliang86 commented 4 years ago

Schema setup is failing. You need to debug further to check which statement fails. You can try to manually execute the CQL statements we use for schema setup.
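One way to find the failing statement is to split the schema file into individual statements and run them one at a time. A rough sketch of that idea: the execute argument is a placeholder for whatever runs a statement against the cluster (a driver session or a cqlsh wrapper), and the splitter is deliberately simple and does not handle comments.

```python
def split_statements(cql_text):
    """Split CQL text on semicolons that are outside single-quoted strings."""
    statements, current, in_string = [], [], False
    for ch in cql_text:
        if ch == "'":
            in_string = not in_string
        if ch == ";" and not in_string:
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements

def first_failure(statements, execute):
    """Run statements in order; return (index, statement, error) for the first failure."""
    for index, stmt in enumerate(statements):
        try:
            execute(stmt)
        except Exception as exc:
            return index, stmt, exc
    return None

if __name__ == "__main__":
    # Stand-in executor that rejects ALTER statements, for demonstration only.
    def fake_execute(stmt):
        if stmt.startswith("ALTER"):
            raise RuntimeError("rejected by fake backend")

    text = (
        "CREATE TYPE checksum (version int, flavor int, value blob);\n"
        "ALTER TABLE executions ADD checksum frozen<checksum>;"
    )
    print(first_failure(split_statements(text), fake_execute))
```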

listaction commented 4 years ago

@meiliang86: I would opt for Scylla getting to parity with Cassandra, unless @mfateev has suggestions on lower versions of CQL and support.

rmenn commented 4 years ago

I was able to set up the schema using cadence-cassandra-tool, following start.sh. It looks like the problem isn't with the schema setup?

If I am right, the error comes from here: https://github.com/uber/cadence/blob/master/common/persistence/cassandra/cassandraQueue.go#L98
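One hypothesis that fits the numbers in the error ("have 1 want 4"), not confirmed in this thread: the caller supplies a single destination for the [applied] flag of a conditional insert, while the backend returns [applied] plus the three columns of queue_metadata. The sketch below only models that mismatch in plain Python. It is not gocql code, and the assumption that the backend returns the row columns even when the insert is applied is exactly that, an assumption.

```python
# Columns of queue_metadata, taken from the CREATE TABLE statement in the log below.
QUEUE_METADATA_COLUMNS = ["queue_type", "cluster_ack_level", "version"]

def conditional_insert_result(returns_row_columns):
    """Columns in the result of INSERT ... IF NOT EXISTS under each assumed behaviour."""
    columns = ["[applied]"]
    if returns_row_columns:
        columns += QUEUE_METADATA_COLUMNS
    return columns

def scan(dest_count, returned_columns):
    """Strict scanner model: the destination count must equal the column count."""
    if dest_count != len(returned_columns):
        raise ValueError(
            "not enough columns to scan into: have %d want %d"
            % (dest_count, len(returned_columns))
        )

for returns_row_columns in (False, True):
    try:
        # The caller only provides a destination for [applied].
        scan(1, conditional_insert_result(returns_row_columns))
        print(returns_row_columns, "ok")
    except ValueError as exc:
        print(returns_row_columns, exc)
```

If this reading is right, scanning the result into a map rather than a fixed destination list would tolerate the extra columns.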

bash-5.0# SCHEMA_DIR=$CADENCE_HOME/schema/cassandra/cadence/versioned
bash-5.0# cadence-cassandra-tool --ep $CASSANDRA_SEEDS create -k $KEYSPACE --rf 1
bash-5.0# cadence-cassandra-tool --ep $CASSANDRA_SEEDS -k $KEYSPACE setup-schema -v 0.0
2020/08/19 18:36:10 Starting schema setup, config=&{SchemaFilePath: InitialVersion:0.0 Overwrite:false DisableVersioning:false}
2020/08/19 18:36:10 Setting up version tables
2020/08/19 18:36:10 Setting initial schema version to 0.0
2020/08/19 18:36:10 Updating schema update log
2020/08/19 18:36:10 Schema setup complete
bash-5.0# cadence-cassandra-tool --ep $CASSANDRA_SEEDS -k $KEYSPACE update-schema -d $SCHEMA_DIR
2020/08/19 18:36:22 UpdateSchemeTask started, config=&{DBName: TargetVersion: SchemaDir:/etc/cadence/schema/cassandra/cadence/versioned IsDryRun:false}
2020/08/19 18:36:22 ---- Executing updates for version 0.23 ----
2020/08/19 18:36:22 CREATE TYPE shard (shard_id int,owner text, range_id bigint,stolen_since_renew int,updated_at timestamp,transfer_ack_level bigint, timer_ack_level timestamp, replication_ack_level bigint,cluster_transfer_ack_level map<text, bigint>,cluster_timer_ack_level map<text, timestamp>,domain_notification_version bigint, cluster_replication_level map<text, bigint>,);
2020/08/19 18:36:22 CREATE TYPE workflow_execution (domain_id uuid,workflow_id text,run_id uuid,parent_domain_id uuid, parent_workflow_id text, parent_run_id uuid, initiated_id bigint, completion_event blob, task_list text,workflow_type_name text,decision_task_timeout int, execution_context blob,state int, close_status int, next_event_id bigint,last_processed_event bigint,start_time timestamp,last_updated_time timestamp,create_request_id uuid,decision_schedule_id bigint,decision_started_id bigint,decision_request_id text, decision_timeout int,cancel_requested boolean,cancel_request_id text,workflow_timeout int, sticky_task_list text, sticky_schedule_to_start_timeout int,decision_attempt bigint,decision_timestamp bigint, client_library_version text,client_feature_version text,client_impl text,last_first_event_id bigint,decision_version bigint,attempt int, has_retry_policy boolean,init_interval int, backoff_coefficient double,max_interval int, expiration_time timestamp, max_attempts int, non_retriable_errors list<text>,history_size bigint,completion_event_data_encoding text, event_store_version int, branch_token blob,signal_count int,cron_schedule text,expiration_seconds int, completion_event_batch_id bigint,last_event_task_id bigint,auto_reset_points blob, auto_reset_points_encoding text, decision_scheduled_timestamp bigint, search_attributes map<text, blob>,memo map<text, blob>,decision_original_scheduled_timestamp bigint, );
2020/08/19 18:36:22 CREATE TYPE replication_info (version bigint,last_event_id bigint,);
2020/08/19 18:36:22 CREATE TYPE replication_state (current_version bigint, start_version bigint, last_write_version bigint, last_write_event_id bigint, last_replication_info map<text, frozen<replication_info>>, );
2020/08/19 18:36:22 CREATE TYPE transfer_task (domain_id uuid, workflow_id text, run_id uuid, task_id bigint,target_domain_id uuid, target_workflow_id text, target_run_id uuid, task_list text,type int, schedule_id bigint,target_child_workflow_only boolean, version bigint, visibility_ts timestamp, record_visibility boolean, );
2020/08/19 18:36:22 CREATE TYPE replication_task (domain_id uuid, workflow_id text, run_id uuid, task_id bigint,type int, first_event_id bigint, next_event_id bigint, version bigint, last_replication_info map<text, frozen<replication_info>>, scheduled_id bigint, event_store_version int, branch_token blob, new_run_event_store_version int, new_run_branch_token blob, reset_workflow boolean, );
2020/08/19 18:36:22 CREATE TYPE timer_task (domain_id uuid,workflow_id text,run_id uuid,visibility_ts timestamp,task_id bigint,type int, timeout_type int, event_id bigint, schedule_attempt bigint, version bigint, );
2020/08/19 18:36:23 CREATE TYPE activity_info (schedule_id bigint,scheduled_event blob, scheduled_time timestamp,started_id bigint,started_event blob,started_time timestamp,activity_id text, request_id text, details blob,schedule_to_start_timeout int,schedule_to_close_timeout int,start_to_close_timeout int,heart_beat_timeout int,cancel_requested boolean, cancel_request_id bigint, last_hb_updated_time timestamp, timer_task_status int, version bigint,attempt int, task_list text,started_identity text, has_retry_policy boolean,init_interval int, backoff_coefficient double,max_interval int, expiration_time timestamp, max_attempts int, non_retriable_errors list<text>,event_data_encoding text, scheduled_event_batch_id bigint,last_failure_reason text,last_worker_identity text, last_failure_details blob);
2020/08/19 18:36:23 CREATE TYPE timer_info (timer_id text, started_id bigint, expiry_time timestamp, task_id bigint,version bigint,);
2020/08/19 18:36:23 CREATE TYPE child_execution_info (initiated_id bigint,initiated_event blob,started_id bigint,started_event blob, create_request_id uuid,version bigint,event_data_encoding text, initiated_event_batch_id bigint,started_workflow_id text,started_run_id uuid,domain_name text,workflow_type_name text,parent_close_policy int,);
2020/08/19 18:36:23 CREATE TYPE request_cancel_info (initiated_id bigint,cancel_request_id text,version bigint,initiated_event_batch_id bigint,);
2020/08/19 18:36:23 CREATE TYPE signal_info (initiated_id bigint,signal_request_id uuid,signal_name text,input blob,control blob,version bigint,initiated_event_batch_id bigint,);
2020/08/19 18:36:23 CREATE TYPE task (domain_id uuid,workflow_id text,run_id uuid,schedule_id bigint,created_time timestamp);
2020/08/19 18:36:23 CREATE TYPE task_list (domain_id uuid,name text,type int, ack_level bigint, kind int, last_updated timestamp);
2020/08/19 18:36:23 CREATE TYPE domain (id uuid,name text,status int, description text,owner_email text,data map<text,text>, );
2020/08/19 18:36:23 CREATE TYPE domain_config (retention int,emit_metric boolean,archival_bucket text, archival_status int, bad_binaries blob,bad_binaries_encoding text,history_archival_status int,history_archival_uri text,visibility_archival_status int,visibility_archival_uri text);
2020/08/19 18:36:23 CREATE TYPE cluster_replication_config (cluster_name text,);
2020/08/19 18:36:24 CREATE TYPE domain_replication_config (active_cluster_name text,clusters list<frozen<cluster_replication_config>>);
2020/08/19 18:36:24 CREATE TYPE serialized_event_batch (encoding_type text,version int,data blob,);
2020/08/19 18:36:24 CREATE TYPE buffered_replication_task_info (first_event_id bigint,next_event_id bigint,version bigint,history frozen<serialized_event_batch>,new_run_history frozen<serialized_event_batch>,event_store_version int, new_run_event_store_version int, );
2020/08/19 18:36:24 CREATE TYPE branch_range (branch_id uuid,end_node_id bigint, );
2020/08/19 18:36:24 CREATE TABLE executions (shard_id int,type int, domain_id uuid,workflow_id text,run_id uuid,current_run_id uuid,visibility_ts timestamp, task_id bigint, shard frozen<shard>,execution frozen<workflow_execution>,transfer frozen<transfer_task>,replication frozen<replication_task>,timer frozen<timer_task>,next_event_id bigint, range_id bigint, activity_map map<bigint, frozen<activity_info>>,timer_map map<text, frozen<timer_info>>,child_executions_map map<bigint, frozen<child_execution_info>>,request_cancel_map map<bigint, frozen<request_cancel_info>>,signal_map map<bigint, frozen<signal_info>>,signal_requested set<uuid>,buffered_events_list list<frozen<serialized_event_batch>>,replication_state frozen<replication_state>, buffered_replication_tasks_map map<bigint, frozen<buffered_replication_task_info>>,workflow_last_write_version bigint,workflow_state int,version_histories blob, version_histories_encoding text,PRIMARY KEY (shard_id, type, domain_id, workflow_id, run_id, visibility_ts, task_id)) WITH COMPACTION = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'};
2020/08/19 18:36:24 CREATE TABLE history_node (tree_id uuid,branch_id uuid,node_id bigint, txn_id bigint, data blob, data_encoding text, PRIMARY KEY ((tree_id), branch_id, node_id, txn_id )) WITH CLUSTERING ORDER BY (branch_id ASC, node_id ASC, txn_id DESC)AND COMPACTION = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'};
2020/08/19 18:36:24 CREATE TABLE history_tree (tree_id uuid,branch_id uuid,ancestors list<frozen<branch_range>>,fork_time timestamp, info text, PRIMARY KEY ((tree_id), branch_id )) WITH COMPACTION = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'};
2020/08/19 18:36:25 CREATE TABLE tasks (domain_id uuid,task_list_name text,task_list_type int, type int, task_id bigint, range_id bigint, task frozen<task>,task_list frozen<task_list>,PRIMARY KEY ((domain_id, task_list_name, task_list_type), type, task_id)) WITH COMPACTION = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'};
2020/08/19 18:36:25 CREATE TABLE domains (id uuid,domain frozen<domain>,config frozen<domain_config>,PRIMARY KEY (id)) WITH COMPACTION = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'};
2020/08/19 18:36:25 CREATE TABLE domains_by_name_v2 (domains_partition int,name text,domain frozen<domain>,config frozen<domain_config>,replication_config frozen<domain_replication_config>, is_global_domain boolean, config_version bigint, failover_version bigint, failover_notification_version bigint, notification_version bigint,PRIMARY KEY (domains_partition, name)) WITH COMPACTION = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'};
2020/08/19 18:36:25 CREATE TABLE queue (queue_type int,message_id int,message_payload blob,PRIMARY KEY (queue_type, message_id)) WITH COMPACTION = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'};
2020/08/19 18:36:25 CREATE TABLE queue_metadata (queue_type int,cluster_ack_level map<text, bigint>,version bigint,PRIMARY KEY (queue_type)) WITH COMPACTION = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'};
2020/08/19 18:36:26 CREATE TABLE events (domain_id uuid,workflow_id text,run_id uuid,first_event_id bigint,range_id bigint,tx_id bigint,data blob, data_encoding text, data_version int, event_batch_version bigint,PRIMARY KEY ((domain_id, workflow_id, run_id), first_event_id)) WITH COMPACTION = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'};
2020/08/19 18:36:26 INSERT INTO domains_by_name_v2 (domains_partition,name,domain,config,is_global_domain,config_version,failover_version,failover_notification_version,notification_version) VALUES (0,'cadence-system',{id: 32049b68-7872-4094-8e63-d0dd59896a83,name: 'cadence-system',description: 'cadence system workflow domain',owner_email: 'cadence-dev-group@uber.com'},{retention:3,emit_metric:False},False,0,-24,-1,-1) IF NOT EXISTS;
2020/08/19 18:36:26 INSERT INTO domains (id,domain) VALUES (32049b68-7872-4094-8e63-d0dd59896a83,{name: 'cadence-system'}) IF NOT EXISTS;
2020/08/19 18:36:26 ---- Done ----
2020/08/19 18:36:26 Schema updated from 0.0 to 0.23, elapsed 4.0578129s
2020/08/19 18:36:26 ---- Executing updates for version 0.24 ----
2020/08/19 18:36:26 CREATE TYPE checksum (version int, flavor int, value blob);
2020/08/19 18:36:26 ALTER TABLE executions ADD checksum frozen<checksum>;
2020/08/19 18:36:26 ---- Done ----
2020/08/19 18:36:26 Schema updated from 0.23 to 0.24, elapsed 346.1999ms
2020/08/19 18:36:26 ---- Executing updates for version 0.25 ----
2020/08/19 18:36:26 ALTER TYPE shard ADD replication_dlq_ack_level map<text, bigint>;
2020/08/19 18:36:27 ---- Done ----
2020/08/19 18:36:27 Schema updated from 0.24 to 0.25, elapsed 320.9426ms
2020/08/19 18:36:27 ---- Executing updates for version 0.26 ----
2020/08/19 18:36:27 DROP TABLE queue;
2020/08/19 18:36:27 CREATE TABLE queue (queue_type int,message_id bigint,message_payload blob,PRIMARY KEY (queue_type, message_id)) WITH COMPACTION = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'};
2020/08/19 18:36:27 ---- Done ----
2020/08/19 18:36:27 Schema updated from 0.25 to 0.26, elapsed 490.6209ms
2020/08/19 18:36:27 ---- Executing updates for version 0.27 ----
2020/08/19 18:36:27 ALTER TABLE domains_by_name_v2 ADD failover_end_time bigint;
2020/08/19 18:36:27 ---- Done ----
2020/08/19 18:36:27 Schema updated from 0.26 to 0.27, elapsed 211.2085ms
2020/08/19 18:36:27 ---- Executing updates for version 0.28 ----
2020/08/19 18:36:27 ALTER TYPE replication_task ADD created_time bigint;
2020/08/19 18:36:28 ALTER TABLE domains_by_name_v2 ADD previous_failover_version bigint;
2020/08/19 18:36:28 ALTER TYPE shard ADD pending_failover_markers blob;
2020/08/19 18:36:28 ALTER TYPE shard ADD pending_failover_markers_encoding text;
2020/08/19 18:36:29 ---- Done ----
2020/08/19 18:36:29 Schema updated from 0.27 to 0.28, elapsed 1.2409845s
2020/08/19 18:36:29 All schema changes completed in 6.6697263s
2020/08/19 18:36:29 UpdateSchemeTask done
bash-5.0# VISIBILITY_SCHEMA_DIR=$CADENCE_HOME/schema/cassandra/visibility/versioned
bash-5.0# cadence-cassandra-tool --ep $CASSANDRA_SEEDS create -k $VISIBILITY_KEYSPACE --rf 1
bash-5.0# cadence-cassandra-tool --ep $CASSANDRA_SEEDS -k $VISIBILITY_KEYSPACE setup-schema -v 0.0
2020/08/19 18:37:00 Starting schema setup, config=&{SchemaFilePath: InitialVersion:0.0 Overwrite:false DisableVersioning:false}
2020/08/19 18:37:00 Setting up version tables
2020/08/19 18:37:01 Setting initial schema version to 0.0
2020/08/19 18:37:01 Updating schema update log
2020/08/19 18:37:01 Schema setup complete
bash-5.0#  cadence-cassandra-tool --ep $CASSANDRA_SEEDS -k $VISIBILITY_KEYSPACE update-schema -d $VISIBILITY_SCHEMA_DIR
2020/08/19 18:37:12 UpdateSchemeTask started, config=&{DBName: TargetVersion: SchemaDir:/etc/cadence/schema/cassandra/visibility/versioned IsDryRun:false}
2020/08/19 18:37:12 ---- Executing updates for version 0.1 ----
2020/08/19 18:37:12 CREATE TABLE open_executions (domain_id uuid,domain_partition int,workflow_id text,run_id uuid,start_time timestamp,workflow_type_name text,PRIMARY KEY ((domain_id, domain_partition), start_time, run_id)) WITH CLUSTERING ORDER BY (start_time DESC)AND COMPACTION = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}AND GC_GRACE_SECONDS = 172800;
2020/08/19 18:37:12 CREATE INDEX open_by_workflow_id ON open_executions (workflow_id);
2020/08/19 18:37:12 CREATE INDEX open_by_type ON open_executions (workflow_type_name);
2020/08/19 18:37:13 CREATE TABLE closed_executions (domain_id uuid,domain_partition int,workflow_id text,run_id uuid,start_time timestamp,close_time timestamp,status int, workflow_type_name text,history_length bigint,PRIMARY KEY ((domain_id, domain_partition), start_time, run_id)) WITH CLUSTERING ORDER BY (start_time DESC)AND COMPACTION = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}AND GC_GRACE_SECONDS = 172800;
2020/08/19 18:37:13 CREATE INDEX closed_by_workflow_id ON closed_executions (workflow_id);
2020/08/19 18:37:13 CREATE INDEX closed_by_close_time ON closed_executions (close_time);
2020/08/19 18:37:14 CREATE INDEX closed_by_type ON closed_executions (workflow_type_name);
2020/08/19 18:37:14 CREATE INDEX closed_by_status ON closed_executions (status);
2020/08/19 18:37:14 ---- Done ----
2020/08/19 18:37:14 Schema updated from 0.0 to 0.1, elapsed 2.3642127s
2020/08/19 18:37:14 ---- Executing updates for version 0.2 ----
2020/08/19 18:37:14 ALTER TABLE open_executions WITH gc_grace_seconds=60 AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy','tombstone_threshold': 0.4};
2020/08/19 18:37:14 ---- Done ----
2020/08/19 18:37:14 Schema updated from 0.1 to 0.2, elapsed 266.3998ms
2020/08/19 18:37:14 ---- Executing updates for version 0.3 ----
2020/08/19 18:37:14 CREATE TABLE closed_executions_v2 (domain_id uuid,domain_partition int,workflow_id text,run_id uuid,start_time timestamp,close_time timestamp,status int, workflow_type_name text,history_length bigint,PRIMARY KEY ((domain_id, domain_partition), close_time, run_id)) WITH CLUSTERING ORDER BY (close_time DESC)AND COMPACTION = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}AND GC_GRACE_SECONDS = 172800;
2020/08/19 18:37:15 CREATE INDEX closed_by_workflow_id_v2 ON closed_executions_v2 (workflow_id);
2020/08/19 18:37:15 CREATE INDEX closed_by_close_time_v2 ON closed_executions_v2 (close_time);
2020/08/19 18:37:15 CREATE INDEX closed_by_type_v2 ON closed_executions_v2 (workflow_type_name);
2020/08/19 18:37:16 CREATE INDEX closed_by_status_v2 ON closed_executions_v2 (status);
2020/08/19 18:37:16 ---- Done ----
2020/08/19 18:37:16 Schema updated from 0.2 to 0.3, elapsed 1.5885359s
2020/08/19 18:37:16 ---- Executing updates for version 0.4 ----
2020/08/19 18:37:16 ALTER TABLE open_executions ADD execution_time timestamp;
2020/08/19 18:37:16 ALTER TABLE closed_executions ADD execution_time timestamp;
2020/08/19 18:37:17 ALTER TABLE closed_executions_v2 ADD execution_time timestamp;
2020/08/19 18:37:17 ALTER TABLE open_executions ADD memo blob;
2020/08/19 18:37:17 ALTER TABLE closed_executions ADD memo blob;
2020/08/19 18:37:18 ALTER TABLE closed_executions_v2 ADD memo blob;
2020/08/19 18:37:18 ALTER TABLE open_executions ADD encoding text;
2020/08/19 18:37:18 ALTER TABLE closed_executions ADD encoding text;
2020/08/19 18:37:18 ALTER TABLE closed_executions_v2 ADD encoding text;
2020/08/19 18:37:19 ---- Done ----
2020/08/19 18:37:19 Schema updated from 0.3 to 0.4, elapsed 2.8258145s
2020/08/19 18:37:19 ---- Executing updates for version 0.5 ----
2020/08/19 18:37:19 ALTER TABLE open_executions ADD task_list text;
2020/08/19 18:37:19 ALTER TABLE closed_executions ADD task_list text;
2020/08/19 18:37:19 ALTER TABLE closed_executions_v2 ADD task_list text;
2020/08/19 18:37:20 ---- Done ----
2020/08/19 18:37:20 Schema updated from 0.4 to 0.5, elapsed 975.0746ms
2020/08/19 18:37:20 All schema changes completed in 8.0205762s
2020/08/19 18:37:20 UpdateSchemeTask done
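To confirm that each keyspace reached the intended schema version, the "Schema updated from X to Y" lines can be extracted from the tool output. A minimal sketch, assuming the log format shown above. The function names are illustrative.

```python
import re

UPDATED = re.compile(r"Schema updated from (\d+\.\d+) to (\d+\.\d+), elapsed (\S+)")

def schema_transitions(log_text):
    """Return (from_version, to_version, elapsed) for each completed update step."""
    return [match.groups() for match in UPDATED.finditer(log_text)]

def final_version(log_text):
    """Return the version reached by the last completed step, or None."""
    steps = schema_transitions(log_text)
    return steps[-1][1] if steps else None

# Two lines copied from the transcript above.
SAMPLE_LOG = (
    "2020/08/19 18:36:27 Schema updated from 0.26 to 0.27, elapsed 211.2085ms\n"
    "2020/08/19 18:36:29 Schema updated from 0.27 to 0.28, elapsed 1.2409845s\n"
)

for old, new, elapsed in schema_transitions(SAMPLE_LOG):
    print(old, "->", new, "in", elapsed)
print("final:", final_version(SAMPLE_LOG))
```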