hanchuanchuan / goInception

一个集审核、执行、备份及生成回滚语句于一身的MySQL运维工具
https://hanchuanchuan.github.io/goInception/
GNU General Public License v3.0
1.62k stars 553 forks source link

ghost_cut_over_lock_timeout_seconds不起作用 #565

Open yuexiaofengqing opened 1 year ago

yuexiaofengqing commented 1 year ago

描述 在我使用goinception执行ddl的时候,使用gh-ost模式,在最后的cute over切换时goinception设置的lock table超时限制为4s,配置文件中设置的是2s。

重现 执行任何DDL即可

环境

参数 我的配置文件内容: [root@dba scripts]# cat /etc/goinception.config.toml

GoIncepiton Configuration.

IP地址

host = "0.0.0.0"

端口

port = 4000

TiDB数据库目录

path = "/goinception/tidb"

忽略终端连接断开信号

ignore_sighup = true

[log]

日志级别: debug, info, warn, error, fatal.

level = "info"

日志格式, one of json, text, console.

format = "text"

禁用时间戳输出

disable-timestamp = false

日志文件

[log.file]

日志文件名

filename = "/goinception/logs/goinception.log"

日志文件的最大上限(MB)

max-size = 300

Max日志文件的保存天数,默认值 0,即不清理

max-days = 15

要保留的最大旧日志文件数,默认值 0,即不清理

max-backups = 10

日志轮询,默认值 true,即开启

log-rotate = true

[inc] backup_host = "192.168.4.208" backup_port = 3306 backup_user = "root" backup_password = "Abc12345!"

enable_alter_database = false enable_zero_date = true enable_nullable = true enable_drop_table = false enable_set_engine = true enable_timestamp_type=true enable_change_column = true enable_sql_statistic = true enable_blob_type = false enable_column_charset = false enable_enum_set_bit=false enable_foreign_key=false enable_orderby_rand=false enable_partition_table=false enable_select_star=false enable_set_charset=true enable_set_collation=false

enable_autoincrement_unsigned = true check_autoincrement_datatype = true check_autoincrement_init_value = true check_autoincrement_name = true

check_timestamp_default = false check_timestamp_count = true check_table_comment = false check_column_comment = false check_dml_limit = true check_dml_orderby = true check_dml_where = true check_insert_field = true check_primary_key = true

check_float_double = false check_identifier = false check_identifier_upper = false check_identifier_lower = false check_read_only = false check_column_default_value = false

建表时必须存在的列

must_have_columns="createTime,updateTime"

是否检查字段��在多个索引,比如索引 (a,b) 与索引 (a) 中,a字段就是被重复索引的,应当只保留索引 (a,b)

check_index_column_repeat = true

是否允许使用关键字,包括MYSQL关键字与自定义关键字(配置项:custom_keywords)

enable_identifer_keyword = false

额外的自定义关键字

custom_keywords = []

审核列类型变更

check_column_type_change = true

表名/索引名前缀

check_index_prefix = false index_prefix = "" uniq_index_prefix = "" table_prefix = ""

explain判断受影响行数时使用的规则, 默认值"first"

可选值: "first", "max"

"first": 使用第一行的explain结果作为受影响行数

"max": 使用explain结果中的最大值作为受影响行数

explain_rule = "first"

sql_mode, 默认值""

可选值: "", "STRICT_TRANS_TABLES", "STRICT_ALL_TABLES", "TRADITIONAL"

"": 默认使用目标 MySQL 实例 sql_mode

"STRICT_TRANS_TABLES": 为事���性存储引擎以及可能的情况下为非事务性存储引擎启用严格的SQL模式

"STRICT_ALL_TABLES": 为所有存储引擎启用严格的SQL模式

"TRADITIONAL": 严格的SQL组合模式, 相当于STRICT_TRANS_TABLES, STRICT_ALL_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, NO_AUTO_CREATE_USER,和 NO_ENGINE_SUBSTITUTION

sql_mode = ""

安全更新是否开启(mysql自身的功能).

-1 表示不做操作,基于远端数据库 [默认值]

0 表示关闭安全更新

1 表示开启安全更新

sql_safe_updates = -1

设置执行SQL时,会话变量

-1 表示不做操作,基于远端数据库【默认值】

> 0 值表示,会话在执行SQL 时获取锁超时的时间

lock_wait_timeout = -1

是否跳过goinception用户权限校验, 默认跳过

skip_grant_table = true

support_charset = "utf8mb4" support_engine = "innodb"

lang = "en-US"

全量日志

general_log = true

default_charset = "utf8mb4" max_allowed_packet = 16777216 max_char_length = 16 max_key_parts=5 max_keys=16 max_update_rows=0 merge_alter_table = true disable_types = "blob,text,enum,set,bit"

[osc]

用来设置在arkit返回结果集中,对于原来OSC在执行过程的标准输出信息是不是要打印到结果集对应的错误信息列中,

如果设置为1,就不打印,如果设置为0,就打印。而如果出现了错误,则都会打印。默认值:OFF

osc_print_none = false

对应参数pt-online-schema-change中的参数--print。默认值:OFF

osc_print_sql = false

全局的OSC开关。默认值:OFF

osc_on = false

这个参数实际上是一个OSC开关,如果设置为0,则全部ALTER语句都使用OSC方式,

如果设置为非0,则当这个表占用空间大小大于这个值时才使用OSC方式。

单位为M,这个表大小的计算方式是通过语句

select (DATA_LENGTH + INDEX_LENGTH)/1024/1024 from information_schema.tables

where table_schema = 'dbname' and table_name = 'tablename' 来实现的。默认值:16

[0-1048576]

osc_min_table_size = 16

对应参数pt-online-schema-change中的参数alter-foreign-keys-method,具体意义可以参考OSC官方手册。默认值:none

[auto | none | rebuild_constraints | drop_swap]

osc_alter_foreign_keys_method = "none"

对应参数pt-online-schema-change中的参数recursion_method,具体意义可以参考OSC官方手册。默认值:processlist

[processlist | hosts | none]

osc_recursion_method = "processlist"

对应参数pt-online-schema-change中的参数--max-lag。默认值:3

osc_max_lag = 3

类似--max-lag,检查集群暂停流量控制所花费的平均时间(仅适用于PXC 5.6及以上版本,会自动检测)

osc_max_flow_ctl = -1

对应参数pt-online-schema-change中的参数 --sleep 默认值:0.0

osc_sleep = 0.0

对应参数pt-online-schema-change中的参数 --set-vars lock_wait_timeout=?

osc_lock_wait_timeout = 60

对���参数pt-online-schema-change中的参数--[no]check-alter。默认值:ON

osc_check_alter = true

对应参数pt-online-schema-change中的参数--[no]check-replication-filters。默认值:ON

osc_check_replication_filters = true

检查是否为添加唯一索引,如果是,则禁止

osc_check_unique_key_change = true

对应参数pt-online-schema-change中的参数--[no]drop-old-table。默认值:ON

osc_drop_old_table = true

对应参数pt-online-schema-change中的参数--[no]drop-new-table。默认值:ON

osc_drop_new_table = true

对应参数pt-online-schema-change中的参数--max-load中的thread_running部分。默认值:80

osc_max_thread_running = 50

对应参数pt-online-schema-change中的参数--max-load中的thread_connected部分。默认值:1000

osc_max_thread_connected = 1000

对应参数pt-online-schema-change中的参数--critical-load中的thread_running部分。默认值:80

osc_critical_thread_running = 200

对应参数pt-online-schema-change中的参数--critical-load中的thread_connected部分。默认值:1000

osc_critical_thread_connected = 1000

对应参数pt-online-schema-change中的参数--chunk-time。默认值:1

osc_chunk_time = 1.0

对应参数pt-online-schema-change中的参数--chunk-size-limit。默认值:4

osc_chunk_size_limit = 4

对应参数pt-online-schema-change中的参数--chunk-size。默认值:1000

osc_chunk_size = 1000

对应参数pt-online-schema-change中的参数--check-interval,意义是Sleep time between checks for --max-lag。默认值:5

osc_check_interval = 5

osc_bin_dir = "/usr/bin"

[ghost] ghost_on = true ghost_allow_on_master = true ghost_assume_rbr = true ghost_chunk_size = 10000 ghost_nice_ratio = 1.0 ghost_concurrent_rowcount = true ghost_cut_over = "atomic" ghost_cut_over_lock_timeout_seconds = 2 ghost_default_retries = 60 ghost_heartbeat_interval_millis = 500 ghost_max_lag_millis = 300 ghost_approve_renamed_columns = true ghost_exponential_backoff_max_interval = 64 ghost_dml_batch_size = 50 ghost_ok_to_drop_table = true ghost_initially_drop_ghost_table = true ghost_initially_drop_socket_file = true ghost_skip_foreign_key_checks = true ghost_critical_load_interval_millis = 5000 ghost_exact_rowcount = true

[inc_level] er_alter_table_once = 1 er_auto_incr_id_warning = 1 er_autoinc_unsigned = 1 er_bad_field_error = 2 er_bad_null_error = 2 er_binlog_corrupted = 2 er_binlog_format_statement = 2 er_blob_cant_have_default = 1 er_blob_used_as_key = 2 er_cant_change_column = 1 er_cant_change_column_position = 1 er_cant_drop_database = 2 er_cant_drop_field_or_key = 2 er_cant_drop_table = 2 er_cant_remove_all_fields = 2 er_cant_set_charset = 1 er_cant_set_collation = 1 er_cant_set_engine = 1 er_change_column_type = 1 er_change_too_much_rows = 1 er_char_to_varchar_len = 1 er_charset_not_support = 2 er_charset_on_column = 1 er_collation_charset_mismatch = 2 er_collation_not_support = 2 er_column_existed = 2 er_column_have_no_comment = 1 er_column_not_existed = 2 er_columns_must_have_index = 1 er_columns_must_have_index_type_err = 1 er_con_count_error = 2 er_conflicting_declarations = 2 er_data_too_long = 2 er_db_not_existed_error = 2 er_ddl_dml_coexist = 2 er_dup_fieldname = 2 er_dup_index = 2 er_dup_keyname = 2 er_end_with_commit = 2 er_end_with_semicolon = 2 er_engine_not_support = 2 er_error_exist_before = 2 er_field_not_in_group_by = 2 er_field_specified_twice = 2 er_float_double_to_decimal = 2 er_forcing_close = 2 er_foreign_key = 2 er_have_begin = 2 er_hostname = 2 er_id_is_uper = 2 er_ident_use_keyword = 1 er_identifier_lower = 2 er_identifier_upper = 2 er_implicit_type_conversion = 1 er_inc_init_err = 1 er_inception_empty_query = 2 er_incorrect_datetime_value = 2 er_incorrect_global_local_var = 2 er_index_column_repeat = 2 er_index_name_idx_prefix = 1 er_index_name_uniq_prefix = 1 er_index_not_existed = 2 er_index_use_alter_table = 2 er_insert_too_much_rows = 1 er_invalid_backup_host_info = 2 er_invalid_command = 2 er_invalid_data_type = 1 er_invalid_default = 2 er_invalid_group_func_use = 2 er_invalid_ident = 1 er_invalid_on_update = 2 er_join_no_on_condition = 1 er_json_type_support = 2 er_key_column_does_not_exits = 2 er_mariadb_rollback_warn = 1 er_max_column_count = 1 er_max_varchar_length = 2 er_mix_of_group_func_and_fields = 2 er_multiple_pri_key = 2 er_must_at_least_one_column = 2 er_must_have_columns = 1 er_net_packets_out_of_order = 2 er_net_read_error = 2 er_net_read_interrupted = 2 er_network_read_event_checksum_failure = 2 er_no_db_error = 2 er_no_where_condition = 1 er_non_uniq_error = 2 er_non_uniq_table = 2 er_normal_shutdown = 2 er_not_allowed_nullable = 1 er_not_found_master_status = 2 er_not_found_table_info = 1 er_not_supported_alter_option = 1 er_not_supported_item_type = 2 er_not_supported_key_type = 2 er_not_supported_yet = 2 er_not_valid_password = 2 er_ordery_by_rand = 1 er_osc_kill_failed = 2 er_outofmemory = 1 er_parse_error = 2 er_partition_not_allowed = 1 er_partition_not_existed = 2 er_pk_cols_not_int = 1 er_pk_too_many_parts = 1 er_primary_cant_have_null = 2 er_remote_exe_error = 2 er_removed_spaces = 2 er_repeat_const_definition = 2 er_same_name_partition = 2 er_select_only_star = 1 er_set_data_type_int_bigint = 2 er_shutdown_complete = 2 er_slave_corrupt_event = 2 er_slave_relay_log_write_failure = 2 er_sql_invalid_op_type = 2 er_sql_invalid_source = 2 er_sql_no_op_type = 2 er_sql_no_source = 2 er_start_as_begin = 2 er_syntax_error = 2 er_table_cant_handle_auto_increment = 2 er_table_charset_must_null = 1 er_table_charset_must_utf8 = 1 er_table_collation_not_support = 1 er_table_exists_error = 2 er_table_must_have_comment = 1 er_table_must_have_pk = 1 er_table_not_existed_error = 2 er_table_prefix = 1 er_temp_table_tmp_prefix = 1 er_text_not_nullable_error = 1 er_timestamp_default = 1 er_too_long_bakdb_name = 2 er_too_long_ident = 2 er_too_long_index_comment = 1 er_too_long_key = 2 er_too_many_key_parts = 1 er_too_many_keys = 1 er_too_much_auto_timestamp_cols = 2 er_truncated_wrong_value = 2 er_udpate_too_much_rows = 1 er_unknown_character_set = 2 er_unknown_charset = 1 er_unknown_collation = 1 er_unknown_system_variable = 2 er_unknown_table = 2 er_use_enum = 1 er_use_index_visibility = 2 er_use_text_or_blob = 2 er_use_value_expr = 1 er_username = 2 er_varchar_to_text_len = 2 er_view_select_clause = 2 er_view_support = 2 er_with_default_add_column = 1 er_with_insert_field = 1 er_with_insert_values = 2 er_with_limit_condition = 1 er_with_orderby_condition = 1 er_wrong_and_expr = 1 er_wrong_arguments = 2 er_wrong_auto_key = 2 er_wrong_column_name = 2 er_wrong_db_name = 2 er_wrong_key_column = 2 er_wrong_name_for_index = 2 er_wrong_string_length = 2 er_wrong_sub_key = 2 er_wrong_table_name = 2 er_wrong_usage = 2 er_wrong_value_count_on_row = 2 er_wrong_value_for_var = 2

hanchuanchuan commented 1 year ago

可以通过查询goinception的环境变量来检查内存中的参数值 inc show variables like '%cut%';

yuexiaofengqing commented 1 year ago

可以通过查询goinception的环境变量来检查内存中的参数值 inc show variables like '%cut%';

我可以看到参数值设置为2 root@127.0.0.1[(none)]> inc show variables like '%cut%'; +-------------------------------------+--------+ | Variable_name | Value | +-------------------------------------+--------+ | max_execution_time | 0 | | ghost_cut_over | atomic | | ghost_cut_over_exponential_backoff | false | | ghost_cut_over_lock_timeout_seconds | 2 | | ghost_force_named_cut_over | false | | ghost_postpone_cut_over_flag_file | | +-------------------------------------+--------+ 6 rows in set (0.00 sec)

但是查看执行日志可以看到这个2s限制的是Setting RENAME timeout as 2 seconds 可是我在Setting LOCK timeout as 4 seconds这一步的获取锁时如何控制这个超时时间呢?

yuexiaofengqing commented 1 year ago

你好,今天使用goinception的gh-ost模式执行DDL语句的时候又失败了,目前看到日志里存在 time="2023/04/11 15:48:42.087" level=error msg="Timeout while waiting for events up to lock" file=migrator.go func=waitForEventsUpToLock line=622 time="2023/04/11 15:48:42.087" level=error msg="Timeout while waiting for events up to lock" file=migrator.go func=atomicCutOver line=703 这种报错导致无法完成切换,同时因为一直在获取锁,导致此表上的后续事务全部等待,严重影响到了线上服务,麻烦看下怎么解决,我从网上查的可能是因为并行调用gh-ost产生锁冲突的原因。

stale[bot] commented 1 year ago

由于此问题没有最近的活动,因此已被自动标记为陈旧。如果没有进一步的活动,会作为不活跃issue关闭。感谢你对本项目的贡献。 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.