midenok / mariadb

MariaDB server is a community developed fork of MySQL server. Started by core members of the original MySQL team, MariaDB actively works with outside developers to deliver the most featureful, stable, and sanely licensed open SQL server in the industry.
GNU General Public License v2.0
0 stars 0 forks source link

MDEV-25546 LIMIT partition temporarily populated in transaction is not used anymore after ROLLBACK #104

Open midenok opened 2 years ago

midenok commented 2 years ago

Reproduce

--source include/have_partition.inc
--source include/have_sequence.inc
--source include/have_innodb.inc

create or replace table t1 (pk int primary key) engine=InnoDB with system versioning
  partition by system_time limit 100 (
  partition p0 history,
  partition p1 history,
  partition pn current
);
insert into t1 select seq from seq_1_to_90;

start transaction;

# Puts 80 rows into p0
replace into t1 select seq from seq_1_to_80;

# Puts another 70 rows into p0
replace into t1 select seq from seq_1_to_70;

# Puts 60 rows into p1
replace into t1 select seq from seq_1_to_60;

select partition_name, table_rows from information_schema.partitions where table_name = 't1';

rollback;

select partition_name, table_rows from information_schema.partitions where table_name = 't1';

# Should put 10 rows into the empty partition p0,
# but instead they go to p1
replace into t1 select seq from seq_1_to_10;

select partition_name, table_rows from information_schema.partitions where table_name = 't1';

# Cleanup
drop table t1;

Result

rollback;
select partition_name, table_rows from information_schema.partitions where table_name = 't1';
partition_name  table_rows
p0      0
p1      0
pn      90
replace into t1 select seq from seq_1_to_10;
select partition_name, table_rows from information_schema.partitions where table_name = 't1';
partition_name  table_rows
p0      0
p1      10
pn      90

Expected

Partition p0 is filled.

midenok commented 2 years ago

Fix candidate 1

vers_info->hist_part must be reverted on rollback for each table that was changed by transaction. vers_info->hist_part is initalized as first history partition in check_vers_constants() and then recalculated in vers_set_hist_part(). vers_info is stored in TABLE object (in part_info).

Processing each TABLE on rollback is not possible without help of storage engine: ha_rollback_trans() can access only handlertons and the more detailed data about what was changed by transaction is known only by storage engine.

InnoDB stores knowledge about what tables were modified into trx->mod_tables. This data is processed on partial rollback for trx-versioning needs (0b89a42ffc7). The structure is map with key dict_table_t *.

Proposed InnoDB changes:

  1. trx_rollback_to_savepoint_low() calls mod_tables.rollback() not only for partial rollback but for full rollback as well (savept == NULL).
  2. mod_tables.rollback() accesses TABLE object and calls TABLE::rollback().
  3. TABLE::rollback() updates vers_info->hist_part.

Changes for p. 2:

  1. Generally there is no stored connection between dict_table_t and TABLE. The connection is done only by innodb_find_table_for_vc() for vcol needs.

handler::ha_open() has TABLE object, handler::ha_open() calls without handler::open() without TABLE object. That is virtual interface implemented by ha_innobase::open(). The goal is to supply TABLE to ha_innobase::open() and assign it to dict_table_t. That can be done like that in handler::ha_open():

  if (unlikely((error=open2(table_arg, name,mode,test_if_locked))))
  {
     ...
  }

  if (error == HA_NOT_IMPLEMENTED && 
      unlikely((error=open(name,mode,test_if_locked))))
  {
    if ((error == EACCES || error == EROFS) && mode == O_RDWR &&
    (table->db_stat & HA_TRY_READ_ONLY))
    {
      table->db_stat|=HA_READ_ONLY;
      error=open(name,O_RDONLY,test_if_locked);
    }
  }

handler::ha_open() first tries to call open2() which by default returns HA_NOT_IMPLEMENTED. ha_innobase has this interface: we rename ha_innobase::open() to ha_innobase::open2() and pass table_arg argument.

  1. innodb_find_table_for_vc() now is not needed. We remove that function and table->vc_templ->mysql_table member. As far as I'm concerned this condition holds all tests except one below case:
    
    --- a/storage/innobase/handler/ha_innodb.cc
    +++ b/storage/innobase/handler/ha_innodb.cc
    @@ -20951,6 +20951,8 @@ static TABLE* innodb_find_table_for_vc(THD* thd, dict_table_t* table)
        TABLE* mysql_table = find_fk_open_table(thd, db_buf, db_buf_len,
                                                tbl_buf, tbl_buf_len);
midenok commented 2 years ago

Fix candidate 2

vers_info->hist_part must be reverted on rollback for each table that was changed by transaction. vers_info->hist_part is initalized as first history partition in check_vers_constants() and then recalculated in vers_set_hist_part(). vers_info is stored in TABLE object (in part_info).

Processing each TABLE on rollback is not possible without help of storage engine: ha_rollback_trans() can access only handlertons and the more detailed data about what was changed by transaction is known only by storage engine.

InnoDB stores knowledge about what tables were modified into trx->mod_tables. This data is processed on partial rollback for trx-versioning needs (0b89a42ffc7). The structure is map with key dict_table_t *.

Proposed changes:

  1. trx_rollback_to_savepoint_low() calls mod_tables.rollback() not only for partial rollback but for full rollback as well (savept == NULL).
  2. mod_tables.rollback() accesses TABLE_SHARE object and calls TABLE_SHARE::rollback().
  3. TABLE_SHARE::rollback() sets rolled_back property to true.
  4. When vers_set_hist_part() sees rolled_back == true it reinits hist_part from first history partition (and then recalculates).
  5. On tdc_release_share() when share->tdc->ref_count == 0 TABLE_SHARE method is called share->released().
  6. TABLE_SHARE::released() sets rolled_back property to false.

PROGRESS HERE

Changes for p. 2:

  1. Generally there is no stored connection between dict_table_t and TABLE. The connection is done only by innodb_find_table_for_vc() for vcol needs.

handler::ha_open() has TABLE object, handler::ha_open() calls without handler::open() without TABLE object. That is virtual interface implemented by ha_innobase::open(). The goal is to supply TABLE to ha_innobase::open() and assign it to dict_table_t. That can be done like that in handler::ha_open():

  if (unlikely((error=open2(table_arg, name,mode,test_if_locked))))
  {
     ...
  }

  if (error == HA_NOT_IMPLEMENTED && 
      unlikely((error=open(name,mode,test_if_locked))))
  {
    if ((error == EACCES || error == EROFS) && mode == O_RDWR &&
    (table->db_stat & HA_TRY_READ_ONLY))
    {
      table->db_stat|=HA_READ_ONLY;
      error=open(name,O_RDONLY,test_if_locked);
    }
  }

handler::ha_open() first tries to call open2() which by default returns HA_NOT_IMPLEMENTED. ha_innobase has this interface: we rename ha_innobase::open() to ha_innobase::open2() and pass table_arg argument.

  1. innodb_find_table_for_vc() now is not needed. We remove that function and table->vc_templ->mysql_table member. As far as I'm concerned this condition holds all tests except one below case:
    
    --- a/storage/innobase/handler/ha_innodb.cc
    +++ b/storage/innobase/handler/ha_innodb.cc
    @@ -20951,6 +20951,8 @@ static TABLE* innodb_find_table_for_vc(THD* thd, dict_table_t* table)
        TABLE* mysql_table = find_fk_open_table(thd, db_buf, db_buf_len,
                                                tbl_buf, tbl_buf_len);