lihongjie0209 / myblog

4 stars 0 forks source link

How InnoDB handles REDO logging #168

Open lihongjie0209 opened 4 years ago

lihongjie0209 commented 4 years ago

Since InnoDB tries to keep the working set in memory (InnoDB Buffer Pool), therefore the changes made by transactions will occur in volatile memory and later be flushed to disk. So in the event of volatile memory failure or during a system restart InnoDB can guarantee to have a (C)onsistent record of the state of the data in the database in (D)urable memory and that each transaction is (A)tomic.

InnoDB把所有的数据都放在内存中, 如果我们编辑内存中的数据, 但是数据没有被写入到磁盘, 那么数据就会丢失.

但是我们也没有办法每次编辑内存页就同步写磁盘, 这样磁盘的压力非常大.

所以每次编辑内存页都会记录一个编辑日志到 redolog 中, redolog 支持快速落盘(顺序写入), 而内存页可以在合适的时候批量落盘.

lihongjie0209 commented 4 years ago

How does this affect response time?

In order to keep ACID compliance, the transaction log must guarantee the logging action happens before the transaction is committed, this is known as write-ahead-logging. This essentially means that before an update can return it must be logged. As the time to log is added to every update it can become an important overhead to your response time. Indeed if InnoDB cannot log at all, your transaction will never complete.

Both the MySQL configuration and hardware will affect the speed at which InnoDB can log. You can request that InnoDB force its REDO records to the transaction log but not actually flush the log to disk at commit. By this means you are ensuring that in the event of a MySQL crash the data will still be consistent, however in the event of a system crash there may be some records that never make it to durable memory.

The three options available are:

innodb_flush_log_at_trx_commit = 0 /* Write log buffer to log about once a second and flush */

innodb_flush_log_at_trx_commit = 1 /* Write log buffer to log and flush to disk */

innodb_flush_log_at_trx_commit = 2 /* Write log buffer to log and flush about once per second */

(About once a second is related to the fact the flush and sync is controlled by a background thread that gives no guarantee it will be per second)

NB: Percona Server with XtraDB actually allow you to control this at session level.

This allows you to risk durability by having the logging return before the record is actually on disk. A similar and complementary mechanism is to use a battery backed raid card. This allows updates to be written to the raid cards battery protected memory which has much faster access than the underlying disk(s).

虽然redolog 落盘很快, 但是我们还是可以减少redolog的落盘次数来达到更高的性能

lihongjie0209 commented 4 years ago

How does this affect throughput?

As the REDO log in InnoDB uses a fixed length circular transaction log, the speed at which you can process updates is tightly linked to the speed at which check-pointing can occur. In order to insert a new record in the transaction log, InnoDB must ensure that the previously written record has been flushed to disk . This means that it can be beneficial to have very large transaction logs which allow a larger window between REDO logging and the checkpoint on disk.

Vadim touched upon this subject recently in his post MySQL 5.5.8 – in search of stability, as seen in his post, Percona Server with XtraDB allows much larger logs beyond the InnoDB limit of 4G.

As a side effect of logging there is no need to apply page changes at commit, therefore what would be a random write is translated into a much faster sequential write. The background processing of dirty pages can then apply algorithms to optimize those writes during checkpoint.