iovisor / bcc

BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
Apache License 2.0

cachestat hit ratio = 0 when testing iozone sequential re-write #3378

Open · KenWUs opened this issue 3 years ago

KenWUs commented 3 years ago

Hi all,

I use cachestat (with the debug flag enabled) to test iozone O_SYNC write and re-write. After the write phase finishes and the re-write phase starts, the mark_page_accessed (mpa) and add_to_page_cache_lru (apcl) counts become very low. As a result, the computed hit ratio, miss count, and hit count all equal 0. When I test iozone with O_SYNC+O_DIRECT or with O_DIRECT alone, the problem does not occur.

$ ./iozone -i0 -r2048k -s128m -o -C -f test_file -w
$ ./cachestat 1
    HITS   MISSES  DIRTIES HITRATIO   BUFFERS_MB  CACHED_MB
2892 0 4 0 2892 4 2888 -> (mpa, mbd, apcl, apd, total, misses, hits)
    2888        4        0   99.86%           69        804

1356 1122 1101 1109 234 0 234
     234        0     1122  100.00%           69        684

6355 6303 6236 6236 52 0 52
      52        0     6303  100.00%           69        712

6265 6204 6144 6144 61 0 61
      61        0     6204  100.00%           69        738

6258 6204 6144 6144 54 0 54
      54        0     6204  100.00%           69        766

6259 6204 6144 6144 55 0 55
      55        0     6204  100.00%           69        793

2916 7414 2768 7382 0 0 0
       0        0     7414    0.00%           69        804

91 8236 4 8203 0 0 0 -> (mpa, mbd, apcl, apd, total, misses, hits)
       0        0     8236    0.00%           69        804

51 8287 0 8266 0 0 0
       0        0     8287    0.00%           69        804

51 8222 0 8200 0 0 0
       0        0     8222    0.00%           69        804

93 514 0 512 0 0 0
       0        0      514    0.00%           69        804

0 0 0 0 0 0 0
       0        0        0    0.00%           69        804
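The zero rows follow directly from how cachestat combines its four counters. The sketch below is a minimal Python re-statement, assuming the formula in bcc's tools/cachestat.py at the time (total = mpa - mbd, misses = apcl - apd, each clamped at 0, hits = total - misses). It is an approximation for illustration, not the tool itself; Interval and cachestat_summary are hypothetical names. Feeding it the debug rows above reproduces the printed values.

```python
from typing import NamedTuple, Tuple


class Interval(NamedTuple):
    """One cachestat debug row: raw kprobe counts for one interval."""
    mpa: int   # mark_page_accessed calls
    mbd: int   # mark_buffer_dirty calls
    apcl: int  # add_to_page_cache_lru calls
    apd: int   # account_page_dirtied calls


def cachestat_summary(iv: Interval) -> Tuple[int, int, int, float]:
    """Return (total, misses, hits, ratio), following the cachestat formula
    as we read it; an approximation, not bcc's own code."""
    total = max(iv.mpa - iv.mbd, 0)    # accesses minus buffer dirtying
    misses = max(iv.apcl - iv.apd, 0)  # page inserts minus dirtied pages
    hits = total - misses
    if hits < 0:                       # misses overestimated (e.g. readahead)
        misses, hits = total, 0
    ratio = hits / total if total > 0 else 0.0
    return total, misses, hits, ratio


# Rows copied from the write phase and the re-write phase of the output.
for row in [Interval(2892, 0, 4, 0),
            Interval(1356, 1122, 1101, 1109),
            Interval(91, 8236, 4, 8203)]:
    total, misses, hits, ratio = cachestat_summary(row)
    print(tuple(row), total, misses, hits, f"{ratio:.2%}")
```

In the re-write rows mbd (for example 8236) is far larger than mpa (91), so total is clamped to 0. Once total is 0, misses, hits, and the ratio are all forced to 0, whatever apcl and apd contain.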

Then I compared how write and re-write behave. In write mode, the check below is satisfied, so the modified metadata is synced to the inode. This path eventually reaches mark_page_accessed, which drives the mpa count up. Re-write mode never takes this path.

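void __mark_inode_dirty(struct inode *inode, int flags)
{
    struct super_block *sb = inode->i_sb;
    [...]
    trace_writeback_mark_inode_dirty(inode, flags);

    /* I_DIRTY_PAGES alone skips this block: it does not dirty the inode itself */
    if (flags & (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_TIME)) {
        trace_writeback_dirty_inode_start(inode, flags);

        if (sb->s_op->dirty_inode)
            sb->s_op->dirty_inode(inode, flags);  // ext4_dirty_inode()

        trace_writeback_dirty_inode(inode, flags);
    }
    [...]
}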

ftrace

//write
iozone-17202 [000] .... 776834.790129: writeback_mark_inode_dirty: bdi 179:0: ino=15 state=I_DIRTY_SYNC|I_DIRTY_DATASYNC|I_DIRTY_PAGES flags=I_DIRTY_PAGES
iozone-17202 [000] .... 776834.790132: writeback_mark_inode_dirty: bdi 179:0: ino=15 state=I_DIRTY_SYNC|I_DIRTY_DATASYNC|I_DIRTY_PAGES flags=I_DIRTY_SYNC|I_DIRTY_DATASYNC|I_DIRTY_PAGES
iozone-17202 [000] .... 776834.790149: writeback_mark_inode_dirty: bdi 179:0: ino=15 state=I_DIRTY_SYNC|I_DIRTY_DATASYNC|I_DIRTY_PAGES flags=I_DIRTY_PAGES
iozone-17202 [000] .... 776834.790150: writeback_mark_inode_dirty: bdi 179:0: ino=15 state=I_DIRTY_SYNC|I_DIRTY_DATASYNC|I_DIRTY_PAGES flags=I_DIRTY_SYNC|I_DIRTY_DATASYNC|I_DIRTY_PAGES
//re-write
iozone-17202 [000] .... 776834.792761: writeback_mark_inode_dirty: bdi 179:0: ino=15 state=I_DIRTY_SYNC|I_DIRTY_DATASYNC|I_DIRTY_PAGES flags=I_DIRTY_PAGES
iozone-17202 [000] .... 776834.792770: writeback_mark_inode_dirty: bdi 179:0: ino=15 state=I_DIRTY_SYNC|I_DIRTY_DATASYNC|I_DIRTY_PAGES flags=I_DIRTY_PAGES
iozone-17202 [000] .... 776834.792779: writeback_mark_inode_dirty: bdi 179:0: ino=15 state=I_DIRTY_SYNC|I_DIRTY_DATASYNC|I_DIRTY_PAGES flags=I_DIRTY_PAGES
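To make the write/re-write difference concrete, here is a small sketch that checks each writeback_mark_inode_dirty event for the flags tested in the quoted condition. It is a hypothetical helper (calls_dirty_inode is not part of bcc or the kernel). It assumes the ftrace line format shown above and mirrors only the condition in that kernel snippet, which may differ in other kernel versions.

```python
import re

# Flags that make __mark_inode_dirty() call sb->s_op->dirty_inode(),
# per the condition quoted above (assumption: this kernel version).
INODE_DIRTY_FLAGS = {"I_DIRTY_SYNC", "I_DIRTY_DATASYNC", "I_DIRTY_TIME"}

# Match the event's flags= field only, not the inode's state= field.
FLAGS_RE = re.compile(r"\bflags=(\S+)")


def calls_dirty_inode(trace_line: str) -> bool:
    """True if the event's flags would take the dirty_inode branch;
    I_DIRTY_PAGES on its own does not."""
    match = FLAGS_RE.search(trace_line)
    if match is None:
        raise ValueError("no flags= field in trace line")
    flags = set(match.group(1).split("|"))
    return bool(flags & INODE_DIRTY_FLAGS)


SAMPLES = {
    "write": ("iozone-17202 [000] .... 776834.790132: writeback_mark_inode_dirty: "
              "bdi 179:0: ino=15 state=I_DIRTY_SYNC|I_DIRTY_DATASYNC|I_DIRTY_PAGES "
              "flags=I_DIRTY_SYNC|I_DIRTY_DATASYNC|I_DIRTY_PAGES"),
    "re-write": ("iozone-17202 [000] .... 776834.792761: writeback_mark_inode_dirty: "
                 "bdi 179:0: ino=15 state=I_DIRTY_SYNC|I_DIRTY_DATASYNC|I_DIRTY_PAGES "
                 "flags=I_DIRTY_PAGES"),
}

for phase, line in SAMPLES.items():
    print(phase, calls_dirty_inode(line))
```

The regex deliberately reads only `flags=`. The inode's `state=` already contains I_DIRTY_SYNC in both phases, so reading it would hide the difference between the two phases.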

In write mode, add_to_page_cache_lru (apcl) is called to load new pages into the cache. In re-write mode, this function is never reached because the page is already in the page cache, so apcl stays close to 0.

So my question is: how can the re-write hit ratio be made meaningful when dirty pages exist in the page cache but have not yet been synced to the inode? It should reflect page cache accesses as closely as possible, instead of the formula degenerating into the strange case of total = misses = hits = 0.
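One way to at least stop such intervals from being reported as a 0.00% hit ratio is to treat them as undefined. The sketch below is hypothetical and is not what bcc does. It does not recover the real hit ratio. It only separates "no usable accesses in this interval" (mpa <= mbd) from a genuine 0% ratio, so a caller could print "n/a" for that interval. The clamp on misses is intended to match the hits < 0 fallback in the cachestat formula as we understand it.

```python
from typing import Optional


def hit_ratio_or_none(mpa: int, mbd: int, apcl: int, apd: int) -> Optional[float]:
    """Hypothetical variant of the cachestat ratio: None when buffer
    dirtying (mbd) outnumbers accesses (mpa), instead of clamping to 0."""
    total = mpa - mbd
    if total <= 0:
        return None  # accesses cannot be separated from writes here
    misses = min(max(apcl - apd, 0), total)  # keep misses within [0, total]
    return (total - misses) / total


# One write-phase row and one re-write-phase row from the output above.
for row in [(2892, 0, 4, 0), (91, 8236, 4, 8203)]:
    ratio = hit_ratio_or_none(*row)
    print(row, "n/a" if ratio is None else f"{ratio:.2%}")
```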

yonghong-song commented 3 years ago

I am not a kernel page cache expert. @brendangregg, what do you think of the issue @KenWUs brought up here? Do we need to bring in a page cache expert?