matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: the performance of point select on pk halves over time. #15116

Closed gouhongshen closed 3 weeks ago

gouhongshen commented 8 months ago

Is there an existing issue for the same bug?

Branch Name

main, 1.1-dev

Commit ID

main eb6d613581d013c3bdbd9503bf82ccf9de254ac7

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

#

Expected Behavior

Benchmark: point select on the primary key of a small/empty table, run on the 129 machine with 256 terminals.

During the first few minutes:

[image] [image]

Then the performance decreased:

[image] [image]

Why has the performance decreased? As the performance halved, the number of running threads also stayed very low.

[image]

Comparing the traces of the two phases shows that, in the slower phase, execution is blocked on a channel:

[image]

That is why the number of running threads decreased.

The relevant source code:

const defaultQueueSize = 1310720 // queue mem cost = 10MB

//...

awakeCollect:   make(chan batchpipe.HasName, defaultQueueSize),

//...

// Collect item in chan, if collector is stopped then return error
func (c *MOCollector) Collect(ctx context.Context, item batchpipe.HasName) error {
    select {
    case <-c.stopCh:
        ctx = errutil.ContextWithNoReport(ctx, true)
        return moerr.NewInternalError(ctx, "MOCollector stopped")
    case c.awakeCollect <- item:
        return nil
    }
}

Increasing defaultQueueSize keeps the TPS high for longer before it drops.
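The blocking behavior described above can be illustrated in isolation. The following is a minimal sketch, not the change made in MatrixOne: it assumes a simplified `collector` type standing in for `MOCollector`, and a hypothetical `errQueueFull` error. It adds a `default` branch to the `select`, so a full queue returns immediately instead of parking the caller until a consumer drains the channel.

```go
package main

import (
	"errors"
	"fmt"
)

// errQueueFull is a hypothetical error for this sketch; it is not part of MatrixOne.
var errQueueFull = errors.New("queue full, item dropped")

// collector is a simplified stand-in for MOCollector (assumption).
type collector struct {
	stopCh chan struct{}
	queue  chan string
}

// tryCollect never blocks: when the buffered queue is full it reports
// errQueueFull instead of waiting for a consumer to free a slot.
func (c *collector) tryCollect(item string) error {
	select {
	case <-c.stopCh:
		return errors.New("collector stopped")
	case c.queue <- item:
		return nil
	default:
		return errQueueFull
	}
}

func main() {
	// A tiny queue makes the "full" case easy to reach.
	c := &collector{stopCh: make(chan struct{}), queue: make(chan string, 2)}
	for i := 0; i < 3; i++ {
		fmt.Println(i, c.tryCollect(fmt.Sprintf("item-%d", i)))
	}
	// Prints:
	// 0 <nil>
	// 1 <nil>
	// 2 queue full, item dropped
}
```

The trade-off is that items are dropped under load, which may or may not be acceptable for the collected data; a larger queue, as noted above, only postpones the point at which the queue fills up.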

Steps to Reproduce

mo-load

`./start.sh -n 1 -s 100000 -m SYSBENCH -g -b sbtest`

`./start.sh -c cases/sysbench/point_select_1_10W_prepare/ -d 15 -t 256 -g -b sbtest`

Additional information

No response

xzxiong commented 7 months ago

129 machine (64 cores, 128 GB RAM), 256 connections: QPS drops from 90k to 45k.

xzxiong commented 7 months ago

No plan yet

xzxiong commented 3 months ago

Needs a regression test after https://github.com/matrixorigin/matrixone/pull/17863 is merged.

gouhongshen commented 3 months ago
replace:
- name: id
  type: random
  range: 1,100000
- name: i_id
  type: sequence
  start: 100001
  step: 1
- name: tbx
  type: random
  range: 1,1

Just replace the contents of point_select_10_100000_prepare/replace.yaml with the above.

xzxiong commented 3 months ago

regression env:

case:

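Versions compared:

  1. base: commit id: dcb07ac9e (1.3.0 iteration)
  2. refactor: commit id: bca6edc16

test result:

  1. refactor average TPS: 30393 >> base average TPS: 21452
  2. The refactor version is more stable; its TPS mostly fluctuates between 29992 and 31444.
  3. The base version performs worse and fluctuates more: in the first 1 min of the test the TPS ranges from 29000 to 31421; over the following 10 min it fluctuates between 17487 and 21736.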
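To put the two averages in proportion, the relative gain can be computed directly from the reported numbers. This is only a small arithmetic sketch; `relativeGain` is a helper name introduced here for illustration.

```go
package main

import "fmt"

// relativeGain returns how much larger after is than before, in percent.
func relativeGain(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Average TPS values taken from the test result above.
	fmt.Printf("TPS gain: %.1f%%\n", relativeGain(21452, 30393))
	// Prints: TPS gain: 41.7%
}
```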

result.15116.tgz

- [point_select_1_10W_prepare]
- START : 2024-08-06 16:08:12
- END : 2024-08-06 16:23:24
- VUSER : 256
- TPS : 21452
- QPS : 21452
- SUCCESS : 19330279
- ERROR : 0
- RT_MAX : 1759
- RT_MIN : 0
- RT_AVG : 11.93
- SUC_RATE : 1.0
- EXP_RATE : 1.0
- RESULT : SUCCEED
+ [point_select_1_10W_prepare]
+ START : 2024-08-06 16:31:57
+ END : 2024-08-06 16:47:09
+ VUSER : 256
+ TPS : 30393
+ QPS : 30393
+ SUCCESS : 27358263
+ ERROR : 0
+ RT_MAX : 275
+ RT_MIN : 0
+ RT_AVG : 8.42
+ SUC_RATE : 1.0
+ EXP_RATE : 1.0
+ RESULT : SUCCEED
gouhongshen commented 3 months ago

Will test this on the 129 machine later.

aressu1985 commented 3 weeks ago

FIXED