opensource4you / astraea

Unleash the full potential of Kafka
Apache License 2.0

Astraea test hardware #130

Open chinghongfang opened 3 years ago

chinghongfang commented 3 years ago

Hardware specs:

Consumer

NETGEAR <-10 Gbit-> (01~06), NETGEAR <-10 Gbit-> (07~12), TP-LINK <-10 Gbit-> (15~20)


Test results:

Network test with iperf (old switch)

Pairwise, one pair at a time, unidirectional transfer: (i->j)

[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  11.0 GBytes  9.41 Gbits/sec

Every permutation shows 11.0 GBytes transferred and 9.41 Gbits/sec bandwidth.
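
The per-pair runs were presumably launched with iperf's default TCP test (the client command shown later in this thread, `iperf -c <host>`, matches this). A minimal sketch of the pairwise loop, with hostnames as placeholder assumptions:

```bash
# On each receiver j: run `iperf -s` (listens on TCP port 5001, the iperf2 default).
# On each sender i, one pair at a time (host names below are placeholders):
for j in host1 host2 host3 host4 host5; do
  [ "$j" = "$(hostname)" ] && continue  # skip sending to ourselves
  iperf -c "$j" -t 10                   # 10-second unidirectional TCP test
done
```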

Five machines sending simultaneously in a ring: (1->2->3->4->5->1)

[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  10.9 GBytes  9.39 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  10.9 GBytes  9.39 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  10.9 GBytes  9.40 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  10.9 GBytes  9.39 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  10.9 GBytes  9.39 Gbits/sec

Network test with iperf (new switch)

Pairwise, one pair at a time, unidirectional transfer: (i->j)

[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  11.0 GBytes  9.41 Gbits/sec

Every permutation shows 11.0 GBytes transferred and 9.41 Gbits/sec bandwidth.

Five machines sending simultaneously in a ring: (1->2->3->4->5->1)

[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  10.9 GBytes  9.39 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  10.9 GBytes  9.40 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  10.9 GBytes  9.39 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  10.9 GBytes  9.39 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  10.9 GBytes  9.40 Gbits/sec

Astraea performance tool

Setup: machine 1: zookeeper + performance tool; machines 2, 3, 4, 5: kafka broker. Parameters: record.size=1024, producers 5, consumers 10, partitions 100
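
The exact command line is not recorded in this comment; below is a hypothetical sketch of the invocation. Every flag spelling is an assumption made for illustration, so check the project's README for the exact arguments of the version under test:

```bash
# Machine 1 runs zookeeper plus this tool; machines 2-5 run the brokers.
# All flag names below are assumptions, not the verified CLI of the tool.
docker/start_app.sh performance \
  --bootstrap.servers 192.168.103.12:9092 \
  --record.size 1024 \
  --producers 5 \
  --consumers 10 \
  --partitions 100
```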

Time: 0hr 0min 9sec
producers completion: 88.44%
  output 959.636MB/second
  send max latency: 1056ms
  send min latency: 0ms
  producer[0] average sent bytes: 199.542MB
  producer[0] average send latency: 106.352ms
  producer[1] average sent bytes: 189.162MB
  producer[1] average send latency: 117.860ms
  producer[2] average sent bytes: 202.896MB
  producer[2] average send latency: 114.839ms
  producer[3] average sent bytes: 179.086MB
  producer[3] average send latency: 127.801ms
  producer[4] average sent bytes: 188.950MB
  producer[4] average send latency: 134.676ms

consumer completion: 84.49%
  input 915.875MB/second
  end-to-end max latency: 1680ms
  end-to-end min latency: 0ms
  consumer[0] average end-to-end bytes: 88.097MB
  consumer[0] average end-to-end latency: 292.221ms
  consumer[1] average end-to-end bytes: 92.934MB
  consumer[1] average end-to-end latency: 233.351ms
  consumer[2] average end-to-end bytes: 93.817MB
  consumer[2] average end-to-end latency: 174.878ms
  consumer[3] average end-to-end bytes: 88.581MB
  consumer[3] average end-to-end latency: 288.433ms
  consumer[4] average end-to-end bytes: 90.822MB
  consumer[4] average end-to-end latency: 278.449ms
  consumer[5] average end-to-end bytes: 94.022MB
  consumer[5] average end-to-end latency: 191.517ms
  consumer[6] average end-to-end bytes: 89.723MB
  consumer[6] average end-to-end latency: 281.531ms
  consumer[7] average end-to-end bytes: 95.404MB
  consumer[7] average end-to-end latency: 152.815ms
  consumer[8] average end-to-end bytes: 88.826MB
  consumer[8] average end-to-end latency: 272.659ms
  consumer[9] average end-to-end bytes: 93.649MB
  consumer[9] average end-to-end latency: 173.993ms
chia7712 commented 3 years ago

Every permutation shows 11.0 GBytes transferred and 9.41 Gbits/sec bandwidth.

These numbers look good. Could you run the Kafka performance test next?

chia7712 commented 3 years ago

Also, could you flesh out the hardware specs? Copying the vendor's spec sheet directly is fine, thanks.

wycccccc commented 2 years ago

@chia7712 Here are partial screenshots of my disk tests on machines 2~4. I used dd to run write tests against each of the two SSDs, writing 10 KiB per record. Write speed never dropped during the whole test.

SSD1: Screenshot from 2022-02-10 18-06-23
SSD2: Screenshot from 2022-02-10 18-12-40
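
The exact dd invocation isn't shown above; here is a minimal sketch of a 10 KiB-per-write test under the stated conditions (the file path and count are placeholder assumptions):

```bash
# Write ~10 GB in 10 KiB records; conv=fdatasync forces a final flush so the
# page cache cannot fully hide the device's true speed (see the dd caveats
# discussed further down in this thread).
dd if=/dev/zero of=/ssd1/ddtest bs=10K count=1000000 conv=fdatasync
```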

chia7712 commented 2 years ago

Write speed never dropped during the whole test.

Could you try other tools and compare?

chia7712 commented 2 years ago

I took a look at this: https://www.pcworld.com/article/399100/crucial-p2-nmve-ssd-review.html/amp

This drive apparently struggles under sustained bulk writes (the article demonstrates with 48 GB). There are two directions we could try now:

  1. Run the performance tool with compression enabled by default, e.g. gzip or lz4. This matches real-world usage and also shifts some of the work onto the CPU; neither adjustment affects our partitioner comparison, so it can serve as the solution
  2. In the next purchase, switch to drives suited to heavy writes; this batch can then serve as test drives for other work (e.g. the balancer)

@wycccccc What do you think?

chia7712 commented 2 years ago

That said, your bulk-write test hit no problems, so perhaps the finger points back at Kafka's IO behavior not suiting this drive?

We should track this down.

garyparrot commented 2 years ago

I used dd to run write tests

Could you try other tools and compare?

How about trying fio? As I recall, dd isn't great for disk benchmarking.

https://blog.cloud-mercato.com/dd-is-not-a-benchmarking-tool/

TL;DR:

  1. A simple dd command's write pressure can be hidden by the file system cache (see the sketch after this list)
  2. Common inputs such as /dev/zero and /dev/random don't reflect a real IO workload (?)
  3. It hides many factors, e.g. IOPS, write pattern, ...
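
A quick illustration of point 1 (paths are placeholders): the same workload measured through the page cache and with it bypassed can report very different speeds.

```bash
# May report the speed of writing into RAM rather than the SSD:
dd if=/dev/zero of=/ssd1/cached bs=10K count=100000
# O_DIRECT bypasses the page cache, so the result reflects the device:
dd if=/dev/zero of=/ssd1/direct bs=10K count=100000 oflag=direct
```
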
chia7712 commented 2 years ago

@garyparrot Do you have time to help test? The top priority right now is to determine whether anything other than Kafka's write behavior can trigger the drastic write-speed drop.

wycccccc commented 2 years ago

I took a look at this: https://www.pcworld.com/article/399100/crucial-p2-nmve-ssd-review.html/amp

The situation described in that article is indeed very similar to what we're hitting.

I'll give fio a try; maybe the tool was inadequate and didn't surface the real behavior. I'll hold off on conclusions until the tests finish. If I have trouble with the tool I'll ask the junior students for help.

wycccccc commented 2 years ago

The test was run with fio, targeting writes: buffered IO, record.size 10 KB, 5 threads.

We can now conclude it really is a disk problem (the Kafka-specific performance issue is ruled out). The first screenshot shows the middle machine's write rate dropping to 0. Throughout the test, disk write throughput oscillated violently and hit 0 several times, which closely matches what we saw with Kafka.

Screenshot from 2022-02-11 21-19-58, Screenshot from 2022-02-11 21-23-43

Run the performance tool with compression enabled by default, e.g. gzip or lz4. This matches real-world usage and also shifts some of the work onto the CPU; neither adjustment affects our partitioner comparison, so it can serve as the solution

We did some compression tests earlier and they didn't seem affected by this, so I think we can keep testing along that line. Also, with Kafka's record.size set to 100 KiB the problem mostly disappears, so moving away from the problematic record size (10 KiB) is another avenue worth testing.

chia7712 commented 2 years ago

Also, with Kafka's record.size set to 100 KiB the problem mostly disappears, so moving away from the problematic record size (10 KiB) is another avenue worth testing.

Does this adjustment reduce the amount of data written per second? If the bytes written per second don't drop, why would it help? @@

wycccccc commented 2 years ago

Observed from the performance tool, it doesn't drop. On top of that, a 1 KiB record.size also looks fairly decent; the problem only shows up occasionally (roughly one or two brief drops to zero within a 20-minute run). (Only lightly tested so far.) It's a good question, and one I've been curious about too. I'll keep testing and thinking it over, and share if I find the answer.

chia7712 commented 2 years ago

@wycccccc Could you share more details of your fio test setup? I'd like to double-check on my side.

wycccccc commented 2 years ago

This is my test command: fio -filename=/ssd1/test1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=10K -size=500G -numjobs=5 -runtime=180 -group_reporting -name=1write_10k

This article covers a number of different test approaches: https://www.alibabacloud.com/blog/how-to-use-fio-to-test-the-io-performance-of-ecs-local-ssd-and-essd-part-1_597783 For the finer points of the parameters I referred to this: https://blog.jaycetyle.com/2021/10/linux-fio-tips/
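
For the double-check, one variant worth trying is the same workload with the page cache bypassed via fio's direct flag, so the numbers reflect the device rather than buffered IO (the file name below is a placeholder):

```bash
fio -filename=/ssd1/test2 -direct=1 -iodepth 1 -thread -rw=write \
    -ioengine=psync -bs=10K -size=500G -numjobs=5 -runtime=180 \
    -group_reporting -name=1write_10k_direct
```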

chia7712 commented 2 years ago

We can now conclude it really is a disk problem (the Kafka-specific performance issue is ruled out). The first screenshot shows the middle machine's write rate dropping to 0. Throughout the test, disk write throughput oscillated violently and hit 0 several times, which closely matches what we saw with Kafka.

All six parallel jobs target the same file; is that intentional? Does the drop to zero also happen when they target different files?

wycccccc commented 2 years ago

All six parallel jobs target the same file; is that intentional? Does the drop to zero also happen when they target different files?

Sorry, I wasn't clear. In each screenshot, from left to right are machines 2, 3, and 4; the file path is the same but these are three separate machines. The two screenshots are from separate tests, not run simultaneously; I just wanted to show the drop to zero also occurs across different tests. (In the second screenshot, machine 4 on the far right drops to zero.)

chia7712 commented 2 years ago

The two screenshots are from separate tests, not run simultaneously; I just wanted to show the drop to zero also occurs across different tests. (In the second screenshot, machine 4 on the far right drops to zero.)

I just tried it and got the same result.

We can chase the details when there's time. For now it looks like pushing data too fast hits this lousy drive's limit. Let's enable compression by default (gzip or lz4) and keep running tests, and don't run a broker on the machine that generates the load, so it isn't feeding itself.
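
For reference, a hedged sketch of producer-side compression using Kafka's stock perf script rather than the Astraea tool (compression.type is a standard Kafka producer config; the topic name and broker address are placeholders):

```bash
bin/kafka-producer-perf-test.sh \
  --topic perf-test \
  --num-records 10000000 \
  --record-size 10240 \
  --throughput -1 \
  --producer-props bootstrap.servers=192.168.103.12:9092 compression.type=lz4
```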

garyparrot commented 2 years ago

I tried putting together another command: fio --name=ssd-test --bs=1KiB --size=20GiB --rw=write --ioengine=io_uring --numjobs=8 --group_reporting

First run

ssd-test: (groupid=0, jobs=8): err= 0: pid=645875: Sat Feb 12 00:30:39 2022
  write: IOPS=1290k, BW=1230MiB/s (1290MB/s)(149GiB/124025msec); 0 zone resets
    slat (nsec): min=284, max=11658k, avg=1764.75, stdev=3065.82
    clat (nsec): min=39, max=52644k, avg=4125.04, stdev=21674.04
     lat (nsec): min=1959, max=52646k, avg=5936.86, stdev=21895.57
    clat percentiles (nsec):
     |  1.00th=[ 2320],  5.00th=[ 2800], 10.00th=[ 3024], 20.00th=[ 3312],
     | 30.00th=[ 3504], 40.00th=[ 3696], 50.00th=[ 3920], 60.00th=[ 4128],
     | 70.00th=[ 4384], 80.00th=[ 4704], 90.00th=[ 5280], 95.00th=[ 5728],
     | 99.00th=[ 6752], 99.50th=[ 7264], 99.90th=[10432], 99.95th=[56576],
     | 99.99th=[90624]
   bw (  MiB/s): min= 1114, max= 1264, per=100.00%, avg=1232.05, stdev= 2.25, samples=1976
   iops        : min=1168508, max=1325768, avg=1291905.64, stdev=2359.07, samples=1976
  lat (nsec)   : 50=0.01%, 100=0.03%, 250=0.02%, 500=0.01%, 750=0.01%
  lat (nsec)   : 1000=0.01%
  lat (usec)   : 2=0.25%, 4=53.31%, 10=46.28%, 20=0.04%, 50=0.01%
  lat (usec)   : 100=0.05%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%
  cpu          : usr=12.23%, sys=50.40%, ctx=159931930, majf=60, minf=222
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,160000000,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=1230MiB/s (1290MB/s), 1230MiB/s-1230MiB/s (1290MB/s-1290MB/s), io=149GiB (160GB), run=124025-124025msec

Disk stats (read/write):
  nvme0n1: ios=0/712958, merge=0/2742, ticks=0/15551252, in_queue=14140572, util=67.43%

Note in particular: lat (nsec): min=1959, max=52646k, avg=5936.86, stdev=21895.57 — even the slowest IO request completed in only ~0.05 seconds.

Second run

Right after the run above finished I issued the same command again, and this time got a rather different result.

ssd-test: (groupid=0, jobs=8): err= 0: pid=647092: Sat Feb 12 00:44:43 2022
  write: IOPS=218k, BW=208MiB/s (218MB/s)(149GiB/732517msec); 0 zone resets
    slat (nsec): min=293, max=7697.0k, avg=1659.20, stdev=2504.07
    clat (nsec): min=35, max=7004.9M, avg=34313.23, stdev=4675105.33
     lat (nsec): min=1331, max=7004.9M, avg=36023.72, stdev=4675115.33
    clat percentiles (nsec):
     |  1.00th=[    1880],  5.00th=[    2024], 10.00th=[    2192],
     | 20.00th=[    2768], 30.00th=[    3120], 40.00th=[    3408],
     | 50.00th=[    3664], 60.00th=[    3920], 70.00th=[    4256],
     | 80.00th=[    4640], 90.00th=[    5344], 95.00th=[    6304],
     | 99.00th=[   20352], 99.50th=[   24448], 99.90th=[   85504],
     | 99.95th=[26083328], 99.99th=[46399488]
   bw (  KiB/s): min=    8, max=1294543, per=100.00%, avg=247319.54, stdev=49730.52, samples=10091
   iops        : min=    9, max=1325615, avg=253259.06, stdev=50924.03, samples=10091
  lat (nsec)   : 50=0.01%, 100=0.02%, 250=0.01%, 500=0.01%, 750=0.01%
  lat (nsec)   : 1000=0.01%
  lat (usec)   : 2=3.68%, 4=58.93%, 10=34.08%, 20=2.22%, 50=0.90%
  lat (usec)   : 100=0.07%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.06%
  lat (msec)   : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2000=0.01%, >=2000=0.01%
  cpu          : usr=2.08%, sys=8.57%, ctx=159951711, majf=0, minf=90
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,160000000,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=208MiB/s (218MB/s), 208MiB/s-208MiB/s (218MB/s-218MB/s), io=149GiB (160GB), run=732517-732517msec

Disk stats (read/write):
  nvme0n1: ios=4/684307, merge=0/4221, ticks=5737/330335580, in_queue=328981680, util=97.60%

This time lat (nsec): min=1331, max=7004.9M, avg=36023.72, stdev=4675115.33 tells us an IO request can take up to 7 seconds.

It feels like once the SSD is pushed past a certain point its performance collapses, much like what that article above describes.

chia7712 commented 2 years ago

It feels like once the SSD is pushed past a certain point its performance collapses, much like what that article above describes.

Sad. This one is on me; I trusted the Micron brand too much and didn't do my homework first.

I've opened an issue to discuss replacing the drives: https://github.com/skiptests/astraea/issues/228

chia7712 commented 2 years ago

@chinghongfang Once the drives are swapped (#228), please update the hardware info, thanks.

chia7712 commented 2 years ago

@garyparrot @qoo332001 Could you two look into whether Intel 12th gen runs smoothly on Ubuntu Server? Start by googling to see whether there are any horror stories.

garyparrot commented 2 years ago

I went digging through Ubuntu Forum and found their hardware-compatibility threads, but the latest reports are from around 2020, so they have no reference value.

Below are search results from other forums.

That said, Kafka doesn't seem that CPU-sensitive; the crux still feels like disk and IO.

chia7712 commented 2 years ago

Linux may not understand the differences between these cores until 5.18.

Can Ubuntu Server 21.04 be upgraded to that kernel? Ideally we'd get to use the P/E hybrid cores; otherwise we'd have to buy the 12th-gen parts without hybrid cores.

chia7712 commented 2 years ago

@chinghongfang If the new drives are in, could you update the hardware info in the description? Thanks.

garyparrot commented 2 years ago

5.18 is nowhere in sight... https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/refs/

Ubuntu Server 22.04 currently ships 5.15.0. Via the Ubuntu Mainline Kernel PPA I can upgrade to 5.17-rc6, though upgrading doesn't guarantee everything works.

chia7712 commented 2 years ago

5.18 is nowhere in sight...

No worries. Let's buy the 11th-gen version then; it has been running in our environment for a while, and stability comes first.

chia7712 commented 2 years ago

@garyparrot @qoo332001 I thought it over: since these will be in use for at least two more years anyway, let's buy the top-spec 12th-gen CPU after all. Once Linux fully supports it, please follow up with the kernel update.

garyparrot commented 2 years ago

New machine tests

The initial install came with kernel 5.4, which seems too old to recognize our NIC(?). After going straight to Ubuntu HWE and upgrading to 5.13, the NIC works.

Machine specs

CPU: Intel i9-12900 3.2 GHz (5.2 GHz boost) / 30M cache / UHD770 / 125W
Motherboard: ASUS ROG STRIX Z690-G GAMING WIFI (M-ATX / 1H1P / Intel 2.5G + Wi-Fi 6E), 14+1 phase digital VRM
Memory: Micron Crucial 32GB DDR5 4800
Drives ×3: ADATA XPG SX8200 Pro 1TB / M.2 2280 / read 3500MB/s / write 3000MB/s / TLC / SMI controller
Cooler: ASUS TUF GAMING LC 240 ARGB
PSU: FSP 550W Bronze / all-Japanese capacitors / DC-DC
NIC: XG-C100C single-port 10 Gigabit RJ45 PCIe NIC

Disk write test

fio --name=ssd-test --bs=1KiB --size=70GiB --rw=write --ioengine=io_uring --numjobs=10 --group_reporting

I used the same command as last time, only bumping size and numjobs. Overall, even in the worst case the drives sustain about 1 GB/s of sequential write bandwidth.

SSD1: Benchmark system stats
SSD2: Benchmark system stats
SSD3: Benchmark system stats

fio output for SSD1

```
ssd-test: (g=0): rw=write, bs=(R) 1000B-1000B, (W) 1000B-1000B, (T) 1000B-1000B, ioengine=io_uring, iodepth=1
...
fio-3.16
Starting 10 processes
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
Jobs: 10 (f=10): [W(10)][25.9%][w=1038MiB/s][w=1088k IOPS][eta 07m:47s]
Jobs: 10 (f=10): [W(10)][26.0%][w=1034MiB/s][w=1084k IOPS][eta 07m:46s]
Jobs: 10 (f=10): [W(10)][61.1%][w=1046MiB/s][w=1097k IOPS][eta 04m:09s]
Jobs: 10 (f=10): [W(10)][74.9%][w=1033MiB/s][w=1083k IOPS][eta 02m:41s]
Jobs: 7 (f=7): [_(1),W(4),_(1),W(2),_(1),W(1)][100.0%][w=888MiB/s][w=932k IOPS][eta 00m:00s]
ssd-test: (groupid=0, jobs=10): err= 0: pid=12815: Fri Mar 11 12:32:39 2022
  write: IOPS=1089k, BW=1039MiB/s (1089MB/s)(652GiB/642544msec); 0 zone resets
    slat (nsec): min=283, max=10449k, avg=1160.11, stdev=1471.71
    clat (nsec): min=15, max=174329k, avg=7791.68, stdev=410057.86
     lat (nsec): min=1469, max=174330k, avg=8983.93, stdev=410061.22
    clat percentiles (usec):
     |  1.00th=[    3],  5.00th=[    3], 10.00th=[    3], 20.00th=[    3],
     | 30.00th=[    3], 40.00th=[    3], 50.00th=[    4], 60.00th=[    4],
     | 70.00th=[    4], 80.00th=[    5], 90.00th=[    5], 95.00th=[    6],
     | 99.00th=[    7], 99.50th=[    9], 99.90th=[   16], 99.95th=[   20],
     | 99.99th=[27132]
   bw (  MiB/s): min=  700, max= 2299, per=100.00%, avg=1039.57, stdev= 9.72, samples=12838
   iops        : min=734098, max=2410800, avg=1090073.26, stdev=10187.86, samples=12838
  lat (nsec)   : 20=0.01%, 50=0.01%, 100=0.01%, 250=0.01%, 500=0.01%
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.52%, 4=78.38%, 10=20.74%, 20=0.29%, 50=0.03%
  lat (usec)   : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=6.60%, sys=25.29%, ctx=699898326, majf=0, minf=123
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,700000000,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=1039MiB/s (1089MB/s), 1039MiB/s-1039MiB/s (1089MB/s-1089MB/s), io=652GiB (700GB), run=642544-642544msec

Disk stats (read/write):
  nvme1n1: ios=102/2831034, merge=0/19692, ticks=810/334123941, in_queue=334128188, util=99.75%
```

fio output for SSD2

```
ssd-test: (g=0): rw=write, bs=(R) 1000B-1000B, (W) 1000B-1000B, (T) 1000B-1000B, ioengine=io_uring, iodepth=1
...
fio-3.16
Starting 10 processes
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
Jobs: 10 (f=10): [W(10)][1.3%][w=1899MiB/s][w=1991k IOPS][eta 06m:30s]
Jobs: 10 (f=10): [W(10)][6.5%][w=1017MiB/s][w=1066k IOPS][eta 09m:23s]
Jobs: 10 (f=10): [W(10)][53.4%][w=1029MiB/s][w=1079k IOPS][eta 04m:58s]
Jobs: 10 (f=10): [W(10)][71.7%][w=1036MiB/s][w=1086k IOPS][eta 03m:02s]
Jobs: 10 (f=10): [W(10)][87.4%][w=996MiB/s][w=1044k IOPS][eta 01m:21s]
Jobs: 8 (f=8): [W(2),_(1),W(6),_(1)][100.0%][w=975MiB/s][w=1022k IOPS][eta 00m:00s]
ssd-test: (groupid=0, jobs=10): err= 0: pid=12658: Fri Mar 11 12:19:22 2022
  write: IOPS=1091k, BW=1040MiB/s (1091MB/s)(652GiB/641635msec); 0 zone resets
    slat (nsec): min=287, max=11387k, avg=1186.10, stdev=1269.22
    clat (nsec): min=16, max=461043k, avg=7744.84, stdev=409142.77
     lat (nsec): min=1376, max=461044k, avg=8963.44, stdev=409145.52
    clat percentiles (usec):
     |  1.00th=[    3],  5.00th=[    3], 10.00th=[    3], 20.00th=[    3],
     | 30.00th=[    3], 40.00th=[    4], 50.00th=[    4], 60.00th=[    4],
     | 70.00th=[    4], 80.00th=[    5], 90.00th=[    5], 95.00th=[    6],
     | 99.00th=[    7], 99.50th=[    9], 99.90th=[   16], 99.95th=[   20],
     | 99.99th=[27919]
   bw (  MiB/s): min=  513, max= 2222, per=100.00%, avg=1041.02, stdev= 9.61, samples=12813
   iops        : min=538414, max=2330902, avg=1091596.34, stdev=10081.87, samples=12813
  lat (nsec)   : 20=0.01%, 50=0.01%, 100=0.01%, 250=0.01%, 500=0.01%
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.53%, 4=77.68%, 10=21.43%, 20=0.30%, 50=0.03%
  lat (usec)   : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.01%, 500=0.01%
  cpu          : usr=6.70%, sys=25.73%, ctx=699942805, majf=0, minf=108
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,700000000,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=1040MiB/s (1091MB/s), 1040MiB/s-1040MiB/s (1091MB/s-1091MB/s), io=652GiB (700GB), run=641635-641635msec

Disk stats (read/write):
  nvme0n1: ios=0/2805676, merge=0/18949, ticks=0/334926532, in_queue=334929871, util=99.65%
```

fio output for SSD3

```
ssd-test: (g=0): rw=write, bs=(R) 1000B-1000B, (W) 1000B-1000B, (T) 1000B-1000B, ioengine=io_uring, iodepth=1
...
fio-3.16
Starting 10 processes
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
ssd-test: Laying out IO file (1 file / 66757MiB)
Jobs: 10 (f=10): [W(10)][4.7%][w=1023MiB/s][w=1073k IOPS][eta 08m:43s]
Jobs: 10 (f=10): [W(10)][28.6%][w=1026MiB/s][w=1075k IOPS][eta 07m:34s]
Jobs: 10 (f=10): [W(10)][29.2%][w=1005MiB/s][w=1054k IOPS][eta 07m:31s]
Jobs: 10 (f=10): [W(10)][58.6%][w=1018MiB/s][w=1067k IOPS][eta 04m:27s]
Jobs: 10 (f=10): [W(10)][77.0%][w=1009MiB/s][w=1058k IOPS][eta 02m:29s]
Jobs: 10 (f=10): [W(10)][80.4%][w=1026MiB/s][w=1076k IOPS][eta 02m:07s]
Jobs: 10 (f=10): [W(10)][94.3%][w=1028MiB/s][w=1078k IOPS][eta 00m:37s]
Jobs: 1 (f=1): [_(5),W(1),_(4)][100.0%][w=413MiB/s][w=433k IOPS][eta 00m:00s]
ssd-test: (groupid=0, jobs=10): err= 0: pid=12604: Fri Mar 11 12:05:01 2022
  write: IOPS=1082k, BW=1032MiB/s (1082MB/s)(652GiB/647050msec); 0 zone resets
    slat (nsec): min=286, max=7965.5k, avg=1156.26, stdev=1117.43
    clat (nsec): min=16, max=924407k, avg=7854.72, stdev=415803.25
     lat (nsec): min=1471, max=924408k, avg=9044.65, stdev=415805.52
    clat percentiles (usec):
     |  1.00th=[    3],  5.00th=[    3], 10.00th=[    3], 20.00th=[    3],
     | 30.00th=[    3], 40.00th=[    4], 50.00th=[    4], 60.00th=[    4],
     | 70.00th=[    4], 80.00th=[    5], 90.00th=[    5], 95.00th=[    6],
     | 99.00th=[    7], 99.50th=[    8], 99.90th=[   16], 99.95th=[   20],
     | 99.99th=[26084]
   bw (  MiB/s): min=  205, max= 2343, per=100.00%, avg=1033.88, stdev=11.58, samples=12906
   iops        : min=215521, max=2457614, avg=1084105.17, stdev=12139.10, samples=12906
  lat (nsec)   : 20=0.01%, 50=0.01%, 100=0.01%, 250=0.01%, 500=0.01%
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.51%, 4=78.41%, 10=20.78%, 20=0.25%, 50=0.02%
  lat (usec)   : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.01%, 500=0.01%, 1000=0.01%
  cpu          : usr=6.55%, sys=25.09%, ctx=699942343, majf=0, minf=126
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,700000000,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=1032MiB/s (1082MB/s), 1032MiB/s-1032MiB/s (1082MB/s-1082MB/s), io=652GiB (700GB), run=647050-647050msec

Disk stats (read/write):
  nvme2n1: ios=0/2744693, merge=0/17569, ticks=0/336639847, in_queue=336643513, util=99.67%
```

NIC test

iperf in both directions: 9.41 Gbits/sec

kafka@chia04:~$ iperf -c 192.168.103.177
------------------------------------------------------------
Client connecting to 192.168.103.177, TCP port 5001
TCP window size: 2.59 MByte (default)
------------------------------------------------------------
[  3] local 192.168.103.174 port 60588 connected with 192.168.103.177 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  11.0 GBytes  9.41 Gbits/sec
kafka@chia07:~$ iperf -c 192.168.103.174
------------------------------------------------------------
Client connecting to 192.168.103.174, TCP port 5001
TCP window size: 4.00 MByte (default)
------------------------------------------------------------
[  3] local 192.168.103.177 port 38278 connected with 192.168.103.174 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  11.0 GBytes  9.41 Gbits/sec

CPU test

Used stress -c 24 to spawn 24 CPU load workers.

image

https://snapshots-origin.grafana.net/dashboard/snapshot/2iP986C3G0oXflLzTavDzYxlclXwl1KB?orgId=2

The whole run lasted about 10 minutes. Observations:

  1. Throughout the run the CPU clocks held in two clusters, 4.9 GHz and 3.7 GHz, so this should indeed be the P-core/E-core split (see the spot-check sketch after this list)
  2. Temperatures sat around 80–90 °C, and the water cooler brought them back down quickly once the run ended
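
A possible way to spot-check those two observations from a shell while stress runs (the temperature readout assumes the lm-sensors package is installed):

```bash
# Per-core clocks: on a hybrid 12th-gen part the values should split into
# the two clusters noted above (~4.9 GHz P-cores, ~3.7 GHz E-cores).
watch -n1 "grep 'cpu MHz' /proc/cpuinfo | sort -t: -k2 -rn"
# Package and core temperatures (requires lm-sensors):
sensors
```
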
chia7712 commented 2 years ago

@garyparrot Thanks for testing. Could you verify one more scenario: attach the broker to all three drives and run a throughput test, to see whether it exploits the speed of all three disks? Thanks.

garyparrot commented 2 years ago

image

The 10 Gbps external NIC seems to cap how much data can pour in per unit time. Disk IO bandwidth now looks extremely plentiful, to the point where I'd call it underutilized: from the earlier numbers, disk IO should handle about 3 GB/s, but only about a third of that is actually used.

Perhaps the surplus disk IO bandwidth could be used by moves between data folders, but how much that helps in practice I don't know.
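
A back-of-the-envelope check of that utilization estimate:

```bash
# NIC ingress cap : 10 Gbit/s / 8 bits-per-byte ~= 1.25 GB/s
# Disk capacity   : 3 SSDs x ~1 GB/s sequential  = ~3 GB/s
# => the NIC can feed at most ~1.25/3 ~= 40% of the disk bandwidth,
#    so disk IO sits around one third utilized, as observed.
```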

chia7712 commented 2 years ago

@garyparrot Thanks for verifying; we'll keep buying this spec going forward.

Perhaps the surplus disk IO bandwidth could be used by moves between data folders

Yep, that's a scenario your balancer work will need to consider later (balancing disk-space utilization). There's also an application pattern where the network carries compressed data but the disk stores it decompressed (e.g. when the consumer side can't decompress); in that case network bandwidth usage is light while disk bandwidth is hit hard.

garyparrot commented 2 years ago

New hardware tests

Five new machines arrived this time; the specs look very similar to the earlier water-cooled one.

Disk tests

fio --name=ssd-test --bs=1KiB --size=70GiB --rw=write --ioengine=io_uring --numjobs=10 --group_reporting

Per-machine SSD test results

# chia8 SSD1
Run status group 0 (all jobs):
  WRITE: bw=1126MiB/s (1180MB/s), 1126MiB/s-1126MiB/s (1180MB/s-1180MB/s), io=652GiB (700GB), run=593020-593020msec

# chia8 SSD2
Run status group 0 (all jobs):
  WRITE: bw=1128MiB/s (1183MB/s), 1128MiB/s-1128MiB/s (1183MB/s-1183MB/s), io=652GiB (700GB), run=591700-591700msec

# chia8 SSD3
Run status group 0 (all jobs):
  WRITE: bw=1191MiB/s (1249MB/s), 1191MiB/s-1191MiB/s (1249MB/s-1249MB/s), io=115GiB (123GB), run=98737-98737msec

# chia9 SSD1
Run status group 0 (all jobs):
  WRITE: bw=1148MiB/s (1204MB/s), 1148MiB/s-1148MiB/s (1204MB/s-1204MB/s), io=652GiB (700GB), run=581457-581457msec

# chia9 SSD2
Run status group 0 (all jobs):
  WRITE: bw=1140MiB/s (1195MB/s), 1140MiB/s-1140MiB/s (1195MB/s-1195MB/s), io=652GiB (700GB), run=585679-585679msec

# chia9 SSD3
Run status group 0 (all jobs):
  WRITE: bw=1184MiB/s (1242MB/s), 1184MiB/s-1184MiB/s (1242MB/s-1242MB/s), io=101GiB (108GB), run=87357-87357msec

# chia10 SSD1
Run status group 0 (all jobs):
  WRITE: bw=1144MiB/s (1199MB/s), 1144MiB/s-1144MiB/s (1199MB/s-1199MB/s), io=652GiB (700GB), run=583674-583674msec

# chia10 SSD2
Run status group 0 (all jobs):
  WRITE: bw=1142MiB/s (1197MB/s), 1142MiB/s-1142MiB/s (1197MB/s-1197MB/s), io=652GiB (700GB), run=584684-584684msec

# chia10 SSD3
Run status group 0 (all jobs):
  WRITE: bw=1126MiB/s (1181MB/s), 1126MiB/s-1126MiB/s (1181MB/s-1181MB/s), io=652GiB (700GB), run=592698-592698msec

# chia11 SSD1
Run status group 0 (all jobs):
  WRITE: bw=1119MiB/s (1174MB/s), 1119MiB/s-1119MiB/s (1174MB/s-1174MB/s), io=652GiB (700GB), run=596473-596473msec

# chia11 SSD2
Run status group 0 (all jobs):
  WRITE: bw=1136MiB/s (1192MB/s), 1136MiB/s-1136MiB/s (1192MB/s-1192MB/s), io=652GiB (700GB), run=587465-587465msec

# chia11 SSD3
Run status group 0 (all jobs):
  WRITE: bw=1144MiB/s (1199MB/s), 1144MiB/s-1144MiB/s (1199MB/s-1199MB/s), io=652GiB (700GB), run=583624-583624msec

# chia12 SSD1
Run status group 0 (all jobs):                                                                                        
  WRITE: bw=1145MiB/s (1201MB/s), 1145MiB/s-1145MiB/s (1201MB/s-1201MB/s), io=652GiB (700GB), run=583021-583021msec

# chia12 SSD2
Run status group 0 (all jobs):
  WRITE: bw=1134MiB/s (1189MB/s), 1134MiB/s-1134MiB/s (1189MB/s-1189MB/s), io=652GiB (700GB), run=588626-588626msec

# chia12 SSD3
Run status group 0 (all jobs):
  WRITE: bw=1131MiB/s (1186MB/s), 1131MiB/s-1131MiB/s (1186MB/s-1186MB/s), io=652GiB (700GB), run=590122-590122msec

Kafka IO test

image

Here 5 brokers are set up, with a performance tool instance running on each broker.

chia7712 commented 2 years ago

@garyparrot The y-axis numbers in that screenshot seem to be cut off; could you just state the numbers?

garyparrot commented 2 years ago

The topmost line is 900 MB, with each line below stepping down by 100 MB.

chia7712 commented 2 years ago

The topmost line is 900 MB, with each line below stepping down by 100 MB.

Got it, thanks~

harryteng9527 commented 2 years ago

Network test with iperf (TP-Link TL-SX1008)

Pairwise, one pair at a time, unidirectional transfer: (i->j)

[ ID] Interval       Transfer     Bandwidth
[  1] 0.0-10.0 sec  10.9 GBytes  9.38 Gbits/sec
Every permutation shows 10.9 GBytes transferred and 9.38 Gbits/sec bandwidth.

Six machines sending simultaneously in a ring: (1->2->3->4->5->6->1)

[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  10.9 GBytes  9.40 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  11.0 GBytes  9.41 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  11.0 GBytes  9.41 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  10.9 GBytes  9.40 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  10.9 GBytes  9.40 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  10.9 GBytes  9.38 Gbits/sec 

Note: the switch fan is rather loud.

chia7712 commented 2 years ago

@harryteng9527 Thanks for testing. Please update this issue's description with the new hardware.

chia7712 commented 2 years ago

@harryteng9527 After the hardware update next Wednesday, please update the description.

chia7712 commented 2 years ago

@harryteng9527 Would it be convenient to split the description's content by project, i.e. partitioner, balancer, and consumer?

chia7712 commented 1 year ago

@chinghongfang @garyparrot @qoo332001 @harryteng9527 Please update this issue with your final conclusions, and then we can close it.

harryteng9527 commented 1 year ago

These are the cross-traffic results after the Assignor and Partitioner clusters were merged and connected to the new switch: image

chia7712 commented 1 year ago

@harryteng9527 Thanks. Looks like the network side is fine; next up is verifying the disks.

harryteng9527 commented 1 year ago

Disk verification

Here are the disk write performance numbers for the assignor and partitioner clusters.

Test command

fio --name=ssd-test --bs=1KiB --size=70GiB --rw=write --ioengine=io_uring --numjobs=10 --group_reporting

Assignor

The Assignor cluster consists of the following machines:

  1. The small-form-factor 12th-gen Intel
  2. The full-size 12th-gen Intel
  3. The 11th-gen Intel
  4. The 13th-gen Intel

Listed below are each machine's disk write rate and temperature. The 12th-gen machines haven't been upgraded to Ubuntu 22.04 or 22.10, so Prometheus can't scrape their nvme temperatures.

Small-form-factor 12th-gen Intel

Disk write rate: image

Grafana snapshot link: SSD - small 12th gen

Full-size 12th-gen Intel

This machine's specs match the Partitioner cluster's 12th-gen Intel.

Disk write rate: image

Grafana snapshot link: SSD - full-size 12th gen

11th-gen Intel

Disk write rate and temperature: image

Grafana snapshot link: SSD - 11th-gen Intel

13th-gen Intel

Disk write rate and temperature: image

The 13th-gen Intel's disk write rate is on the low side.

Grafana snapshot link: SSD - 13th-gen Intel

Partitioner

Disk write rate and temperature: image

Grafana snapshot link: SSD - Partitioner

chia7712 commented 1 year ago

@harryteng9527 Thanks for compiling this; nicely done. One question though: was that partitioner chart measured on the new machines or the old ones?

harryteng9527 commented 1 year ago

Was that partitioner chart measured on the new machines or the old ones?

The old machines.

So far none of the tests moved SSDs between machines; each test used the SSDs originally installed in that machine.

chia7712 commented 1 year ago

Also, the right-hand chart for the small 12th-gen machine shows severe speed oscillation; is that a temperature problem too? Could you upgrade the OS so we can see the temperature curve?

harryteng9527 commented 1 year ago

The right-hand chart for the small 12th-gen machine shows severe speed oscillation; is that a temperature problem too?

Not sure yet; so far I've only run one pass over each machine type's SSDs.

Could you upgrade the OS so we can see the temperature curve?

Should all of them be upgraded, or just one to try first?

chia7712 commented 1 year ago

Should all of them be upgraded, or just one to try first?

Just one is fine.

Also, for the new machines, could you check a few more to see whether they all show the performance degradation?

chia7712 commented 1 year ago

For the small 12th-gen machine I looked at the motherboard's documentation: it has only one built-in heatsink, and the other ssd sits on the back of the board, presumably a consequence of the small form factor.

Was that small 12th-gen Intel bought for your testing, or was it the one I swapped over from my own home machine?

harryteng9527 commented 1 year ago

The small 12th-gen Intel was originally the senior's home machine.

At the time, the partitioner cluster's 11th-gen machine was traded with the senior for the small 12th-gen Intel.