thustorage / max

A high-performance file system for multicore CPUs and flash storage
Hello, I ran into some problems when mounting Max #2

Closed champagne9 closed 2 years ago

champagne9 commented 2 years ago

When I run disk-tools/mkfs/mkfs.f2fs -N 8 /dev/nvme0n1

it prints

This device doesn't support BLKSECDISCARD

Is this because my hardware is incompatible?

liaoxiaojian commented 2 years ago

Hello. I have not run into this problem with the Intel P3700 or Intel 750 SSDs. Does this message affect mounting Max (that is, the subsequent mount command)?

champagne9 commented 2 years ago

Yes. After running mount, it reports that the max file system type is not recognized. I am using an Intel 670p.

liaoxiaojian commented 2 years ago

Could you post the errors from dmesg here? In theory, an SSD that does not support discard (that is, BLKSECDISCARD) should not affect running Max or F2FS.

champagne9 commented 2 years ago

root@lab230:/home/lab/Downloads/max-main# mount -t max -o imds=72,mlog=8 /dev/nvme0n1 /mnt/test
mount: unknown filesystem type 'max'

[ 290.003788] F2FS-fs (nvme0n1): Failed to get valid F2FS checkpoint
[ 290.003919] F2FS-fs (nvme0n1): Failed to get valid F2FS checkpoint

liaoxiaojian commented 2 years ago

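Judging from the error, the SSD may not have been formatted successfully, so no checkpoint can be found. Please first confirm whether mkfs succeeded. Here is a successful example from my machine:

root@e2123:/home/lxj/max-opensource/disk-tools# ./mkfs/mkfs.f2fs -N 8 /dev/nvme0n1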

F2FS-tools: mkfs.f2fs Ver: 1.8.0 (2017-02-03)

Info: Debug level = 0
Info: Label =
Info: Trim is enabled
Info: [/dev/nvme0n1] Disk Model: INTEL SSDPE2MD020102
Info: Segments per section = 1
Info: Sections per zone = 1
Info: sector size = 512
Info: total sectors = 3907029168 (1907729 MB)
Info: zone aligned segment0 blkaddr: 512
Info: [/dev/nvme0n1] Discarding device
Info: This device doesn't support BLKSECDISCARD
Info: Discarded 1907729 MB
mlog 0: 951891, 951890, 951889, 951888, 1, 0
mlog 1: 951887, 951886, 951885, 951884, 3, 2
mlog 2: 951883, 951882, 951881, 951880, 5, 4
mlog 3: 951879, 951878, 951877, 951876, 7, 6
mlog 4: 951875, 951874, 951873, 951872, 9, 8
mlog 5: 951871, 951870, 951869, 951868, 11, 10
mlog 6: 951867, 951866, 951865, 951864, 13, 12
mlog 7: 951863, 951862, 951861, 951860, 15, 14
Info: Overprovision ratio = 0.150%
Info: Overprovision segments = 2766 (GC reserved = 1341)
Info: format successful

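Pay special attention to whether the mlog lines are printed. The numbers on each line are the start addresses of the log heads of different temperatures within that mlog.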
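The check described above can be automated. The following is a minimal sketch, not part of the Max tools: it assumes the mkfs output has been captured as text and that the mlog lines keep the `mlog N: a, b, c, ...` shape shown in the example. The function names `parse_mlog_heads` and `check_format` are hypothetical.

```python
import re

# Matches entries such as "mlog 0: 951891, 951890, 951889, 951888, 1, 0".
# The format is inferred from the sample output in this thread (assumption).
MLOG_RE = re.compile(r"mlog\s+(\d+):\s*((?:\d+\s*,\s*)*\d+)")


def parse_mlog_heads(mkfs_output):
    """Return {mlog index: [log head start addresses]} found in the output."""
    heads = {}
    for match in MLOG_RE.finditer(mkfs_output):
        index = int(match.group(1))
        heads[index] = [int(n) for n in match.group(2).split(",")]
    return heads


def check_format(mkfs_output, expected_mlogs):
    """True if mlog 0..expected_mlogs-1 were all printed and mkfs reported success."""
    heads = parse_mlog_heads(mkfs_output)
    return (sorted(heads) == list(range(expected_mlogs))
            and "format successful" in mkfs_output)
```

For a run with `-N 8`, `check_format(output, 8)` would be expected to return True only when all eight mlog lines and the final "format successful" line are present.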

champagne9 commented 2 years ago

Hello, here is my output:

root@lab230:/home/lab/Downloads/max-main# disk-tools/mkfs/mkfs.f2fs -N 8 /dev/nvme0n1

F2FS-tools: mkfs.f2fs Ver: 1.8.0 (2017-02-03)

Info: Debug level = 0
Info: Label =
Info: Trim is enabled
Info: [/dev/nvme0n1] Disk Model: INTEL SSDPEKNU51002C
Info: Segments per section = 1
Info: Sections per zone = 1
Info: sector size = 512
Info: total sectors = 1000215216 (488386 MB)
Info: zone aligned segment0 blkaddr: 512
Info: [/dev/nvme0n1] Discarding device
Info: This device doesn't support BLKSECDISCARD
Info: Discarded 488386 MB
mlog 0: 243674, 243673, 243672, 243671, 1, 0
mlog 1: 243670, 243669, 243668, 243667, 3, 2
mlog 2: 243666, 243665, 243664, 243663, 5, 4
mlog 3: 243662, 243661, 243660, 243659, 7, 6
mlog 4: 243658, 243657, 243656, 243655, 9, 8
mlog 5: 243654, 243653, 243652, 243651, 11, 10
mlog 6: 243650, 243649, 243648, 243647, 13, 12
mlog 7: 243646, 243645, 243644, 243643, 15, 14
Info: Overprovision ratio = 0.290%
Info: Overprovision segments = 1401 (GC reserved = 697)
Info: format successful

The output looks the same as yours, but the mount still fails afterwards.

liaoxiaojian commented 2 years ago

Please also check whether insmod max.ko succeeds, and whether dmesg shows any errors after insmod.
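One way to confirm that the module actually registered its file system type is to look at /proc/filesystems, which lists every type the running kernel knows. The sketch below is a hedged illustration: the helper names are hypothetical, and it assumes the module registers under the name `max`, as the `mount -t max` command in this thread suggests.

```python
def registered_filesystems(proc_text):
    """Parse the contents of /proc/filesystems into a set of type names."""
    names = set()
    for line in proc_text.splitlines():
        fields = line.split()
        if fields:
            # The last field is the type name; an optional first field is "nodev".
            names.add(fields[-1])
    return names


def is_registered(fs_type, path="/proc/filesystems"):
    """True if fs_type is listed by the running kernel (Linux only)."""
    with open(path) as f:
        return fs_type in registered_filesystems(f.read())
```

If `is_registered("max")` returns False after insmod, the "unknown filesystem type 'max'" message from mount would be the expected result.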

champagne9 commented 2 years ago

This is the dmesg output after insmod max.ko:

[ 225.894781] max: module verification failed: signature and/or required key missing - tainting kernel

liaoxiaojian commented 2 years ago

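The format of the F2FS checkpoint may be at fault. I need some additional information to pinpoint the problem. Replace line 1190 of super.c in the Max directory with

f2fs_msg(sb, KERN_ERR, "Failed to get valid F2FS checkpoint, err=%d", err);

then rebuild the Max module (the kernel does not need to be rebuilt), rerun mkfs and mount, and check what the error code is.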
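Once the error code shows up in dmesg, it can be turned into a symbolic name. This is a small sketch that assumes `err` follows the usual kernel convention of a negative errno value; the thread does not confirm which values Max returns here, and `describe_kernel_error` is a hypothetical helper.

```python
import errno
import os


def describe_kernel_error(err):
    """Map a kernel-style error code (for example -22) to 'NAME (message)'."""
    code = abs(err)  # kernel code conventionally returns -errno (assumption)
    name = errno.errorcode.get(code, "UNKNOWN")
    return "{} ({})".format(name, os.strerror(code))
```

For example, `describe_kernel_error(-22)` yields a string starting with `EINVAL`.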

champagne9 commented 2 years ago

I mounted it successfully! But I ran into some problems when using fxmark. Here are my command and the error output:

root@b230:/home/lab/Downloads/max-main/fxmark# bin/plotter.py --ty sc --log out.log --out out out.log
Traceback (most recent call last):
  File "bin/plotter.py", line 349, in <module>
    plotter = Plotter(opts.log)
  File "bin/plotter.py", line 40, in __init__
    self.config = self._get_config()
  File "bin/plotter.py", line 66, in _get_config
    config_dic[key] = sorted(list(all_config[i]))
IndexError: list index out of range

liaoxiaojian commented 2 years ago

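This repository may not support FxMark's plotter directly (possibly because Max adds metrics such as I/O utilization to the FxMark output). I have never used the plotter to draw figures myself; I read the average throughput from the output and plot it manually.

You can post the results here after fxmark finishes, and I can help explain what the metrics mean and how to make plotter.py work.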

champagne9 commented 2 years ago

There seems to be no result output here, only the run log:

SYSTEM = Linux b230 4.2.3max #1 SMP Fri Mar 11 11:19:50 CST 2022 x86_64 x86_64 x86_64 GNU/Linux

DISK_SIZE = 100G

DURATION = 30s

TEST_ROOT = /home/lab/Downloads/max-main/fxmark/bin/root

DIRECTIO = bufferedio,directio

MEDIA_TYPES = ssd,hdd,nvme

FS_TYPES = ext4,xfs,f2fs,max

BENCH_TYPES = DWAL,DWOL,MWCL,MWUL,filebench_varmail,filebench_fileserver,dbench_client,exim,rocksdb_overwrite

NCORES = 1,2,4,10,15,20

CORE_SEQ = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39

MODEL_NAME = Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz

PHYSICAL_CHIPS = 2

CORE_PER_CHIP = 10

SMT_LEVEL = 2

liaoxiaojian commented 2 years ago

It looks like your test did not actually run. After a normal run, the output should look like this:

root@e2123:/home/lxj/max-opensource/fxmark# bin/run-fxmark.py 
### SYSTEM         = Linux e2123 4.2.3 #3 SMP Wed Mar 9 12:54:56 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
### DISK_SIZE      = 100G
### DURATION       = 30s
### TEST_ROOT      = /home/lxj/max-opensource/fxmark/bin/root
### DIRECTIO       = bufferedio,directio
### MEDIA_TYPES    = ssd,hdd,nvme
### FS_TYPES       = ext4,xfs,btrfs,f2fs,max
### BENCH_TYPES    = DWAL,DWOL,MWCL,MWUL,filebench_varmail,filebench_fileserver,dbench_client,exim,rocksdb_overwrite
### NCORES         = 1,2,4,9,18,27,36,45,54,63,72
### CORE_SEQ       = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71

### MODEL_NAME     = Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
### PHYSICAL_CHIPS = 4
### CORE_PER_CHIP  = 18
### SMT_LEVEL      = 1

## nvme:max:DWOL:72:bufferedio
# ncpu secs works works/sec real.sec user.sec nice.sec sys.sec idle.sec iowait.sec irq.sec softirq.sec steal.sec guest.sec user.util nice.util sys.util idle.util iowait.util irq.util softirq.util steal.util guest.util

72 29.999837 3938217875.000000 131274642.304006 30.0912 88.38 0 2076.04 2164.84 3.17 0 0.05 0 0 2.03994 0 47.9181 49.9677 0.0731683 0 0.00115407 0 0

### NUM_TEST_CONF  = 1

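Note that you do not need to mount the file system before the test; the test program run-fxmark.py mounts it automatically. For example, Max's automatic mount function (mount_max) is at line 371 of run-fxmark.py, where you can manually adjust mount parameters such as the number of imds and mlog.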
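Since plotter.py may not accept this output, the throughput can be pulled out of a log with a few lines of Python. This is a rough sketch based only on the sample output above: it assumes each result is a `## ...` configuration line, followed by a `# ...` column header, followed by one row of numbers. The function name `parse_fxmark_log` is hypothetical and not part of fxmark.

```python
def parse_fxmark_log(text):
    """Return one dict per result row, keyed by the column names in the log."""
    results = []
    config = None
    columns = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("###"):
            continue  # blank line or "### KEY = value" system information
        if line.startswith("##"):
            config = line[2:].strip()  # e.g. "nvme:max:DWOL:72:bufferedio"
            columns = None
        elif line.startswith("#"):
            columns = line[1:].split()  # column header such as "ncpu secs works ..."
        elif config and columns:
            row = dict(zip(columns, (float(v) for v in line.split())))
            row["config"] = config
            results.append(row)
    return results
```

Each returned row exposes the `works/sec` column, which can then be plotted by hand or with any plotting tool.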

champagne9 commented 2 years ago

I ran run-fxmark.py before mounting max, and it still did not run.

champagne9 commented 2 years ago

Hello, is there any way to locate the error that occurs when run-fxmark.py runs?

liaoxiaojian commented 2 years ago

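Setting self.DEBUG_OUT = True at line 53 of https://github.com/thustorage/max/blob/main/fxmark/bin/run-fxmark.py prints debugging information.

run-fxmark.py requires python3; with python3 available it should be easy to get running. If you hit fxmark problems, you can debug lines 564-591 of run-fxmark.py yourself, or ask the fxmark authors (https://github.com/sslab-gatech/fxmark); that is outside the scope of this repository.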

champagne9 commented 2 years ago

OK, thank you very much!