Closed zhijianli88 closed 6 years ago
Hi @zhijianli88, I suspect that ndctl failed to inject bad blocks (at line 68 of this snippet). Please test https://github.com/pmem/pmdk/pull/3046, let me know the result, and post the logs. This PR does not fix anything; it only helps us debug this issue.
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test# ./RUNTESTS -f non-pmem pmempool_create ...
pmempool_create/TEST9: PASS
pmempool_create/TEST9: SETUP (check/non-pmem/static-nondebug)
pmempool_create/TEST9: PASS
pmempool_create/TEST10: SETUP (check/non-pmem/debug)
Error: ndctl failed to inject or retain bad blocks
RUNTESTS: stopping: pmempool_create/TEST10 failed, TEST=check FS=non-pmem BUILD=debug
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test# cat pmempool_create/out
cat: pmempool_create/out: No such file or directory
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test# cat pmempool_create/out10.log
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test#
After adding set -x to the script:
pmempool_create/TEST10: SETUP (check/non-pmem/debug)
On 07/02/2018 06:41 PM, Lukasz Dorau wrote:
Hi @zhijianli88, I suspect that ndctl failed to inject bad blocks (in the line number 68 in this snippet). Please test https://github.com/pmem/pmdk/pull/3046 and let me know the result and post the logs. This PR does not fix anything, it will just help us to debug this issue.
ndctl version?
root@lkp-hsw-ep4 ~# ndctl --version
60.25.g6b0d7dd
On 07/03/2018 12:05 AM, Marcin Ślusarz wrote:
ndctl version?
Please post the pmempool_create/prep10.log and pmempool_create/out10.log log files.
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test# cat pmempool_create/prep10.log
disabled 0 regions
disabled 8 regions
zeroed 6 nmems
enabled 8 regions
{
"dev":"namespace0.0",
"mode":"devdax",
"map":"dev",
"size":30412800,
"uuid":"610d117f-eb1c-4560-9934-283e4ab44c9a",
"raw_uuid":"2dcfe1c1-38b2-4d38-81fe-9ea6b035ddad",
"chardev":"dax0.0"
}
On 07/03/2018 06:23 PM, Lukasz Dorau wrote:
Please post the pmempool_create/prep10.log log file
It should look like this:
disabled 4 regions
disabled 12 regions
zeroed 4 nmems
enabled 12 regions
{
"dev":"namespace8.0",
"mode":"devdax",
"map":"dev",
"size":29364224,
"uuid":"13d11f54-1813-4745-b871-5772f8539824",
"raw_uuid":"513638f0-a895-45e4-a922-81473fdf004b",
"chardev":"dax8.0",
"badblock_count":1,
"badblocks":[
{
"offset":11,
"length":1,
"dimms":[
"nmem1"
]
}
]
}
disabled 12 regions
So the part:
"badblock_count":1,
"badblocks":[
{
"offset":11,
"length":1,
"dimms":[
"nmem1"
]
}
]
is missing from your log, which means that ndctl failed to inject the bad blocks. Now the question is: why?
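The check described above can be done mechanically. The following is only a minimal sketch: the helper name has_badblocks is hypothetical (it is not part of the PMDK test framework), and it assumes the log holds the namespace listing in the JSON form shown above.

```shell
# Hypothetical helper: report whether a saved namespace listing
# (e.g. pmempool_create/prep10.log) contains the "badblock_count" key,
# i.e. whether the injected bad blocks were retained.
has_badblocks() {
    log="$1"
    if grep -q '"badblock_count"' "$log"; then
        echo "bad blocks present in $log"
    else
        echo "no bad blocks in $log"
        return 1
    fi
}
```

For example, has_badblocks pmempool_create/prep10.log would return a non-zero status for the failing log posted earlier in this thread, because that listing has no "badblock_count" entry.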
Do the util_badblock/TEST[2-9] tests succeed?
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test# ./RUNTESTS -f non-pmem util_badblock/
util_badblock/TEST0: SETUP (check/non-pmem/debug)
util_badblock/TEST0: PASS
util_badblock/TEST0: SETUP (check/non-pmem/nondebug)
util_badblock/TEST0: PASS
util_badblock/TEST0: SETUP (check/non-pmem/static-debug)
util_badblock/TEST0: PASS
util_badblock/TEST0: SETUP (check/non-pmem/static-nondebug)
util_badblock/TEST0: PASS
util_badblock/TEST1: SKIP DEVICE_DAX_PATH does not specify enough dax devices (min: 1)
util_badblock/TEST1: SKIP DEVICE_DAX_PATH does not specify enough dax devices (min: 1)
util_badblock/TEST1: SKIP DEVICE_DAX_PATH does not specify enough dax devices (min: 1)
util_badblock/TEST1: SKIP DEVICE_DAX_PATH does not specify enough dax devices (min: 1)
util_badblock/TEST2: SETUP (check/non-pmem/debug)
util_badblock/TEST2: PASS
util_badblock/TEST2: SETUP (check/non-pmem/nondebug)
util_badblock/TEST2: PASS
util_badblock/TEST2: SETUP (check/non-pmem/static-debug)
util_badblock/TEST2: PASS
util_badblock/TEST2: SETUP (check/non-pmem/static-nondebug)
util_badblock/TEST2: PASS
util_badblock/TEST3: SETUP (check/non-pmem/debug)
Error: ndctl failed to inject or retain bad blocks
RUNTESTS: stopping: util_badblock//TEST3 failed, TEST=check FS=non-pmem BUILD=debug
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test# cat util_badblock/out3.log
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test# cat util_badblock/prep3.log
disabled 0 regions
disabled 8 regions
zeroed 6 nmems
enabled 8 regions
{
"dev":"namespace0.0",
"mode":"devdax",
"map":"dev",
"size":30412800,
"uuid":"ef964d96-f0d4-4b7d-9ea9-d8bb5c2b8fd2",
"raw_uuid":"54696d0a-f877-4dc8-8ff6-bdc4817b6a24",
"chardev":"dax0.0"
}
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test# uname -a
Linux lkp-hsw-ep4 4.18.0-rc1 #1 SMP Tue Jul 3 13:15:38 CST 2018 x86_64 GNU/Linux
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test# cat /proc/cmdline
ip=::::lkp-hsw-ep4::dhcp root=/dev/ram0 user=lkp job=/lkp/scheduled/lkp-hsw-ep4/nvml-unit-tests-pmempool-non-pmem-debian-x86_64-2018-04-03.cgz-ce397d215ccd07b8ae3f71db689aedb85d56ab40-20180703-70034-34hb38-0.yaml ARCH=x86_64 kconfig=x86_64-rhel-7.2 branch=linus/master commit=ce397d215ccd07b8ae3f71db689aedb85d56ab40 BOOT_IMAGE=/pkg/linux/x86_64-rhel-7.2/gcc-7/ce397d215ccd07b8ae3f71db689aedb85d56ab40/vmlinuz-4.18.0-rc1 max_uptime=1230 RESULT_ROOT=/result/nvml-unit-tests/pmempool-non-pmem/lkp-hsw-ep4/debian-x86_64-2018-04-03.cgz/x86_64-rhel-7.2/gcc-7/ce397d215ccd07b8ae3f71db689aedb85d56ab40/e9b36bc73846a7b4199318898fe65b035bd451d6/6 LKP_SERVER=inn debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 net.ifnames=0 printk.devkmsg=on panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 drbd.minor_count=8 systemd.log_level=err ignore_loglevel console=tty0 earlyprintk=ttyS0,115200 console=ttyS0,115200 vga=normal rw
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test#
On 07/03/2018 10:15 PM, Lukasz Dorau wrote:
Do the util_badblock/TEST[2-9] tests succeed?
Do the following commands:
$ sudo modprobe nfit_test
$ lsmod | grep nfit_test
succeed on your machine and what is the output?
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test# lsmod | grep nfit
nfit_test 36864 8
nd_pmem 20480 1 nfit_test
nfit 61440 1 nfit_test
device_dax 20480 2 dax_pmem,nfit_test
libnvdimm 163840 6 dax_pmem,nfit_test,nd_btt,nd_pmem,nd_blk,nfit
nfit_test_iomap 24576 6 dax_pmem,nfit_test,device_dax,nd_pmem,libnvdimm,nfit
On 07/03/2018 11:08 PM, Lukasz Dorau wrote:
Do the following commands:
$ sudo modprobe nfit_test
$ lsmod | grep nfit_test
succeed on your machine and what is the output?
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test# cat testconfig.sh
NON_PMEM_FS_DIR=/tmp/tmp.KckJQLjz1P
PMEM_FS_DIR=/fs/pmem0
NODE[0]=127.0.0.1
NODE_WORKING_DIR[0]=/tmp/node0
NODE_ADDR[0]=127.0.0.1
NODE_ENV[0]="PMEM_IS_PMEM_FORCE=1"
NODE[1]=127.0.0.1
NODE_WORKING_DIR[1]=/tmp/node1
NODE_ADDR[1]=127.0.0.1
NODE_ENV[1]="PMEM_IS_PMEM_FORCE=1"
NODE[2]=127.0.0.1
NODE_WORKING_DIR[2]=/tmp/node2
NODE_ADDR[2]=127.0.0.1
NODE_ENV[2]="PMEM_IS_PMEM_FORCE=1"
NODE[3]=127.0.0.1
NODE_WORKING_DIR[3]=/tmp/node3
NODE_ADDR[3]=127.0.0.1
NODE_ENV[3]="PMEM_IS_PMEM_FORCE=1"
TEST_PROVIDERS=sockets
RPMEM_VALGRIND_ENABLED=y
PMEM_FS_DIR_FORCE_PMEM=1
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test# mount
rootfs on / type rootfs (rw)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=65623640k,nr_inodes=16405910,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/cpu type cgroup (rw,nosuid,nodev,noexec,relatime,cpu)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=35,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=51333)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
sunrpc on /run/rpc_pipefs type rpc_pipefs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
tmp on /tmp type tmpfs (rw,relatime)
inn:/result on /inn/result type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.1.1,mountvers=3,mountport=42102,mountproto=udp,local_lock=none,addr=192.168.1.1)
inn:/pkg on /pkg type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.1.1,mountvers=3,mountport=42102,mountproto=udp,local_lock=none,addr=192.168.1.1)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=13191772k,mode=700)
On 07/03/2018 11:12 PM, Li Zhijian wrote:
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test# lsmod | grep nfit
nfit_test 36864 8
nd_pmem 20480 1 nfit_test
nfit 61440 1 nfit_test
device_dax 20480 2 dax_pmem,nfit_test
libnvdimm 163840 6 dax_pmem,nfit_test,nd_btt,nd_pmem,nd_blk,nfit
nfit_test_iomap 24576 6 dax_pmem,nfit_test,device_dax,nd_pmem,libnvdimm,nfit
It looks like injecting errors with ndctl does not work on your machine with ndctl v60.25.g6b0d7dd and kernel v4.18.0-rc1. I will check it.
Where did you get this version of ndctl (v60.25.g6b0d7dd)? There is neither such a tag (v60.25) nor such a commit ID (6b0d7dd) in the ndctl git tree...
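One possible reading of that version string, offered as an assumption rather than a confirmed fact: if it follows the git-describe convention (tag, number of commits since the tag, then "g" plus an abbreviated commit hash, with dots in place of dashes), then 60.25.g6b0d7dd would mean 25 commits past tag v60 at commit 6b0d7dd, which could live on a branch other than master. The helper name parse_describe_version below is hypothetical.

```shell
# Hypothetical helper: split a version string of the ASSUMED form
# <tag>.<commits-since-tag>.g<short-hash> into its parts.
# Strings without a ".g<hash>" suffix are treated as plain release tags.
parse_describe_version() {
    version="$1"
    case "$version" in
        *.*.g*)
            hash="${version##*.g}"   # text after the last ".g"
            rest="${version%.g*}"    # text before the last ".g"
            count="${rest##*.}"      # last dotted field: commits since tag
            tag="${rest%.*}"         # everything before that: the tag
            echo "tag=v$tag commits_since_tag=$count commit=$hash"
            ;;
        *)
            echo "tag=v$version"
            ;;
    esac
}
```

Calling parse_describe_version 60.25.g6b0d7dd splits the string into tag v60, 25 commits, and commit 6b0d7dd; whether that commit exists in a given clone can then be checked with git cat-file -e 6b0d7dd^{commit}.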
We used the pending branch: https://github.com/pmem/ndctl --branch pending
Please test the latest stable release v61.2 and check if the results are the same.
It looks like the latest release tag is v60.1
root@lkp-nex04 ~/ndctl# git tag | grep v61
v61
v61.1
On 07/04/2018 12:45 PM, Lukasz Dorau wrote:
Please test the latest stable release v61.2 and check if the results are the same.
still fails
pmempool_create/TEST10: SETUP (check/non-pmem/debug)
Error: ndctl failed to inject or retain bad blocks
RUNTESTS: stopping: pmempool_create/TEST10 failed, TEST=check FS=non-pmem BUILD=debug
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test# ndctl --version
61.1
Could you test the stable kernel? The latest stable is 4.17.4.
It doesn't work
pmempool_create/TEST9: SETUP (check/non-pmem/static-debug)
pmempool_create/TEST9: PASS
pmempool_create/TEST9: SETUP (check/non-pmem/static-nondebug)
pmempool_create/TEST9: PASS
pmempool_create/TEST10: SETUP (check/non-pmem/debug)
Error: ndctl failed to inject or retain bad blocks
RUNTESTS: stopping: pmempool_create/TEST10 failed, TEST=check FS=non-pmem BUILD=debug
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test# ndctl --version
60.1
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test# cat pmempool_create/pmempool10.log
Thanks. I will test ndctl v61.1 with kernel v4.17.4 and v4.18-rc3.
One remark. I see you have ndctl version v60.1. Please test kernel v4.17.4 with ndctl v61.1
kernel v4.17.4 + ndctl v61.1
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test/pmempool_create# cat pmem10.log
<libpmem>: <1> [out.c:236 out_init] pid 24423: program: /lkp/benchmarks/nvml-unit-tests/src/tools/pmempool/pmempool
<libpmem>: <1> [out.c:238 out_init] libpmem version 1.1
<libpmem>: <1> [out.c:242 out_init] src version: 1.4-rc4-606-ge9b36bc73846
<libpmem>: <1> [out.c:250 out_init] compiled with support for Valgrind pmemcheck
<libpmem>: <1> [out.c:255 out_init] compiled with support for Valgrind helgrind
<libpmem>: <1> [out.c:260 out_init] compiled with support for Valgrind memcheck
<libpmem>: <1> [out.c:265 out_init] compiled with support for Valgrind drd
<libpmem>: <3> [mmap.c:66 util_mmap_init]
<libpmem>: <3> [libpmem.c:56 libpmem_init]
<libpmem>: <3> [pmem.c:712 pmem_init]
<libpmem>: <3> [init.c:419 pmem_init_funcs]
<libpmem>: <3> [init.c:368 pmem_cpuinfo_to_funcs]
<libpmem>: <3> [init.c:372 pmem_cpuinfo_to_funcs] clflush supported
<libpmem>: <3> [init.c:281 use_avx_memcpy_memset] avx supported
<libpmem>: <3> [init.c:285 use_avx_memcpy_memset] PMEM_AVX not set or not == 1
<libpmem>: <3> [pmem.c:216 pmem_has_auto_flush]
<libpmem>: <3> [os_auto_flush_linux.c:106 check_domain_in_region] region_path: /sys/bus/nd/devices/region6
<libpmem>: <3> [init.c:472 pmem_init_funcs] Flushing CPU cache
<libpmem>: <3> [init.c:487 pmem_init_funcs] using clflush
<libpmem>: <3> [init.c:501 pmem_init_funcs] using movnt SSE2
<libpmem>: <3> [pmem_posix.c:104 pmem_os_init]
<libpmem>: <3> [libpmem.c:69 libpmem_fini]
<libpmem>: <3> [mmap.c:100 util_mmap_fini]
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test/pmempool_create# cat pmempool10.log
<libpmempool>: <1> [out.c:236 out_init] pid 24423: program: /lkp/benchmarks/nvml-unit-tests/src/tools/pmempool/pmempool
<libpmempool>: <1> [out.c:238 out_init] libpmempool version 1.3
<libpmempool>: <1> [out.c:242 out_init] src version: 1.4-rc4-606-ge9b36bc73846
<libpmempool>: <1> [out.c:250 out_init] compiled with support for Valgrind pmemcheck
<libpmempool>: <1> [out.c:255 out_init] compiled with support for Valgrind helgrind
<libpmempool>: <1> [out.c:260 out_init] compiled with support for Valgrind memcheck
<libpmempool>: <1> [out.c:265 out_init] compiled with support for Valgrind drd
<libpmempool>: <3> [mmap.c:66 util_mmap_init]
<libpmempool>: <3> [libpmempool.c:69 libpmempool_init]
<libpmempool>: <3> [set.c:121 util_remote_init]
<libpmempool>: <3> [libpmempool.c:85 libpmempool_fini]
<libpmempool>: <3> [set.c:191 util_remote_unload]
<libpmempool>: <3> [set.c:136 util_remote_fini]
<libpmempool>: <3> [set.c:191 util_remote_unload]
<libpmempool>: <3> [mmap.c:100 util_mmap_fini]
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test/pmempool_create# ndctl --version
61.1
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test/pmempool_create# uname -a
Linux lkp-hsw-ep4 4.17.4 #1 SMP Wed Jul 4 12:46:41 CST 2018 x86_64 GNU/Linux
root@lkp-hsw-ep4 /lkp/benchmarks/nvml-unit-tests/src/test/pmempool_create# cat prep10.log
disabled 0 regions
disabled 8 regions
zeroed 6 nmems
enabled 8 regions
{
"dev":"namespace0.0",
"mode":"devdax",
"map":"dev",
"size":29364224,
"uuid":"0e376ac8-369d-42bc-a20f-4e58e0970a54",
"raw_uuid":"55d81f22-2497-4f5b-b906-7ebe1529913f",
"chardev":"dax0.0"
}
I confirm that injecting bad blocks in the nfit_test module does not work with kernel v4.17.4 + ndctl v61.1. So this is an external bug. I will submit a bug report.
Please use kernel v4.16 as a workaround for this bug.
Great, it works for me on v4.16
pmempool_create/TEST9: PASS
pmempool_create/TEST9: SETUP (check/non-pmem/static-debug)
pmempool_create/TEST9: PASS
pmempool_create/TEST9: SETUP (check/non-pmem/static-nondebug)
pmempool_create/TEST9: PASS
pmempool_create/TEST10: SETUP (check/non-pmem/debug)
pmempool_create/TEST10: PASS
pmempool_create/TEST10: SETUP (check/non-pmem/nondebug)
pmempool_create/TEST10: PASS
pmempool_create/TEST11: SETUP (check/non-pmem/debug)
pmempool_create/TEST11: PASS
pmempool_create/TEST11: SETUP (check/non-pmem/nondebug)
pmempool_create/TEST11: PASS
pmempool_create/TEST12: SETUP (check/non-pmem/debug)
pmempool_create/TEST12: PASS
pmempool_create/TEST12: SETUP (check/non-pmem/nondebug)
pmempool_create/TEST12: PASS
Sorry about the thrash. We overhauled ARS handling between 4.16 and 4.17 and one of the casualties was "inject-error --notify" support in nfit_test. Some more details here.
For now, you need to run ndctl start-scrub; ndctl wait-scrub; after injecting errors on nfit_test to get them to appear in the badblocks. We're investigating how to restore "--notify" support, but it might not be implemented for one or more releases.
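The workaround above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name inject_bad_block is hypothetical, the namespace and block values come from the logs earlier in this thread, and the ndctl options used (inject-error --block/--count, list --media-errors/--namespace) should be checked against the man pages of the installed ndctl version.

```shell
# Hypothetical helper: inject bad blocks on an nfit_test namespace, then
# force an ARS scrub so the errors show up in the badblocks list.
# NDCTL can be overridden to point at a different ndctl binary.
NDCTL="${NDCTL:-ndctl}"

inject_bad_block() {
    namespace="$1"      # e.g. namespace0.0
    block="$2"          # first block to mark bad, e.g. 11
    count="${3:-1}"     # number of blocks, defaults to 1

    "$NDCTL" inject-error --block="$block" --count="$count" "$namespace" &&
    "$NDCTL" start-scrub &&
    "$NDCTL" wait-scrub &&
    "$NDCTL" list --media-errors --namespace="$namespace"
}
```

For example, inject_bad_block namespace0.0 11 1 would be expected to end with a namespace listing that includes "badblock_count" once the scrub completes.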
@zhijianli88 The commit https://github.com/pmem/pmdk/commit/439d0d0ce0d646097e1a0664c855e5d4bc9f84a8 has been merged upstream. Please verify and close this issue if it is fixed.
verified
nvml-unit-tests.pmempool_create_TEST10_non-pmem_debug.fail has been occurring on 0Day since we recently started building nvml with NDCTL_ENABLE=y (previously it was unset). After digging into the test case, we found that it exits at line 70 below on v4.17-rc5.
Have you met similar issues? If I missed something when building or running the test, please let me know. If you need more details or logs, let me know as well.