pmem / pmdk

Persistent Memory Development Kit
https://pmem.io

pmem2_mem_ext failure because of Valgrind not supporting avx512f. #5640

Open grom72 opened 1 year ago

grom72 commented 1 year ago

ISSUE: pmem2_mem_ext/TEST[1-4]: failed under valgrind (self-hosted, rhel, RUNTESTS.py --force-enable pmemcheck)

Environment Information

Please provide a reproduction of the bug:

pmem2_mem_ext/TEST1: failed under valgrind (self-hosted, rhel, RUNTESTS.py --force-enable pmemcheck) with error message:

pmem2_mem_ext/TEST1: FAILED (short/debug/pmemcheck/page/wc_workaround: on/variant: avx512f)
Pattern: memmove_mov_avx512f occurs 0 times. One expected. Type: C Flag id: 0
pmem2_mem_ext/TEST1: FAILED (short/debug/pmemcheck/page/wc_workaround: off/variant: avx512f)
Pattern: memmove_mov_avx512f occurs 0 times. One expected. Type: C Flag id: 0
pmem2_mem_ext/TEST1: FAILED (short/debug/pmemcheck/page/wc_workaround: default/variant: avx512f)
Pattern: memmove_mov_avx512f occurs 0 times. One expected. Type: C Flag id: 0

## How often bug is revealed: (always, often, rare):  always

## Actual behavior:

pmem2_mem_ext/TEST4: SETUP (short/debug/pmemcheck/byte/wc_workaround: off/variant: avx512f)
Last 9 lines of /home/test-user/actions-runner/_work/pmdk/pmdk/src/test/pmem2_mem_ext/pmemcheck4.log below (whole file has 9 lines):
==1341046== pmemcheck-1.0, a simple persistent store checker
==1341046== Copyright (c) 2014-2020, Intel Corporation
==1341046== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==1341046== Command: /home/test-user/actions-runner/_work/pmdk/pmdk/src/test/pmem2_mem_ext/pmem2_mem_ext /mnt/pmem0/pmem2_mem_ext_4/testfile C 1024 0
==1341046== Parent PID: 1320522
==1341046==
==1341046==
==1341046== Number of stores not made persistent: 0
==1341046== ERROR SUMMARY: 0 errors
Last 30 lines of /home/test-user/actions-runner/_work/pmdk/pmdk/src/test/pmem2_mem_ext/pmem2_4.log below (whole file has 55 lines):
: <3> [init.c:597 pmem2_arch_init] using movnt SSE2
: <3> [map_posix.c:293 pmem2_map_new] cfg 0x4232f50 src 0x4232fa0 map_ptr 0x1ffefffa40
: <3> [source_posix.c:143 pmem2_source_alignment] type 2
: <4> [source_posix.c:173 pmem2_source_alignment] alignment 4096
: <3> [source_posix.c:92 pmem2_source_size] type 2
: <4> [source_posix.c:132 pmem2_source_size] file length 4194304
: <4> [map_posix.c:140 map_reserve] system choice 0x7e09000
: <4> [map_posix.c:149 map_reserve] hint 0x8000000
: <15> [map_posix.c:204 file_map] reserve 0x8000000 len 4194304 proto 3 flags 10 fd 8 offset 0 map_sync 0x1ffefff97f
: <4> [map_posix.c:228 file_map] mmap with MAP_SYNC succeeded
: <3> [map_posix.c:477 pmem2_map_new] mapped at 0x8000000
: <15> [auto_flush_linux.c:140 pmem2_auto_flush]
: <15> [auto_flush_linux.c:175 pmem2_auto_flush] Start traversing region: /sys/bus/nd/devices/region0
: <3> [auto_flush_linux.c:86 check_domain_in_region] region_path: /sys/bus/nd/devices/region0
: <3> [auto_flush_linux.c:30 check_cpu_cache] domain_path: /sys/bus/nd/devices/region0/persistence_domain
: <15> [auto_flush_linux.c:64 check_cpu_cache] detected persistent_domain: memory_controller
: <15> [auto_flush_linux.c:69 check_cpu_cache] cpu_cache not in persistent_domain: /sys/bus/nd/devices/region0/persistence_domain
: <3> [map_posix.c:522 pmem2_map_new] using libpmem2 default async mover
: <3> [mover.c:181 mover_new] map 0x42344f0, vdm 0x1ffefff958
: <6> [ravl.c:395 ravl_emplace]
: <3> [map.c:44 pmem2_map_get_size] map 0x42344f0
: <3> [map.c:32 pmem2_map_get_address] map 0x42344f0
: <15> [memcpy_t_sse2.c:218 memmove_mov_sse2_empty] dest 0x8000400 src 0x8000000 len 1024
: <15> [persist.c:125 pmem2_drain]
: <15> [init.c:26 memory_barrier]
: <3> [map_posix.c:580 pmem2_map_delete] map_ptr 0x1ffefffa40
: <6> [ravl.c:526 ravl_find]
: <6> [ravl.c:526 ravl_find]
: <6> [ravl.c:547 ravl_remove]
: <3> [libpmem2.c:44 libpmem2_fini]
Last 3 lines of /home/test-user/actions-runner/_work/pmdk/pmdk/src/test/pmem2_mem_ext/out4.log below (whole file has 3 lines):
pmem2_mem_ext/TEST4: START: pmem2_mem_ext
default avx avx512f movdir64b /home/test-user/actions-runner/_work/pmdk/pmdk/src/test/pmem2_mem_ext/pmem2_mem_ext /mnt/pmem0/pmem2_mem_ext_4/testfile C 1024 0
pmem2_mem_ext/TEST4: DONE
Last 0 lines of /home/test-user/actions-runner/_work/pmdk/pmdk/src/test/pmem2_mem_ext/err4.log below (whole file has 0 lines):
Last 3 lines of /home/test-user/actions-runner/_work/pmdk/pmdk/src/test/pmem2_mem_ext/trace4.log below (whole file has 3 lines):
{pmem2_mem_ext.c:86 main} pmem2_mem_ext/TEST4: START: pmem2_mem_ext
default avx avx512f movdir64b /home/test-user/actions-runner/_work/pmdk/pmdk/src/test/pmem2_mem_ext/pmem2_mem_ext /mnt/pmem0/pmem2_mem_ext_4/testfile C 1024 0
{pmem2_mem_ext.c:143 main} pmem2_mem_ext/TEST4: DONE
pmem2_mem_ext/TEST4: FAILED (short/debug/pmemcheck/byte/wc_workaround: off/variant: avx512f)
Pattern: memmove_mov_avx512f occurs 0 times. One expected. Type: C Flag id: 0
pmem2_mem_ext/TEST4: SETUP (short/debug/pmemcheck/byte/wc_workaround: default/variant: sse2)
pmem2_mem_ext/TEST4: PASS [06.076 s]
pmem2_mem_ext/TEST4: SETUP (short/debug/pmemcheck/byte/wc_workaround: default/variant: avx)
pmem2_mem_ext/TEST4: PASS [06.150 s]
pmem2_mem_ext/TEST4: SETUP (short/debug/pmemcheck/byte/wc_workaround: default/variant: avx512f)
Last 9 lines of /home/test-user/actions-runner/_work/pmdk/pmdk/src/test/pmem2_mem_ext/pmemcheck4.log below (whole file has 9 lines):
==1341106== pmemcheck-1.0, a simple persistent store checker
==1341106== Copyright (c) 2014-2020, Intel Corporation
==1341106== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==1341106== Command: /home/test-user/actions-runner/_work/pmdk/pmdk/src/test/pmem2_mem_ext/pmem2_mem_ext /mnt/pmem0/pmem2_mem_ext_4/testfile C 1024 0
==1341106== Parent PID: 1320522
==1341106==
==1341106==
==1341106== Number of stores not made persistent: 0
==1341106== ERROR SUMMARY: 0 errors
Last 30 lines of /home/test-user/actions-runner/_work/pmdk/pmdk/src/test/pmem2_mem_ext/pmem2_4.log below (whole file has 54 lines):
: <3> [init.c:597 pmem2_arch_init] using movnt SSE2
: <3> [map_posix.c:293 pmem2_map_new] cfg 0x4232f50 src 0x4232fa0 map_ptr 0x1ffefffa60
: <3> [source_posix.c:143 pmem2_source_alignment] type 2
: <4> [source_posix.c:173 pmem2_source_alignment] alignment 4096
: <3> [source_posix.c:92 pmem2_source_size] type 2
: <4> [source_posix.c:132 pmem2_source_size] file length 4194304
: <4> [map_posix.c:140 map_reserve] system choice 0x7e09000
: <4> [map_posix.c:149 map_reserve] hint 0x8000000
: <15> [map_posix.c:204 file_map] reserve 0x8000000 len 4194304 proto 3 flags 10 fd 8 offset 0 map_sync 0x1ffefff99f
: <4> [map_posix.c:228 file_map] mmap with MAP_SYNC succeeded
: <3> [map_posix.c:477 pmem2_map_new] mapped at 0x8000000
: <15> [auto_flush_linux.c:140 pmem2_auto_flush]
: <15> [auto_flush_linux.c:175 pmem2_auto_flush] Start traversing region: /sys/bus/nd/devices/region0
: <3> [auto_flush_linux.c:86 check_domain_in_region] region_path: /sys/bus/nd/devices/region0
: <3> [auto_flush_linux.c:30 check_cpu_cache] domain_path: /sys/bus/nd/devices/region0/persistence_domain
: <15> [auto_flush_linux.c:64 check_cpu_cache] detected persistent_domain: memory_controller
: <15> [auto_flush_linux.c:69 check_cpu_cache] cpu_cache not in persistent_domain: /sys/bus/nd/devices/region0/persistence_domain
: <3> [map_posix.c:522 pmem2_map_new] using libpmem2 default async mover
: <3> [mover.c:181 mover_new] map 0x42344f0, vdm 0x1ffefff978
: <6> [ravl.c:395 ravl_emplace]
: <3> [map.c:44 pmem2_map_get_size] map 0x42344f0
: <3> [map.c:32 pmem2_map_get_address] map 0x42344f0
: <15> [memcpy_t_sse2.c:218 memmove_mov_sse2_empty] dest 0x8000400 src 0x8000000 len 1024
: <15> [persist.c:125 pmem2_drain]
: <15> [init.c:26 memory_barrier]
: <3> [map_posix.c:580 pmem2_map_delete] map_ptr 0x1ffefffa60
: <6> [ravl.c:526 ravl_find]
: <6> [ravl.c:526 ravl_find]
: <6> [ravl.c:547 ravl_remove]
: <3> [libpmem2.c:44 libpmem2_fini]
Last 3 lines of /home/test-user/actions-runner/_work/pmdk/pmdk/src/test/pmem2_mem_ext/out4.log below (whole file has 3 lines):
pmem2_mem_ext/TEST4: START: pmem2_mem_ext
default avx avx512f movdir64b /home/test-user/actions-runner/_work/pmdk/pmdk/src/test/pmem2_mem_ext/pmem2_mem_ext /mnt/pmem0/pmem2_mem_ext_4/testfile C 1024 0
pmem2_mem_ext/TEST4: DONE
Last 0 lines of /home/test-user/actions-runner/_work/pmdk/pmdk/src/test/pmem2_mem_ext/err4.log below (whole file has 0 lines):
Last 3 lines of /home/test-user/actions-runner/_work/pmdk/pmdk/src/test/pmem2_mem_ext/trace4.log below (whole file has 3 lines):
{pmem2_mem_ext.c:86 main} pmem2_mem_ext/TEST4: START: pmem2_mem_ext
default avx avx512f movdir64b /home/test-user/actions-runner/_work/pmdk/pmdk/src/test/pmem2_mem_ext/pmem2_mem_ext /mnt/pmem0/pmem2_mem_ext_4/testfile C 1024 0
{pmem2_mem_ext.c:143 main} pmem2_mem_ext/TEST4: DONE
pmem2_mem_ext/TEST4: FAILED (short/debug/pmemcheck/byte/wc_workaround: default/variant: avx512f)

## Expected behavior:

## Details

## Additional information about Priority and Help Requested:

Are you willing to submit a pull request with a proposed change? (Yes, No)

Requested priority: (Showstopper, High, Medium, Low)

grom72 commented 1 year ago

See https://github.com/pmem/pmdk/actions/runs/4969298952/jobs/8892382347#step:6:4640

This test should be skipped, as it already is for MOVDIR64B:

pmem2_mem_ext/TEST4: FAILED (short/debug/pmemcheck/byte/wc_workaround: default/variant: avx512f)
Pattern: memmove_mov_avx512f occurs 0 times. One expected. Type: C Flag id: 0
pmem2_mem_ext/TEST5: SETUP  (short/debug/pmemcheck/byte/wc_workaround: on/variant: movdir64b)
pmem2_mem_ext/TEST5: SKIP: MOVDIR64B unavailable
pmem2_mem_ext/TEST5: SETUP  (short/debug/pmemcheck/byte/wc_workaround: off/variant: movdir64b)
pmem2_mem_ext/TEST5: SKIP: MOVDIR64B unavailable
grom72 commented 1 year ago

See #4715 ;)

grom72 commented 1 year ago

It looks like Valgrind does not support avx512f. is_cpu_feature_present(0x7, EBX_IDX, bit_AVX512F); returns 0 under Valgrind but 1 when the function is called directly on the CPU.

During the pmem2_mem_ext test setup, a program called cpufd is used to determine whether avx512f is available. This program uses the cpu.c source code directly, runs without any Valgrind instrumentation, and correctly detects avx512f availability. The pmem2_mem_ext test itself, however, is run under Valgrind, and there the libpmem2 library initialization does not detect avx512f support:

<libpmem2>: <1> [out.c:209 out_init] pid 1572171: program: /home/tgromadz/repos/pmdk/src/test/pmem2_mem_ext/pmem2_mem_ext
<libpmem2>: <1> [out.c:211 out_init] libpmem2 version 0.0
<libpmem2>: <1> [out.c:215 out_init] src version: 1.13.0+git36.g850a09941
<libpmem2>: <1> [out.c:223 out_init] compiled with support for Valgrind pmemcheck
<libpmem2>: <1> [out.c:228 out_init] compiled with support for Valgrind helgrind
<libpmem2>: <1> [out.c:233 out_init] compiled with support for Valgrind memcheck
<libpmem2>: <1> [out.c:238 out_init] compiled with support for Valgrind drd
<libpmem2>: <1> [out.c:243 out_init] compiled with support for shutdown state
<libpmem2>: <1> [out.c:248 out_init] compiled with libndctl 63+
<libpmem2>: <3> [libpmem2.c:29 libpmem2_init] 
<libpmem2>: <3> [init.c:560 pmem2_arch_init] 
<libpmem2>: <3> [init.c:472 pmem_cpuinfo_to_funcs] 
<libpmem2>: <4> [cpu.c:135 is_cpu_clflush_present] CLFLUSH supported
<libpmem2>: <3> [init.c:475 pmem_cpuinfo_to_funcs] clflush supported
<libpmem2>: <4> [cpu.c:147 is_cpu_clflushopt_present] CLFLUSHOPT not supported
<libpmem2>: <4> [cpu.c:159 is_cpu_clwb_present] CLWB not supported
<libpmem2>: <4> [cpu.c:123 is_cpu_genuine_intel] CPU vendor: GenuineIntel
<libpmem2>: <3> [init.c:517 pmem_cpuinfo_to_funcs] WC workaround forced to 1
<libpmem2>: <3> [init.c:527 pmem_cpuinfo_to_funcs] WC workaround = 1
<libpmem2>: <4> [cpu.c:171 is_cpu_avx_present] AVX supported
<libpmem2>: <3> [init.c:272 use_avx_memcpy_memset] avx supported
<libpmem2>: <3> [init.c:276 use_avx_memcpy_memset] PMEM_AVX set to 0
<libpmem2>: <4> [cpu.c:183 is_cpu_avx512f_present] AVX512f not supported
<libpmem2>: <4> [cpu.c:196 is_cpu_movdir64b_present] movdir64b not supported
<libpmem2>: <3> [init.c:588 pmem2_arch_init] using clflush
<libpmem2>: <3> [init.c:599 pmem2_arch_init] using movnt SSE2
<libpmem2>: <3> [map_posix.c:293 pmem2_map_new] cfg 0x8032d30 src 0x8032db0 map_ptr 0x1ffeffec50

Below is the fragment of the log from calling pmem2_mem_ext directly from the command line:

<libpmem2>: <1> [out.c:209 out_init] pid 1569697: program: /home/tgromadz/repos/pmdk/src/test/pmem2_mem_ext/pmem2_mem_ext
<libpmem2>: <1> [out.c:211 out_init] libpmem2 version 0.0
<libpmem2>: <1> [out.c:215 out_init] src version: 1.13.0+git36.g850a09941
<libpmem2>: <1> [out.c:223 out_init] compiled with support for Valgrind pmemcheck
<libpmem2>: <1> [out.c:228 out_init] compiled with support for Valgrind helgrind
<libpmem2>: <1> [out.c:233 out_init] compiled with support for Valgrind memcheck
<libpmem2>: <1> [out.c:238 out_init] compiled with support for Valgrind drd
<libpmem2>: <1> [out.c:243 out_init] compiled with support for shutdown state
<libpmem2>: <1> [out.c:248 out_init] compiled with libndctl 63+
<libpmem2>: <3> [libpmem2.c:29 libpmem2_init] 
<libpmem2>: <3> [init.c:560 pmem2_arch_init] 
<libpmem2>: <3> [init.c:472 pmem_cpuinfo_to_funcs] 
<libpmem2>: <4> [cpu.c:135 is_cpu_clflush_present] CLFLUSH supported
<libpmem2>: <3> [init.c:475 pmem_cpuinfo_to_funcs] clflush supported
<libpmem2>: <4> [cpu.c:147 is_cpu_clflushopt_present] CLFLUSHOPT supported
<libpmem2>: <3> [init.c:483 pmem_cpuinfo_to_funcs] clflushopt supported
<libpmem2>: <4> [cpu.c:159 is_cpu_clwb_present] CLWB supported
<libpmem2>: <3> [init.c:496 pmem_cpuinfo_to_funcs] clwb supported
<libpmem2>: <4> [cpu.c:123 is_cpu_genuine_intel] CPU vendor: GenuineIntel
<libpmem2>: <3> [init.c:527 pmem_cpuinfo_to_funcs] WC workaround = 1
<libpmem2>: <4> [cpu.c:171 is_cpu_avx_present] AVX supported
<libpmem2>: <3> [init.c:272 use_avx_memcpy_memset] avx supported
<libpmem2>: <3> [init.c:280 use_avx_memcpy_memset] PMEM_AVX enabled
<libpmem2>: <4> [cpu.c:183 is_cpu_avx512f_present] AVX512f supported
<libpmem2>: <3> [init.c:372 use_avx512f_memcpy_memset] avx512f supported
<libpmem2>: <3> [init.c:380 use_avx512f_memcpy_memset] PMEM_AVX512F enabled
<libpmem2>: <4> [cpu.c:196 is_cpu_movdir64b_present] movdir64b not supported
<libpmem2>: <3> [init.c:584 pmem2_arch_init] using clwb
<libpmem2>: <3> [init.c:595 pmem2_arch_init] using movnt AVX512F

For that reason, the test will be disabled until Valgrind properly supports avx512f. See https://bugs.kde.org/show_bug.cgi?id=383010 for reference.