intel / mOS


Can I use yod to mmap memory larger than 128M? #7

Open HengInWeb opened 3 years ago

HengInWeb commented 3 years ago

As far as I know, the system's physical memory block size is 128 MB, but why can yod only allocate a single memory block to mmap? Is there any other way to make yod allocate more memory for mmap?

My test method is as follows:

```
yod -o lwkmem-prot-none-delegation-enable -M 2048m -C all --mosview lwk ./maptest --verbose --type anonymous --size 4096 --num 32512
```

Test result: `Failed mmap. Cannot allocate memory`
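
For context, the allocation pattern I am testing boils down to repeated anonymous mmap calls: 4096 bytes × 32512 = 133,169,152 bytes, i.e. 127 MiB, just under one 128 MiB block. Below is a minimal standalone sketch of that pattern, assuming (without having traced maptest.c in detail) that `--num` independent anonymous regions of `--size` bytes each are mapped:

```c
/* mmap_loop.c -- minimal sketch of the tested allocation pattern.
 * Assumption (not verified against maptest.c): --size N --num M maps
 * M independent anonymous regions of N bytes each. */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    const size_t size = 4096;   /* --size: one 4 KB page per region   */
    const size_t num  = 32512;  /* --num: 32512 * 4 KiB = 127 MiB     */

    for (size_t i = 0; i < num; i++) {
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            fprintf(stderr, "Failed mmap at iteration %zu: %s\n",
                    i, strerror(errno));
            return EXIT_FAILURE;
        }
        memset(p, 0, size);     /* touch the region so it is backed */
    }
    printf("mapped %zu regions of %zu bytes\n", num, size);
    return EXIT_SUCCESS;
}
```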

rolfriesen commented 3 years ago

Hello @HengInWeb, sorry for the slow response; I did not have watch turned on. I am not sure I understand your question. The 128 MB memory block size is given by Linux and is the smallest unit that can be offlined and onlined; mOS uses it to designate memory for the LWK. Assuming you booted with a large enough lwkmem= option, or requested enough memory using lwkctl, you would have gotten many 128 MB blocks. Can you share what your `lwkctl -s` output looks like? What does your maptest program do? Does it mmap 4096 * 32512 bytes? If enough memory is designated (visible via `lwkctl -s`), then your yod reservation of -M 2048m should have worked. Does `yod echo hello` work? What about `yod -M 2048m -C all echo hello`? Feel free to use mos-devel@googlegroups.com for a more responsive conversation.

HengInWeb commented 3 years ago

Boot cmdline:

```
$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.4.48-mos-build20210617-1 root=UUID=93cb47c5-3e8a-467a-a61d-3ed6d74ee889 ro rhgb movable_node lwkcpu_profile=normal lwkmem=5G kernelcore=5G lwkcpus=1.4-7 loglevel=7 console=tty1 intel_pstate=disable lwkmem_debug=1 selinux=0 nmi_watchdog=0 nohz_full=1 LANG=en_US.UTF-8
```

```
$ lwkctl -s
mOS version    : 0.8
Linux CPU(s)   : 0-3 [ 4 CPU(s) ]
LWK CPU(s)     : 4-7 [ 4 CPU(s) ]
Utility CPU(s) : 1 [ 1 CPU(s) ]
LWK Memory(KB) : 1835008 3407872 [ 2 NUMA nodes ]
```

maptest program: https://github.com/intel/mOS/blob/master/mOS/tests/lwkmem/maptest.c

```
$ mosview -s lwk free -m
              total        used        free      shared  buff/cache   available
Mem:           5120           0        5120           0           0        5120
Swap:             0           0           0

$ mosview -s linux free -m
              total        used        free      shared  buff/cache   available
Mem:           6883         146        6559          11         177        6579
Swap:           199           0         199

$ mosview -s all free -m
              total        used        free      shared  buff/cache   available
Mem:          12003         146       11679          11         177       11699
Swap:           199           0         199
```

```
$ dmesg | grep -i mos
[    0.000000] Linux version 5.4.48-mos-build20210617-1 (root@mos10-16-1-159) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)) #1 SMP Thu Jun 17 11:04:25 CST 2021
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.4.48-mos-build20210617-1 root=UUID=93cb47c5-3e8a-467a-a61d-3ed6d74ee889 ro rhgb movable_node lwkcpu_profile=normal lwkmem=5G kernelcore=5G lwkcpus=1.4-7 loglevel=7 console=tty1 intel_pstate=disable lwkmem_debug=1 selinux=0 nmi_watchdog=0 nohz_full=1 LANG=en_US.UTF-8
[    0.270265] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.4.48-mos-build20210617-1 root=UUID=93cb47c5-3e8a-467a-a61d-3ed6d74ee889 ro rhgb movable_node lwkcpu_profile=normal lwkmem=5G kernelcore=5G lwkcpus=1.4-7 loglevel=7 console=tty1 intel_pstate=disable lwkmem_debug=1 selinux=0 nmi_watchdog=0 nohz_full=1 LANG=en_US.UTF-8
[    2.512669] usb usb1: Manufacturer: Linux 5.4.48-mos-build20210617-1 ehci_hcd
[    2.562102] usb usb2: Manufacturer: Linux 5.4.48-mos-build20210617-1 uhci_hcd
[    2.608309] usb usb3: Manufacturer: Linux 5.4.48-mos-build20210617-1 uhci_hcd
[    2.654760] usb usb4: Manufacturer: Linux 5.4.48-mos-build20210617-1 uhci_hcd
[    2.671007] rtc_cmos 00:00: RTC can wake from S4
[    2.673691] rtc_cmos 00:00: registered as rtc0
[    2.676533] rtc_cmos 00:00: alarms up to one day, y3k, 114 bytes nvram
[    2.770712] rtc_cmos 00:00: setting system clock to 2021-06-29T02:49:41 UTC (1624934981)
[    2.804171] mOS-lwkctl: Creating default memory partition: lwkmem=5G
[    2.805477] mOS-mem: Initializing memory management. precise=no
[    2.835214] mOS-mem: Node 0: va 0x(ptrval) pa 0x148000000 pfn 1343488-1802239 : 458752
[    2.837463] mOS-mem: Node 0: offlining va 0x(ptrval) pa 0x148000000 pfn 1343488-1802239:458752
[    4.691072] mOS-mem: Unallocated 805306368 bytes req to node(s):1
[    4.737079] mOS-mem: Node 1: va 0x(ptrval) pa 0x208000000 pfn 2129920-2981887 : 851968
[    4.738908] mOS-mem: Node 1: offlining va 0x(ptrval) pa 0x208000000 pfn 2129920-2981887:851968
[    9.459123] mOS-mem: Node 1: Requested 3328 MB Allocated 3328 MB
[    9.460195] mOS-mem: Requested 5120 MB Allocated 5120 MB
[    9.475957] mOS-lwkctl: LWK creating default LWKMEM partition..Done
[    9.476993] mOS-lwkctl: Creating default CPU partition:
[    9.476994] mOS-lwkctl: lwkcpu_profile=normal
[    9.479754] mOS: LWK CPUs 4-7 will ship syscalls to Linux CPU 1
[    9.480886] mOS: Configured LWK CPUs: 4-7
[    9.481919] mOS: Configured Utility CPUs: 1
[    9.482973] mOS: LWK CPU profile set to: normal
[    9.484231] mOS-sched: set unbound workqueue cpumask to 0-3
[    9.485328] mOS: MWAIT not supported by processor. IDLE HALT enabled.
[    9.600290] mOS-lwkctl: LWK creating default partition.. Done
```

Running

```
yod -o lwkmem-prot-none-delegation-enable -M 2048m -C all --mosview lwk ./maptest --verbose --type anonymous --size 4096 --num 32512
```

results in: `Failed mmap. Cannot allocate memory`

dmesg output:

```
[   49.616869] mOS-ras: msg="unmap_pagetbl: PMD not present for address:7fffffe00000 size:4k" id=mOSLwkmemProcessWarning location= jobid=
[   49.616977] mOS-ras: msg="unmap_pagetbl: PMD not present for address:7fffffe01000 size:4k" id=mOSLwkmemProcessWarning location= jobid=
[   49.617039] mOS-ras: msg="unmap_pagetbl: PMD not present for address:7fffffe02000 size:4k" id=mOSLwkmemProcessWarning location= jobid=
[   49.617102] mOS-ras: msg="unmap_pagetbl: PMD not present for address:7fffffe03000 size:4k" id=mOSLwkmemProcessWarning location= jobid=
[   49.617163] mOS-ras: msg="unmap_pagetbl: PMD not present for address:7fffffe04000 size:4k" id=mOSLwkmemProcessWarning location= jobid=
```

Running the command `yod -M 2048m -C all echo hello` works.

rolfriesen commented 3 years ago

Thanks @HengInWeb for the info; I didn't realize you were running one of our test programs ;-) Try without the `-o lwkmem-prot-none-delegation-enable` command-line argument. It is almost never needed, and maptest certainly does not need it. Also, unless you are using the -M option for another reason, you could simply run:

```
yod ./maptest --type anonymous --size 4096 --num 32512
```

HengInWeb commented 3 years ago

@rolfriesen I re-ran the test program as you suggested, but the result is exactly the same. From the dmesg errors, the problem appears to be located in the TLB flush operation in the unmap_pagetbl function of the mem_mgmt.c module, but I cannot understand the specific details.

Running `yod ./maptest --type anonymous --size 4096 --num 32512` returns the same result: `Failed mmap. Cannot allocate memory`

dmesg error output:

```
[76439.668285] mOS-ras: msg="unmap_pagetbl: PMD not present for address:7fffffedc000 size:4k" id=mOSLwkmemProcessWarning location= jobid=
[76439.672344] mOS-ras: msg="unmap_pagetbl: PMD not present for address:7fffffeda000 size:4k" id=mOSLwkmemProcessWarning location= jobid=
[76439.676469] mOS-ras: msg="unmap_pagetbl: PMD not present for address:7fffffed8000 size:4k" id=mOSLwkmemProcessWarning location= jobid=
[76439.680542] mOS-ras: msg="unmap_pagetbl: PMD not present for address:7fffffed6000 size:4k" id=mOSLwkmemProcessWarning location= jobid=
[76439.684592] mOS-ras: msg="unmap_pagetbl: PMD not present for address:7fffffed4000 size:4k" id=mOSLwkmemProcessWarning location= jobid=
[76439.688571] mOS-ras: msg="unmap_pagetbl: PMD not present for address:7fffffed2000 size:4k" id=mOSLwkmemProcessWarning location= jobid=
[76439.693052] mOS-ras: msg="unmap_pagetbl: PMD not present for address:7fffffed0000 size:4k" id=mOSLwkmemProcessWarning location= jobid=
```

rolfriesen commented 3 years ago

Thanks for trying @HengInWeb. Here are a few more suggestions and questions for you.

Please try setting `ulimit -s unlimited` before you run the test.
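
For reference, here is a rough sketch of what that shell built-in changes, assuming a standard Linux/POSIX environment; yod and the test simply inherit the stack limit from the invoking shell, so the built-in is the easy way to set it:

```c
/* raise_stack.c -- programmatic equivalent of `ulimit -s unlimited`,
 * sketched with setrlimit(2); the shell built-in achieves the same. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl = { RLIM_INFINITY, RLIM_INFINITY };

    /* Remove the stack-size cap for this process and its children.
     * Raising the hard limit requires privilege (e.g. root). */
    if (setrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("setrlimit(RLIMIT_STACK)");
        return 1;
    }

    if (getrlimit(RLIMIT_STACK, &rl) == 0)
        printf("stack limit now: soft=%s hard=%s\n",
               rl.rlim_cur == RLIM_INFINITY ? "unlimited" : "finite",
               rl.rlim_max == RLIM_INFINITY ? "unlimited" : "finite");
    return 0;
}
```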

We have now repeated your experiment with the same kernel you are using but were not able to reproduce the failure. This may be because our hardware is different. What kind of system are you running on? Could you send the output of `lscpu` and `numactl -H`, please?

From the information you have already sent, it seems you have hyperthreading disabled. Is that correct?

On your boot command line you have `nohz_full=1`. In your case this should be `nohz_full=4-7`; that list should always match the LWK CPUs in the lwkcpus list. The idea is to keep the LWK CPUs (4-7 in your example) as noise-free as possible and let the Linux CPUs (0-3, including the system call CPU 1) handle the interrupts and timer ticks that cause noise.

These things should not have any impact on the failure you are seeing; we are just trying to understand your system and setup better and see if we can figure out what the problem is.

HengInWeb commented 3 years ago

setting "ulimit -s unlimited" and then run yod to successfully allocate more than 128M of memory! Thank you @rolfriesen for your guidance and help, Thanks !

My previous operating environment was a KVM virtual machine configured with vNUMA. The results of running lscpu and numactl -H in the virtual machine are as follows:

```
$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel Xeon Processor (Skylake)
Stepping:            4
CPU MHz:             2100.000
BogoMIPS:            4200.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            4096K
L3 cache:            16384K
NUMA node0 CPU(s):   0-3
NUMA node1 CPU(s):   4-7
Flags:               fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat

[root@mos10-16-1-159 ~]# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 5982 MB
node 0 free: 5650 MB
node 1 cpus: 4 5 6 7
node 1 size: 6020 MB
node 1 free: 5992 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
```

Our idea is to run a KVM virtual machine inside the LWK partition to verify whether the CPU and memory isolation can reduce the impact of the host operating system on the virtual machine, thereby improving the virtual machine's performance and real-time behavior.

rolfriesen commented 3 years ago

@HengInWeb I am glad this worked. I had help from the mOS team in answering this question. We have thought about running a virtual machine in the LWK partition of mOS as well, and I think there could be benefits to doing that, but we have not had time to explore it. Good luck, and let us know if you can share interesting results.

heyzzqq0103 commented 2 years ago

Hello, I have a question about the build stage and the boot stage.

Was the build stage executed in a SLES 15 SP1 Linux environment? And can the built kernel-mos-xxx.rpm file then be copied to a CentOS 7 system for installation and boot?

Thank you for describing the environment setup stages of mOS for HPC.