open-power-host-os / linux

Linux kernel source tree

Can not allocate hugepages after some iterations of the test #26

Closed by nasastry 5 years ago

nasastry commented 6 years ago
Mirrored with LTC bug https://bugzilla.linux.ibm.com/show_bug.cgi?id=163537

Can't get hugepages allocated when the test case does the following. For each iteration of the test case:

1. echo n > /proc/sys/vm/nr_hugepages
2. Start the guest backed with 1G hugepages
3. Memory hotplug with 1G
4. echo 0 > /proc/sys/vm/nr_hugepages

After running 5 iterations of the above, can't allocate any more hugepages.

    # free -h
                  total        used        free      shared  buff/cache   available
    Mem:            31G         24G        3.2G         37M        3.8G        3.3G
    Swap:           15G        836M         15G

Kernel version: 4.14.0-3.git68b4afb.el7.centos.ppc64le

From /etc/os-release:

    NAME="CentOS Linux"
    VERSION="7 (AltArch)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (AltArch)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"
    SIG_FAMILY="AltArch ppc64le"
    CENTOS_MANTISBT_PROJECT="CentOS-7"
    CENTOS_MANTISBT_PROJECT_VERSION="7"
    REDHAT_SUPPORT_PRODUCT="centos"
    REDHAT_SUPPORT_PRODUCT_VERSION="7"

    # cat /proc/buddyinfo
    Node 0, zone      DMA     52    156    342    190     98     26     16     18    888
    Node 8, zone      DMA     11     88     73     42     16      2     44     25    407

    [root@zzfp365-lp1 ~]# free -h
                  total        used        free      shared  buff/cache   available
    Mem:            31G        7.5G         21G         27M        3.1G         21G
    Swap:           15G          0B         15G

    # numactl -H
    available: 2 nodes (0,8)
    node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
    node 0 size: 16202 MB
    node 0 free: 14767 MB
    node 8 cpus: 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
    node 8 size: 16238 MB
    node 8 free: 6819 MB
    node distances:
    node   0   8
      0:  10  40
      8:  40  10

    # collectl -sb --verbose -oT
    waiting for 1 second sample...

    # MEMORY FRAGMENTATION SUMMARY (64K pages)
    #Time      1Pg  2Pgs  4Pgs  8Pgs 16Pgs 32Pgs 64Pgs 128Pgs 256Pgs 512Pgs 1024Pgs
    18:19:45   174   149   127   156   101    93    73     43   1677      0       0
    18:19:46   170   149   128   156   101    93    73     43   1677      0       0
    18:19:47   165   148   128   156   101    93    73     43   1677      0       0
    18:19:48   166   149   127   156   101    93    73     43   1677      0       0
    18:19:49   166   149   127   156   101    93    73     43   1677      0       0
    18:19:50   162   149   127   156   101    93    73     43   1677      0       0
    18:19:51   159   149   127   156   101    93    73     43   1677      0       0
    18:19:52   156   149   127   156   101    93    73     43   1677      0       0

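For anyone reading the /proc/buddyinfo numbers in the report, here is a minimal Python sketch that turns them into free memory per node and the largest block size the buddy allocator tracks. It is not part of the original test case. The 64 KiB page size is an assumption taken from the "64K pages" header in the collectl output, and the function names are made up for illustration.

```python
# Sketch: summarize /proc/buddyinfo text from the report.
# Assumption: base page size is 64 KiB (from collectl's "64K pages" header).
PAGE_KIB = 64

# The two lines quoted in the report.
BUDDYINFO = """\
Node 0, zone DMA 52 156 342 190 98 26 16 18 888
Node 8, zone DMA 11 88 73 42 16 2 44 25 407
"""


def parse_buddyinfo(text):
    """Return {(node, zone): [free block count at order 0, 1, 2, ...]}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "Node":
            continue  # skip blank or unrelated lines
        node = int(fields[1].rstrip(","))
        zone = fields[3]
        result[(node, zone)] = [int(value) for value in fields[4:]]
    return result


def free_mib(counts, page_kib=PAGE_KIB):
    """Total free memory in MiB: a block at order k spans 2**k pages."""
    pages = sum(count << order for order, count in enumerate(counts))
    return pages * page_kib / 1024


def largest_tracked_block_mib(counts, page_kib=PAGE_KIB):
    """Size in MiB of a block at the highest order that has a column."""
    top_order = len(counts) - 1
    return (1 << top_order) * page_kib / 1024


if __name__ == "__main__":
    for (node, zone), counts in parse_buddyinfo(BUDDYINFO).items():
        print(
            f"node {node} zone {zone}: "
            f"{free_mib(counts):.2f} MiB free, "
            f"largest tracked block {largest_tracked_block_mib(counts):.0f} MiB"
        )
```

Under that page-size assumption, node 0 comes out at about 14769 MiB free, which lines up with the "node 0 free: 14767 MB" reported by numactl. The highest order listed is 8, i.e. 16 MiB blocks, so a 1 GiB hugepage needs 64 such blocks side by side; buddyinfo alone cannot show whether any such range exists.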
cdeadmin commented 5 years ago

------- Comment From seg@us.ibm.com 2019-04-30 12:32:46 EDT -------
So... after sitting on this for 3.5 months with no update, you've reassigned without comment a bug against a defunct product to the kernel team? That doesn't seem like the best possible handling.

Anyway, rejecting, as HostOS is dead. Sastry, if you are still interested in pursuing this, I suggest retrying against, say, RHEL 8.

------- Comment From seg@us.ibm.com 2019-04-30 12:35:04 EDT -------
Correction: 15.5 months, not 3.5.

cdeadmin commented 5 years ago

------- Comment From nevdull@us.ibm.com 2019-04-30 13:44:11 EDT -------
FWIW, my 30-second hypothesis is that the memory got too fragmented to collect 1GB chunks. Forming huge pages after boot is always subject to memory availability - contiguous memory, remember.
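
To illustrate the fragmentation hypothesis, here is a toy Python model. It is not how the kernel allocates gigantic pages, and it ignores page migration and compaction. It only shows that plenty of free memory in total does not guarantee a single free, aligned 1 GiB range. The 16 MiB block size and the layout with one pinned block per GiB are assumptions chosen for illustration.

```python
# Toy model of the fragmentation hypothesis. Assumptions (not from the
# kernel): memory is a flat list of 16 MiB blocks, each free or in use, and
# a 1 GiB hugepage needs one aligned run of 64 free blocks.
BLOCK_MIB = 16
BLOCKS_PER_GIB = 1024 // BLOCK_MIB  # 64


def count_free_gib_ranges(free_blocks):
    """Count 1 GiB-aligned ranges in which every block is free."""
    usable = len(free_blocks) - len(free_blocks) % BLOCKS_PER_GIB
    return sum(
        all(free_blocks[start:start + BLOCKS_PER_GIB])
        for start in range(0, usable, BLOCKS_PER_GIB)
    )


def fragmented_layout(total_gib):
    """Hypothetical worst case: one in-use block inside every GiB."""
    blocks = [True] * (total_gib * BLOCKS_PER_GIB)
    for start in range(0, len(blocks), BLOCKS_PER_GIB):
        blocks[start + BLOCKS_PER_GIB // 2] = False  # pinned block
    return blocks


if __name__ == "__main__":
    layout = fragmented_layout(31)
    print("free MiB:", sum(layout) * BLOCK_MIB)
    print("allocatable 1 GiB ranges:", count_free_gib_ranges(layout))
```

In this made-up layout, 31 GiB of memory has 31248 MiB free yet not one 1 GiB range is available, which is the kind of situation the comment above describes.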