open-mpi / hwloc

Hardware locality (hwloc)
https://www.open-mpi.org/projects/hwloc

NVidia Grace L3 cache size seems wrong #689

Open JimCownie opened 1 week ago

JimCownie commented 1 week ago

What version of hwloc are you using?

2.10.0, though lstopo says:

% lstopo --version
lstopo 2.9.0

Which operating system and hardware are you running on?

% uname -a
Linux nid001040 5.14.21-150500.55.31_13.0.53-cray_shasta_c_64k #1 SMP Mon Dec 4 22:56:47 UTC 2023 (03d3f83) aarch64 aarch64 aarch64 GNU/Linux

% lstopo - Machine (477GB total) Package L#0 NUMANode L#0 (P#0 117GB) L3 L#0 (114MB) L2 L#0 (1024KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0) L2 L#1 (1024KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1) L2 L#2 (1024KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2) L2 L#3 (1024KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3) L2 L#4 (1024KB) + L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4 + PU L#4 (P#4) L2 L#5 (1024KB) + L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5 + PU L#5 (P#5) L2 L#6 (1024KB) + L1d L#6 (64KB) + L1i L#6 (64KB) + Core L#6 + PU L#6 (P#6) L2 L#7 (1024KB) + L1d L#7 (64KB) + L1i L#7 (64KB) + Core L#7 + PU L#7 (P#7) L2 L#8 (1024KB) + L1d L#8 (64KB) + L1i L#8 (64KB) + Core L#8 + PU L#8 (P#8) L2 L#9 (1024KB) + L1d L#9 (64KB) + L1i L#9 (64KB) + Core L#9 + PU L#9 (P#9) L2 L#10 (1024KB) + L1d L#10 (64KB) + L1i L#10 (64KB) + Core L#10 + PU L#10 (P#10) L2 L#11 (1024KB) + L1d L#11 (64KB) + L1i L#11 (64KB) + Core L#11 + PU L#11 (P#11) L2 L#12 (1024KB) + L1d L#12 (64KB) + L1i L#12 (64KB) + Core L#12 + PU L#12 (P#12) L2 L#13 (1024KB) + L1d L#13 (64KB) + L1i L#13 (64KB) + Core L#13 + PU L#13 (P#13) L2 L#14 (1024KB) + L1d L#14 (64KB) + L1i L#14 (64KB) + Core L#14 + PU L#14 (P#14) L2 L#15 (1024KB) + L1d L#15 (64KB) + L1i L#15 (64KB) + Core L#15 + PU L#15 (P#15) L2 L#16 (1024KB) + L1d L#16 (64KB) + L1i L#16 (64KB) + Core L#16 + PU L#16 (P#16) L2 L#17 (1024KB) + L1d L#17 (64KB) + L1i L#17 (64KB) + Core L#17 + PU L#17 (P#17) L2 L#18 (1024KB) + L1d L#18 (64KB) + L1i L#18 (64KB) + Core L#18 + PU L#18 (P#18) L2 L#19 (1024KB) + L1d L#19 (64KB) + L1i L#19 (64KB) + Core L#19 + PU L#19 (P#19) L2 L#20 (1024KB) + L1d L#20 (64KB) + L1i L#20 (64KB) + Core L#20 + PU L#20 (P#20) L2 L#21 (1024KB) + L1d L#21 (64KB) + L1i L#21 (64KB) + Core L#21 + PU L#21 (P#21) L2 L#22 (1024KB) + L1d L#22 (64KB) + L1i L#22 (64KB) + Core L#22 + PU L#22 (P#22) L2 L#23 (1024KB) + L1d L#23 (64KB) + L1i L#23 (64KB) + Core L#23 + PU L#23 (P#23) L2 
L#24 (1024KB) + L1d L#24 (64KB) + L1i L#24 (64KB) + Core L#24 + PU L#24 (P#24) L2 L#25 (1024KB) + L1d L#25 (64KB) + L1i L#25 (64KB) + Core L#25 + PU L#25 (P#25) L2 L#26 (1024KB) + L1d L#26 (64KB) + L1i L#26 (64KB) + Core L#26 + PU L#26 (P#26) L2 L#27 (1024KB) + L1d L#27 (64KB) + L1i L#27 (64KB) + Core L#27 + PU L#27 (P#27) L2 L#28 (1024KB) + L1d L#28 (64KB) + L1i L#28 (64KB) + Core L#28 + PU L#28 (P#28) L2 L#29 (1024KB) + L1d L#29 (64KB) + L1i L#29 (64KB) + Core L#29 + PU L#29 (P#29) L2 L#30 (1024KB) + L1d L#30 (64KB) + L1i L#30 (64KB) + Core L#30 + PU L#30 (P#30) L2 L#31 (1024KB) + L1d L#31 (64KB) + L1i L#31 (64KB) + Core L#31 + PU L#31 (P#31) L2 L#32 (1024KB) + L1d L#32 (64KB) + L1i L#32 (64KB) + Core L#32 + PU L#32 (P#32) L2 L#33 (1024KB) + L1d L#33 (64KB) + L1i L#33 (64KB) + Core L#33 + PU L#33 (P#33) L2 L#34 (1024KB) + L1d L#34 (64KB) + L1i L#34 (64KB) + Core L#34 + PU L#34 (P#34) L2 L#35 (1024KB) + L1d L#35 (64KB) + L1i L#35 (64KB) + Core L#35 + PU L#35 (P#35) L2 L#36 (1024KB) + L1d L#36 (64KB) + L1i L#36 (64KB) + Core L#36 + PU L#36 (P#36) L2 L#37 (1024KB) + L1d L#37 (64KB) + L1i L#37 (64KB) + Core L#37 + PU L#37 (P#37) L2 L#38 (1024KB) + L1d L#38 (64KB) + L1i L#38 (64KB) + Core L#38 + PU L#38 (P#38) L2 L#39 (1024KB) + L1d L#39 (64KB) + L1i L#39 (64KB) + Core L#39 + PU L#39 (P#39) L2 L#40 (1024KB) + L1d L#40 (64KB) + L1i L#40 (64KB) + Core L#40 + PU L#40 (P#40) L2 L#41 (1024KB) + L1d L#41 (64KB) + L1i L#41 (64KB) + Core L#41 + PU L#41 (P#41) L2 L#42 (1024KB) + L1d L#42 (64KB) + L1i L#42 (64KB) + Core L#42 + PU L#42 (P#42) L2 L#43 (1024KB) + L1d L#43 (64KB) + L1i L#43 (64KB) + Core L#43 + PU L#43 (P#43) L2 L#44 (1024KB) + L1d L#44 (64KB) + L1i L#44 (64KB) + Core L#44 + PU L#44 (P#44) L2 L#45 (1024KB) + L1d L#45 (64KB) + L1i L#45 (64KB) + Core L#45 + PU L#45 (P#45) L2 L#46 (1024KB) + L1d L#46 (64KB) + L1i L#46 (64KB) + Core L#46 + PU L#46 (P#46) L2 L#47 (1024KB) + L1d L#47 (64KB) + L1i L#47 (64KB) + Core L#47 + PU L#47 (P#47) L2 L#48 (1024KB) + L1d L#48 (64KB) 
+ L1i L#48 (64KB) + Core L#48 + PU L#48 (P#48) L2 L#49 (1024KB) + L1d L#49 (64KB) + L1i L#49 (64KB) + Core L#49 + PU L#49 (P#49) L2 L#50 (1024KB) + L1d L#50 (64KB) + L1i L#50 (64KB) + Core L#50 + PU L#50 (P#50) L2 L#51 (1024KB) + L1d L#51 (64KB) + L1i L#51 (64KB) + Core L#51 + PU L#51 (P#51) L2 L#52 (1024KB) + L1d L#52 (64KB) + L1i L#52 (64KB) + Core L#52 + PU L#52 (P#52) L2 L#53 (1024KB) + L1d L#53 (64KB) + L1i L#53 (64KB) + Core L#53 + PU L#53 (P#53) L2 L#54 (1024KB) + L1d L#54 (64KB) + L1i L#54 (64KB) + Core L#54 + PU L#54 (P#54) L2 L#55 (1024KB) + L1d L#55 (64KB) + L1i L#55 (64KB) + Core L#55 + PU L#55 (P#55) L2 L#56 (1024KB) + L1d L#56 (64KB) + L1i L#56 (64KB) + Core L#56 + PU L#56 (P#56) L2 L#57 (1024KB) + L1d L#57 (64KB) + L1i L#57 (64KB) + Core L#57 + PU L#57 (P#57) L2 L#58 (1024KB) + L1d L#58 (64KB) + L1i L#58 (64KB) + Core L#58 + PU L#58 (P#58) L2 L#59 (1024KB) + L1d L#59 (64KB) + L1i L#59 (64KB) + Core L#59 + PU L#59 (P#59) L2 L#60 (1024KB) + L1d L#60 (64KB) + L1i L#60 (64KB) + Core L#60 + PU L#60 (P#60) L2 L#61 (1024KB) + L1d L#61 (64KB) + L1i L#61 (64KB) + Core L#61 + PU L#61 (P#61) L2 L#62 (1024KB) + L1d L#62 (64KB) + L1i L#62 (64KB) + Core L#62 + PU L#62 (P#62) L2 L#63 (1024KB) + L1d L#63 (64KB) + L1i L#63 (64KB) + Core L#63 + PU L#63 (P#63) L2 L#64 (1024KB) + L1d L#64 (64KB) + L1i L#64 (64KB) + Core L#64 + PU L#64 (P#64) L2 L#65 (1024KB) + L1d L#65 (64KB) + L1i L#65 (64KB) + Core L#65 + PU L#65 (P#65) L2 L#66 (1024KB) + L1d L#66 (64KB) + L1i L#66 (64KB) + Core L#66 + PU L#66 (P#66) L2 L#67 (1024KB) + L1d L#67 (64KB) + L1i L#67 (64KB) + Core L#67 + PU L#67 (P#67) L2 L#68 (1024KB) + L1d L#68 (64KB) + L1i L#68 (64KB) + Core L#68 + PU L#68 (P#68) L2 L#69 (1024KB) + L1d L#69 (64KB) + L1i L#69 (64KB) + Core L#69 + PU L#69 (P#69) L2 L#70 (1024KB) + L1d L#70 (64KB) + L1i L#70 (64KB) + Core L#70 + PU L#70 (P#70) L2 L#71 (1024KB) + L1d L#71 (64KB) + L1i L#71 (64KB) + Core L#71 + PU L#71 (P#71) HostBridge PCIBridge PCI 0002:01:00.0 (Ethernet) Net "hsn0" 
HostBridge PCIBridge PCI 0005:01:00.0 (Ethernet) Net "nmn0" HostBridge PCIBridge PCI 0007:01:00.0 (NVMExp) Block(Disk) "nvme0n1" HostBridge PCIBridge PCI 0009:01:00.0 (3D) Package L#1 NUMANode L#1 (P#1 120GB) L3 L#1 (114MB) L2 L#72 (1024KB) + L1d L#72 (64KB) + L1i L#72 (64KB) + Core L#72 + PU L#72 (P#72) L2 L#73 (1024KB) + L1d L#73 (64KB) + L1i L#73 (64KB) + Core L#73 + PU L#73 (P#73) L2 L#74 (1024KB) + L1d L#74 (64KB) + L1i L#74 (64KB) + Core L#74 + PU L#74 (P#74) L2 L#75 (1024KB) + L1d L#75 (64KB) + L1i L#75 (64KB) + Core L#75 + PU L#75 (P#75) L2 L#76 (1024KB) + L1d L#76 (64KB) + L1i L#76 (64KB) + Core L#76 + PU L#76 (P#76) L2 L#77 (1024KB) + L1d L#77 (64KB) + L1i L#77 (64KB) + Core L#77 + PU L#77 (P#77) L2 L#78 (1024KB) + L1d L#78 (64KB) + L1i L#78 (64KB) + Core L#78 + PU L#78 (P#78) L2 L#79 (1024KB) + L1d L#79 (64KB) + L1i L#79 (64KB) + Core L#79 + PU L#79 (P#79) L2 L#80 (1024KB) + L1d L#80 (64KB) + L1i L#80 (64KB) + Core L#80 + PU L#80 (P#80) L2 L#81 (1024KB) + L1d L#81 (64KB) + L1i L#81 (64KB) + Core L#81 + PU L#81 (P#81) L2 L#82 (1024KB) + L1d L#82 (64KB) + L1i L#82 (64KB) + Core L#82 + PU L#82 (P#82) L2 L#83 (1024KB) + L1d L#83 (64KB) + L1i L#83 (64KB) + Core L#83 + PU L#83 (P#83) L2 L#84 (1024KB) + L1d L#84 (64KB) + L1i L#84 (64KB) + Core L#84 + PU L#84 (P#84) L2 L#85 (1024KB) + L1d L#85 (64KB) + L1i L#85 (64KB) + Core L#85 + PU L#85 (P#85) L2 L#86 (1024KB) + L1d L#86 (64KB) + L1i L#86 (64KB) + Core L#86 + PU L#86 (P#86) L2 L#87 (1024KB) + L1d L#87 (64KB) + L1i L#87 (64KB) + Core L#87 + PU L#87 (P#87) L2 L#88 (1024KB) + L1d L#88 (64KB) + L1i L#88 (64KB) + Core L#88 + PU L#88 (P#88) L2 L#89 (1024KB) + L1d L#89 (64KB) + L1i L#89 (64KB) + Core L#89 + PU L#89 (P#89) L2 L#90 (1024KB) + L1d L#90 (64KB) + L1i L#90 (64KB) + Core L#90 + PU L#90 (P#90) L2 L#91 (1024KB) + L1d L#91 (64KB) + L1i L#91 (64KB) + Core L#91 + PU L#91 (P#91) L2 L#92 (1024KB) + L1d L#92 (64KB) + L1i L#92 (64KB) + Core L#92 + PU L#92 (P#92) L2 L#93 (1024KB) + L1d L#93 (64KB) + L1i L#93 (64KB) 
+ Core L#93 + PU L#93 (P#93) L2 L#94 (1024KB) + L1d L#94 (64KB) + L1i L#94 (64KB) + Core L#94 + PU L#94 (P#94) L2 L#95 (1024KB) + L1d L#95 (64KB) + L1i L#95 (64KB) + Core L#95 + PU L#95 (P#95) L2 L#96 (1024KB) + L1d L#96 (64KB) + L1i L#96 (64KB) + Core L#96 + PU L#96 (P#96) L2 L#97 (1024KB) + L1d L#97 (64KB) + L1i L#97 (64KB) + Core L#97 + PU L#97 (P#97) L2 L#98 (1024KB) + L1d L#98 (64KB) + L1i L#98 (64KB) + Core L#98 + PU L#98 (P#98) L2 L#99 (1024KB) + L1d L#99 (64KB) + L1i L#99 (64KB) + Core L#99 + PU L#99 (P#99) L2 L#100 (1024KB) + L1d L#100 (64KB) + L1i L#100 (64KB) + Core L#100 + PU L#100 (P#100) L2 L#101 (1024KB) + L1d L#101 (64KB) + L1i L#101 (64KB) + Core L#101 + PU L#101 (P#101) L2 L#102 (1024KB) + L1d L#102 (64KB) + L1i L#102 (64KB) + Core L#102 + PU L#102 (P#102) L2 L#103 (1024KB) + L1d L#103 (64KB) + L1i L#103 (64KB) + Core L#103 + PU L#103 (P#103) L2 L#104 (1024KB) + L1d L#104 (64KB) + L1i L#104 (64KB) + Core L#104 + PU L#104 (P#104) L2 L#105 (1024KB) + L1d L#105 (64KB) + L1i L#105 (64KB) + Core L#105 + PU L#105 (P#105) L2 L#106 (1024KB) + L1d L#106 (64KB) + L1i L#106 (64KB) + Core L#106 + PU L#106 (P#106) L2 L#107 (1024KB) + L1d L#107 (64KB) + L1i L#107 (64KB) + Core L#107 + PU L#107 (P#107) L2 L#108 (1024KB) + L1d L#108 (64KB) + L1i L#108 (64KB) + Core L#108 + PU L#108 (P#108) L2 L#109 (1024KB) + L1d L#109 (64KB) + L1i L#109 (64KB) + Core L#109 + PU L#109 (P#109) L2 L#110 (1024KB) + L1d L#110 (64KB) + L1i L#110 (64KB) + Core L#110 + PU L#110 (P#110) L2 L#111 (1024KB) + L1d L#111 (64KB) + L1i L#111 (64KB) + Core L#111 + PU L#111 (P#111) L2 L#112 (1024KB) + L1d L#112 (64KB) + L1i L#112 (64KB) + Core L#112 + PU L#112 (P#112) L2 L#113 (1024KB) + L1d L#113 (64KB) + L1i L#113 (64KB) + Core L#113 + PU L#113 (P#113) L2 L#114 (1024KB) + L1d L#114 (64KB) + L1i L#114 (64KB) + Core L#114 + PU L#114 (P#114) L2 L#115 (1024KB) + L1d L#115 (64KB) + L1i L#115 (64KB) + Core L#115 + PU L#115 (P#115) L2 L#116 (1024KB) + L1d L#116 (64KB) + L1i L#116 (64KB) + Core L#116 + 
PU L#116 (P#116) L2 L#117 (1024KB) + L1d L#117 (64KB) + L1i L#117 (64KB) + Core L#117 + PU L#117 (P#117) L2 L#118 (1024KB) + L1d L#118 (64KB) + L1i L#118 (64KB) + Core L#118 + PU L#118 (P#118) L2 L#119 (1024KB) + L1d L#119 (64KB) + L1i L#119 (64KB) + Core L#119 + PU L#119 (P#119) L2 L#120 (1024KB) + L1d L#120 (64KB) + L1i L#120 (64KB) + Core L#120 + PU L#120 (P#120) L2 L#121 (1024KB) + L1d L#121 (64KB) + L1i L#121 (64KB) + Core L#121 + PU L#121 (P#121) L2 L#122 (1024KB) + L1d L#122 (64KB) + L1i L#122 (64KB) + Core L#122 + PU L#122 (P#122) L2 L#123 (1024KB) + L1d L#123 (64KB) + L1i L#123 (64KB) + Core L#123 + PU L#123 (P#123) L2 L#124 (1024KB) + L1d L#124 (64KB) + L1i L#124 (64KB) + Core L#124 + PU L#124 (P#124) L2 L#125 (1024KB) + L1d L#125 (64KB) + L1i L#125 (64KB) + Core L#125 + PU L#125 (P#125) L2 L#126 (1024KB) + L1d L#126 (64KB) + L1i L#126 (64KB) + Core L#126 + PU L#126 (P#126) L2 L#127 (1024KB) + L1d L#127 (64KB) + L1i L#127 (64KB) + Core L#127 + PU L#127 (P#127) L2 L#128 (1024KB) + L1d L#128 (64KB) + L1i L#128 (64KB) + Core L#128 + PU L#128 (P#128) L2 L#129 (1024KB) + L1d L#129 (64KB) + L1i L#129 (64KB) + Core L#129 + PU L#129 (P#129) L2 L#130 (1024KB) + L1d L#130 (64KB) + L1i L#130 (64KB) + Core L#130 + PU L#130 (P#130) L2 L#131 (1024KB) + L1d L#131 (64KB) + L1i L#131 (64KB) + Core L#131 + PU L#131 (P#131) L2 L#132 (1024KB) + L1d L#132 (64KB) + L1i L#132 (64KB) + Core L#132 + PU L#132 (P#132) L2 L#133 (1024KB) + L1d L#133 (64KB) + L1i L#133 (64KB) + Core L#133 + PU L#133 (P#133) L2 L#134 (1024KB) + L1d L#134 (64KB) + L1i L#134 (64KB) + Core L#134 + PU L#134 (P#134) L2 L#135 (1024KB) + L1d L#135 (64KB) + L1i L#135 (64KB) + Core L#135 + PU L#135 (P#135) L2 L#136 (1024KB) + L1d L#136 (64KB) + L1i L#136 (64KB) + Core L#136 + PU L#136 (P#136) L2 L#137 (1024KB) + L1d L#137 (64KB) + L1i L#137 (64KB) + Core L#137 + PU L#137 (P#137) L2 L#138 (1024KB) + L1d L#138 (64KB) + L1i L#138 (64KB) + Core L#138 + PU L#138 (P#138) L2 L#139 (1024KB) + L1d L#139 (64KB) + L1i 
L#139 (64KB) + Core L#139 + PU L#139 (P#139) L2 L#140 (1024KB) + L1d L#140 (64KB) + L1i L#140 (64KB) + Core L#140 + PU L#140 (P#140) L2 L#141 (1024KB) + L1d L#141 (64KB) + L1i L#141 (64KB) + Core L#141 + PU L#141 (P#141) L2 L#142 (1024KB) + L1d L#142 (64KB) + L1i L#142 (64KB) + Core L#142 + PU L#142 (P#142) L2 L#143 (1024KB) + L1d L#143 (64KB) + L1i L#143 (64KB) + Core L#143 + PU L#143 (P#143) HostBridge PCIBridge PCI 0010:01:00.0 (Ethernet) Net "hsn2" HostBridge PCIBridge PCI 0019:01:00.0 (3D) Package L#2 NUMANode L#2 (P#2 120GB) L3 L#2 (114MB) L2 L#144 (1024KB) + L1d L#144 (64KB) + L1i L#144 (64KB) + Core L#144 + PU L#144 (P#144) L2 L#145 (1024KB) + L1d L#145 (64KB) + L1i L#145 (64KB) + Core L#145 + PU L#145 (P#145) L2 L#146 (1024KB) + L1d L#146 (64KB) + L1i L#146 (64KB) + Core L#146 + PU L#146 (P#146) L2 L#147 (1024KB) + L1d L#147 (64KB) + L1i L#147 (64KB) + Core L#147 + PU L#147 (P#147) L2 L#148 (1024KB) + L1d L#148 (64KB) + L1i L#148 (64KB) + Core L#148 + PU L#148 (P#148) L2 L#149 (1024KB) + L1d L#149 (64KB) + L1i L#149 (64KB) + Core L#149 + PU L#149 (P#149) L2 L#150 (1024KB) + L1d L#150 (64KB) + L1i L#150 (64KB) + Core L#150 + PU L#150 (P#150) L2 L#151 (1024KB) + L1d L#151 (64KB) + L1i L#151 (64KB) + Core L#151 + PU L#151 (P#151) L2 L#152 (1024KB) + L1d L#152 (64KB) + L1i L#152 (64KB) + Core L#152 + PU L#152 (P#152) L2 L#153 (1024KB) + L1d L#153 (64KB) + L1i L#153 (64KB) + Core L#153 + PU L#153 (P#153) L2 L#154 (1024KB) + L1d L#154 (64KB) + L1i L#154 (64KB) + Core L#154 + PU L#154 (P#154) L2 L#155 (1024KB) + L1d L#155 (64KB) + L1i L#155 (64KB) + Core L#155 + PU L#155 (P#155) L2 L#156 (1024KB) + L1d L#156 (64KB) + L1i L#156 (64KB) + Core L#156 + PU L#156 (P#156) L2 L#157 (1024KB) + L1d L#157 (64KB) + L1i L#157 (64KB) + Core L#157 + PU L#157 (P#157) L2 L#158 (1024KB) + L1d L#158 (64KB) + L1i L#158 (64KB) + Core L#158 + PU L#158 (P#158) L2 L#159 (1024KB) + L1d L#159 (64KB) + L1i L#159 (64KB) + Core L#159 + PU L#159 (P#159) L2 L#160 (1024KB) + L1d L#160 (64KB) + 
L1i L#160 (64KB) + Core L#160 + PU L#160 (P#160) L2 L#161 (1024KB) + L1d L#161 (64KB) + L1i L#161 (64KB) + Core L#161 + PU L#161 (P#161) L2 L#162 (1024KB) + L1d L#162 (64KB) + L1i L#162 (64KB) + Core L#162 + PU L#162 (P#162) L2 L#163 (1024KB) + L1d L#163 (64KB) + L1i L#163 (64KB) + Core L#163 + PU L#163 (P#163) L2 L#164 (1024KB) + L1d L#164 (64KB) + L1i L#164 (64KB) + Core L#164 + PU L#164 (P#164) L2 L#165 (1024KB) + L1d L#165 (64KB) + L1i L#165 (64KB) + Core L#165 + PU L#165 (P#165) L2 L#166 (1024KB) + L1d L#166 (64KB) + L1i L#166 (64KB) + Core L#166 + PU L#166 (P#166) L2 L#167 (1024KB) + L1d L#167 (64KB) + L1i L#167 (64KB) + Core L#167 + PU L#167 (P#167) L2 L#168 (1024KB) + L1d L#168 (64KB) + L1i L#168 (64KB) + Core L#168 + PU L#168 (P#168) L2 L#169 (1024KB) + L1d L#169 (64KB) + L1i L#169 (64KB) + Core L#169 + PU L#169 (P#169) L2 L#170 (1024KB) + L1d L#170 (64KB) + L1i L#170 (64KB) + Core L#170 + PU L#170 (P#170) L2 L#171 (1024KB) + L1d L#171 (64KB) + L1i L#171 (64KB) + Core L#171 + PU L#171 (P#171) L2 L#172 (1024KB) + L1d L#172 (64KB) + L1i L#172 (64KB) + Core L#172 + PU L#172 (P#172) L2 L#173 (1024KB) + L1d L#173 (64KB) + L1i L#173 (64KB) + Core L#173 + PU L#173 (P#173) L2 L#174 (1024KB) + L1d L#174 (64KB) + L1i L#174 (64KB) + Core L#174 + PU L#174 (P#174) L2 L#175 (1024KB) + L1d L#175 (64KB) + L1i L#175 (64KB) + Core L#175 + PU L#175 (P#175) L2 L#176 (1024KB) + L1d L#176 (64KB) + L1i L#176 (64KB) + Core L#176 + PU L#176 (P#176) L2 L#177 (1024KB) + L1d L#177 (64KB) + L1i L#177 (64KB) + Core L#177 + PU L#177 (P#177) L2 L#178 (1024KB) + L1d L#178 (64KB) + L1i L#178 (64KB) + Core L#178 + PU L#178 (P#178) L2 L#179 (1024KB) + L1d L#179 (64KB) + L1i L#179 (64KB) + Core L#179 + PU L#179 (P#179) L2 L#180 (1024KB) + L1d L#180 (64KB) + L1i L#180 (64KB) + Core L#180 + PU L#180 (P#180) L2 L#181 (1024KB) + L1d L#181 (64KB) + L1i L#181 (64KB) + Core L#181 + PU L#181 (P#181) L2 L#182 (1024KB) + L1d L#182 (64KB) + L1i L#182 (64KB) + Core L#182 + PU L#182 (P#182) L2 L#183 
(1024KB) + L1d L#183 (64KB) + L1i L#183 (64KB) + Core L#183 + PU L#183 (P#183) L2 L#184 (1024KB) + L1d L#184 (64KB) + L1i L#184 (64KB) + Core L#184 + PU L#184 (P#184) L2 L#185 (1024KB) + L1d L#185 (64KB) + L1i L#185 (64KB) + Core L#185 + PU L#185 (P#185) L2 L#186 (1024KB) + L1d L#186 (64KB) + L1i L#186 (64KB) + Core L#186 + PU L#186 (P#186) L2 L#187 (1024KB) + L1d L#187 (64KB) + L1i L#187 (64KB) + Core L#187 + PU L#187 (P#187) L2 L#188 (1024KB) + L1d L#188 (64KB) + L1i L#188 (64KB) + Core L#188 + PU L#188 (P#188) L2 L#189 (1024KB) + L1d L#189 (64KB) + L1i L#189 (64KB) + Core L#189 + PU L#189 (P#189) L2 L#190 (1024KB) + L1d L#190 (64KB) + L1i L#190 (64KB) + Core L#190 + PU L#190 (P#190) L2 L#191 (1024KB) + L1d L#191 (64KB) + L1i L#191 (64KB) + Core L#191 + PU L#191 (P#191) L2 L#192 (1024KB) + L1d L#192 (64KB) + L1i L#192 (64KB) + Core L#192 + PU L#192 (P#192) L2 L#193 (1024KB) + L1d L#193 (64KB) + L1i L#193 (64KB) + Core L#193 + PU L#193 (P#193) L2 L#194 (1024KB) + L1d L#194 (64KB) + L1i L#194 (64KB) + Core L#194 + PU L#194 (P#194) L2 L#195 (1024KB) + L1d L#195 (64KB) + L1i L#195 (64KB) + Core L#195 + PU L#195 (P#195) L2 L#196 (1024KB) + L1d L#196 (64KB) + L1i L#196 (64KB) + Core L#196 + PU L#196 (P#196) L2 L#197 (1024KB) + L1d L#197 (64KB) + L1i L#197 (64KB) + Core L#197 + PU L#197 (P#197) L2 L#198 (1024KB) + L1d L#198 (64KB) + L1i L#198 (64KB) + Core L#198 + PU L#198 (P#198) L2 L#199 (1024KB) + L1d L#199 (64KB) + L1i L#199 (64KB) + Core L#199 + PU L#199 (P#199) L2 L#200 (1024KB) + L1d L#200 (64KB) + L1i L#200 (64KB) + Core L#200 + PU L#200 (P#200) L2 L#201 (1024KB) + L1d L#201 (64KB) + L1i L#201 (64KB) + Core L#201 + PU L#201 (P#201) L2 L#202 (1024KB) + L1d L#202 (64KB) + L1i L#202 (64KB) + Core L#202 + PU L#202 (P#202) L2 L#203 (1024KB) + L1d L#203 (64KB) + L1i L#203 (64KB) + Core L#203 + PU L#203 (P#203) L2 L#204 (1024KB) + L1d L#204 (64KB) + L1i L#204 (64KB) + Core L#204 + PU L#204 (P#204) L2 L#205 (1024KB) + L1d L#205 (64KB) + L1i L#205 (64KB) + Core L#205 + 
PU L#205 (P#205) L2 L#206 (1024KB) + L1d L#206 (64KB) + L1i L#206 (64KB) + Core L#206 + PU L#206 (P#206) L2 L#207 (1024KB) + L1d L#207 (64KB) + L1i L#207 (64KB) + Core L#207 + PU L#207 (P#207) L2 L#208 (1024KB) + L1d L#208 (64KB) + L1i L#208 (64KB) + Core L#208 + PU L#208 (P#208) L2 L#209 (1024KB) + L1d L#209 (64KB) + L1i L#209 (64KB) + Core L#209 + PU L#209 (P#209) L2 L#210 (1024KB) + L1d L#210 (64KB) + L1i L#210 (64KB) + Core L#210 + PU L#210 (P#210) L2 L#211 (1024KB) + L1d L#211 (64KB) + L1i L#211 (64KB) + Core L#211 + PU L#211 (P#211) L2 L#212 (1024KB) + L1d L#212 (64KB) + L1i L#212 (64KB) + Core L#212 + PU L#212 (P#212) L2 L#213 (1024KB) + L1d L#213 (64KB) + L1i L#213 (64KB) + Core L#213 + PU L#213 (P#213) L2 L#214 (1024KB) + L1d L#214 (64KB) + L1i L#214 (64KB) + Core L#214 + PU L#214 (P#214) L2 L#215 (1024KB) + L1d L#215 (64KB) + L1i L#215 (64KB) + Core L#215 + PU L#215 (P#215) HostBridge PCIBridge PCI 0020:01:00.0 (Ethernet) Net "hsn3" HostBridge PCIBridge PCI 0029:01:00.0 (3D) Package L#3 NUMANode L#3 (P#3 120GB) L3 L#3 (114MB) L2 L#216 (1024KB) + L1d L#216 (64KB) + L1i L#216 (64KB) + Core L#216 + PU L#216 (P#216) L2 L#217 (1024KB) + L1d L#217 (64KB) + L1i L#217 (64KB) + Core L#217 + PU L#217 (P#217) L2 L#218 (1024KB) + L1d L#218 (64KB) + L1i L#218 (64KB) + Core L#218 + PU L#218 (P#218) L2 L#219 (1024KB) + L1d L#219 (64KB) + L1i L#219 (64KB) + Core L#219 + PU L#219 (P#219) L2 L#220 (1024KB) + L1d L#220 (64KB) + L1i L#220 (64KB) + Core L#220 + PU L#220 (P#220) L2 L#221 (1024KB) + L1d L#221 (64KB) + L1i L#221 (64KB) + Core L#221 + PU L#221 (P#221) L2 L#222 (1024KB) + L1d L#222 (64KB) + L1i L#222 (64KB) + Core L#222 + PU L#222 (P#222) L2 L#223 (1024KB) + L1d L#223 (64KB) + L1i L#223 (64KB) + Core L#223 + PU L#223 (P#223) L2 L#224 (1024KB) + L1d L#224 (64KB) + L1i L#224 (64KB) + Core L#224 + PU L#224 (P#224) L2 L#225 (1024KB) + L1d L#225 (64KB) + L1i L#225 (64KB) + Core L#225 + PU L#225 (P#225) L2 L#226 (1024KB) + L1d L#226 (64KB) + L1i L#226 (64KB) + Core 
L#226 + PU L#226 (P#226) L2 L#227 (1024KB) + L1d L#227 (64KB) + L1i L#227 (64KB) + Core L#227 + PU L#227 (P#227) L2 L#228 (1024KB) + L1d L#228 (64KB) + L1i L#228 (64KB) + Core L#228 + PU L#228 (P#228) L2 L#229 (1024KB) + L1d L#229 (64KB) + L1i L#229 (64KB) + Core L#229 + PU L#229 (P#229) L2 L#230 (1024KB) + L1d L#230 (64KB) + L1i L#230 (64KB) + Core L#230 + PU L#230 (P#230) L2 L#231 (1024KB) + L1d L#231 (64KB) + L1i L#231 (64KB) + Core L#231 + PU L#231 (P#231) L2 L#232 (1024KB) + L1d L#232 (64KB) + L1i L#232 (64KB) + Core L#232 + PU L#232 (P#232) L2 L#233 (1024KB) + L1d L#233 (64KB) + L1i L#233 (64KB) + Core L#233 + PU L#233 (P#233) L2 L#234 (1024KB) + L1d L#234 (64KB) + L1i L#234 (64KB) + Core L#234 + PU L#234 (P#234) L2 L#235 (1024KB) + L1d L#235 (64KB) + L1i L#235 (64KB) + Core L#235 + PU L#235 (P#235) L2 L#236 (1024KB) + L1d L#236 (64KB) + L1i L#236 (64KB) + Core L#236 + PU L#236 (P#236) L2 L#237 (1024KB) + L1d L#237 (64KB) + L1i L#237 (64KB) + Core L#237 + PU L#237 (P#237) L2 L#238 (1024KB) + L1d L#238 (64KB) + L1i L#238 (64KB) + Core L#238 + PU L#238 (P#238) L2 L#239 (1024KB) + L1d L#239 (64KB) + L1i L#239 (64KB) + Core L#239 + PU L#239 (P#239) L2 L#240 (1024KB) + L1d L#240 (64KB) + L1i L#240 (64KB) + Core L#240 + PU L#240 (P#240) L2 L#241 (1024KB) + L1d L#241 (64KB) + L1i L#241 (64KB) + Core L#241 + PU L#241 (P#241) L2 L#242 (1024KB) + L1d L#242 (64KB) + L1i L#242 (64KB) + Core L#242 + PU L#242 (P#242) L2 L#243 (1024KB) + L1d L#243 (64KB) + L1i L#243 (64KB) + Core L#243 + PU L#243 (P#243) L2 L#244 (1024KB) + L1d L#244 (64KB) + L1i L#244 (64KB) + Core L#244 + PU L#244 (P#244) L2 L#245 (1024KB) + L1d L#245 (64KB) + L1i L#245 (64KB) + Core L#245 + PU L#245 (P#245) L2 L#246 (1024KB) + L1d L#246 (64KB) + L1i L#246 (64KB) + Core L#246 + PU L#246 (P#246) L2 L#247 (1024KB) + L1d L#247 (64KB) + L1i L#247 (64KB) + Core L#247 + PU L#247 (P#247) L2 L#248 (1024KB) + L1d L#248 (64KB) + L1i L#248 (64KB) + Core L#248 + PU L#248 (P#248) L2 L#249 (1024KB) + L1d L#249 (64KB) + 
L1i L#249 (64KB) + Core L#249 + PU L#249 (P#249) L2 L#250 (1024KB) + L1d L#250 (64KB) + L1i L#250 (64KB) + Core L#250 + PU L#250 (P#250) L2 L#251 (1024KB) + L1d L#251 (64KB) + L1i L#251 (64KB) + Core L#251 + PU L#251 (P#251) L2 L#252 (1024KB) + L1d L#252 (64KB) + L1i L#252 (64KB) + Core L#252 + PU L#252 (P#252) L2 L#253 (1024KB) + L1d L#253 (64KB) + L1i L#253 (64KB) + Core L#253 + PU L#253 (P#253) L2 L#254 (1024KB) + L1d L#254 (64KB) + L1i L#254 (64KB) + Core L#254 + PU L#254 (P#254) L2 L#255 (1024KB) + L1d L#255 (64KB) + L1i L#255 (64KB) + Core L#255 + PU L#255 (P#255) L2 L#256 (1024KB) + L1d L#256 (64KB) + L1i L#256 (64KB) + Core L#256 + PU L#256 (P#256) L2 L#257 (1024KB) + L1d L#257 (64KB) + L1i L#257 (64KB) + Core L#257 + PU L#257 (P#257) L2 L#258 (1024KB) + L1d L#258 (64KB) + L1i L#258 (64KB) + Core L#258 + PU L#258 (P#258) L2 L#259 (1024KB) + L1d L#259 (64KB) + L1i L#259 (64KB) + Core L#259 + PU L#259 (P#259) L2 L#260 (1024KB) + L1d L#260 (64KB) + L1i L#260 (64KB) + Core L#260 + PU L#260 (P#260) L2 L#261 (1024KB) + L1d L#261 (64KB) + L1i L#261 (64KB) + Core L#261 + PU L#261 (P#261) L2 L#262 (1024KB) + L1d L#262 (64KB) + L1i L#262 (64KB) + Core L#262 + PU L#262 (P#262) L2 L#263 (1024KB) + L1d L#263 (64KB) + L1i L#263 (64KB) + Core L#263 + PU L#263 (P#263) L2 L#264 (1024KB) + L1d L#264 (64KB) + L1i L#264 (64KB) + Core L#264 + PU L#264 (P#264) L2 L#265 (1024KB) + L1d L#265 (64KB) + L1i L#265 (64KB) + Core L#265 + PU L#265 (P#265) L2 L#266 (1024KB) + L1d L#266 (64KB) + L1i L#266 (64KB) + Core L#266 + PU L#266 (P#266) L2 L#267 (1024KB) + L1d L#267 (64KB) + L1i L#267 (64KB) + Core L#267 + PU L#267 (P#267) L2 L#268 (1024KB) + L1d L#268 (64KB) + L1i L#268 (64KB) + Core L#268 + PU L#268 (P#268) L2 L#269 (1024KB) + L1d L#269 (64KB) + L1i L#269 (64KB) + Core L#269 + PU L#269 (P#269) L2 L#270 (1024KB) + L1d L#270 (64KB) + L1i L#270 (64KB) + Core L#270 + PU L#270 (P#270) L2 L#271 (1024KB) + L1d L#271 (64KB) + L1i L#271 (64KB) + Core L#271 + PU L#271 (P#271) L2 L#272 
(1024KB) + L1d L#272 (64KB) + L1i L#272 (64KB) + Core L#272 + PU L#272 (P#272) L2 L#273 (1024KB) + L1d L#273 (64KB) + L1i L#273 (64KB) + Core L#273 + PU L#273 (P#273) L2 L#274 (1024KB) + L1d L#274 (64KB) + L1i L#274 (64KB) + Core L#274 + PU L#274 (P#274) L2 L#275 (1024KB) + L1d L#275 (64KB) + L1i L#275 (64KB) + Core L#275 + PU L#275 (P#275) L2 L#276 (1024KB) + L1d L#276 (64KB) + L1i L#276 (64KB) + Core L#276 + PU L#276 (P#276) L2 L#277 (1024KB) + L1d L#277 (64KB) + L1i L#277 (64KB) + Core L#277 + PU L#277 (P#277) L2 L#278 (1024KB) + L1d L#278 (64KB) + L1i L#278 (64KB) + Core L#278 + PU L#278 (P#278) L2 L#279 (1024KB) + L1d L#279 (64KB) + L1i L#279 (64KB) + Core L#279 + PU L#279 (P#279) L2 L#280 (1024KB) + L1d L#280 (64KB) + L1i L#280 (64KB) + Core L#280 + PU L#280 (P#280) L2 L#281 (1024KB) + L1d L#281 (64KB) + L1i L#281 (64KB) + Core L#281 + PU L#281 (P#281) L2 L#282 (1024KB) + L1d L#282 (64KB) + L1i L#282 (64KB) + Core L#282 + PU L#282 (P#282) L2 L#283 (1024KB) + L1d L#283 (64KB) + L1i L#283 (64KB) + Core L#283 + PU L#283 (P#283) L2 L#284 (1024KB) + L1d L#284 (64KB) + L1i L#284 (64KB) + Core L#284 + PU L#284 (P#284) L2 L#285 (1024KB) + L1d L#285 (64KB) + L1i L#285 (64KB) + Core L#285 + PU L#285 (P#285) L2 L#286 (1024KB) + L1d L#286 (64KB) + L1i L#286 (64KB) + Core L#286 + PU L#286 (P#286) L2 L#287 (1024KB) + L1d L#287 (64KB) + L1i L#287 (64KB) + Core L#287 + PU L#287 (P#287) HostBridge PCIBridge PCI 0030:01:00.0 (Ethernet) Net "hsn1" HostBridge PCIBridge PCI 0039:01:00.0 (3D) %

Additional information

If you need me to run experiments on the machine, as you may not have access to one, please let me know. That should be possible.

bgoglin commented 1 week ago

Hello Jim. I would guess it's a 1024 vs 1000 issue (114x1024 is close to 117x1000): somebody didn't read the specs carefully when filling in the hardware tables. That's what's exposed in the firmware and then by Linux, so the bug wouldn't be on our side. It's somewhat like people claiming they have 192GB of RAM while the BIOS actually only gives 185GB to the OS; we usually just ignore the issue.

What bothers me more is how this L3 is shared. The wording in the doc makes me wonder whether it's actually 234 per dual-package superchip instead of 117 per package. Somebody once reported that their experimental results indeed point to an L3 shared per dual-package, but we couldn't get a precise answer from NVIDIA. However, this would have to be fixed in their firmware as well (or worked around in some kernel driver).
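For anyone who wants to check what Linux itself reports before hwloc interprets it, here is a minimal sketch that reads the standard sysfs cache attributes (`level`, `type`, `size`, `shared_cpu_list`) for one CPU. The function name `read_caches` and its parameters are made up for this illustration. It only shows what the kernel exposes, which is the value under suspicion here, so it cannot by itself say whether the firmware tables are right.

```python
from pathlib import Path


def read_caches(cpu=0, root="/sys/devices/system/cpu"):
    """Return one dict per cache the kernel reports for the given CPU.

    Values are returned as the raw strings found in sysfs (for example a
    size with a "K" suffix); a missing attribute file is reported as None.
    """
    cache_dir = Path(root) / f"cpu{cpu}" / "cache"
    if not cache_dir.is_dir():
        # Not Linux, or the kernel exposes no cache information here.
        return []
    caches = []
    for index in sorted(cache_dir.glob("index[0-9]*")):
        entry = {}
        for name in ("level", "type", "size", "shared_cpu_list"):
            attr = index / name
            entry[name] = attr.read_text().strip() if attr.is_file() else None
        caches.append(entry)
    return caches


if __name__ == "__main__":
    for cache in read_caches(cpu=0):
        print(
            f"L{cache['level']} {cache['type']}: size={cache['size']} "
            f"shared with CPUs {cache['shared_cpu_list']}"
        )
```

The `shared_cpu_list` field is the part relevant to the sharing question: it lists the CPUs the kernel believes share that cache, so an L3 shared across both packages of a superchip would have to show up there after a firmware or kernel fix.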

JimCownie commented 6 days ago

Brice,

Thanks for the rapid reply.

> Hello Jim. I would guess it's a 1024 vs 1000 (114x1024 is close to 117x1000). That's what's exposed in the firmware and then by Linux, the bug wouldn't be on our side. It's somehow like people claiming they have 192GB of RAM while the BIOS actually only gives 185GB to the OS, we usually just ignore the issue.

I did consider that this could be a MB vs MiB difference, but since the ratio is 1.024 * 1.024 ~= 1.049, 117MB == 111.6MiB, so it didn't seem like it. The 117MiB also seemed more likely since that can at least be split semi-reasonably by 72 (giving 1.625 (== 13/8) MiB/cache-slice), whereas 114 (2 * 3 * 19) doesn’t manage that, ending up with 1.583333… (== 19/12) MiB. Of course, it may be that the number of cache slices enabled is not the same as the number of cores enabled (since there are 80 available cache slices, and 76 cores to allow for manufacturing defects, so you might be able to enable a few more cache slices than cores).
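The arithmetic behind both guesses is easy to re-run. The sketch below is only a check of the numbers quoted in this thread; the helper names are invented for the example, and the 72-slice split is the assumption made above (one cache slice per enabled core), not a documented fact about Grace.

```python
from fractions import Fraction

MB = 1000 ** 2   # decimal megabyte, in bytes
MIB = 1024 ** 2  # binary mebibyte, in bytes


def mb_to_mib(megabytes):
    """Convert a size given in decimal MB to binary MiB."""
    return megabytes * MB / MIB


def per_slice(total, slices=72):
    """Exact share of `total` per slice, as a reduced fraction."""
    return Fraction(total, slices)


if __name__ == "__main__":
    # The "1024 vs 1000" guess: 114 x 1024 against 117 x 1000.
    print(114 * 1024, 117 * 1000)
    # The MB vs MiB guess: 117 MB expressed in MiB, to one decimal place.
    print(round(mb_to_mib(117), 1))
    # Splitting each candidate L3 size across 72 slices (assumed count).
    for total in (117, 114):
        share = per_slice(total)
        print(total, share, float(share))
```

Run as a script, this gives 116736 against 117000 for the first guess, 111.6 for the MB to MiB conversion, and the reduced fractions 13/8 and 19/12 for the per-slice shares.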

NV also seem to be fairly consistently using MiB (even if they label them as MB) on the diagram, at least for the other caches.

I can easily believe that the BIOS is wrong!

I may be able to run a micro-benchmark to see what shows up, though I’m not sure I have exactly that one written :-) (I’ll let you know if I do find anything out).

> What bothers me more is how this L3 is shared. The wording in the doc makes me wonder whether it's actually 234 per dual-package superchip instead of 117 per package. Somebody once reported that their experiment results indeed point to shared per dual-package. But we couldn't get a precise answer from NVIDIA. However, this would have to be fixed in their firmware as well (or worked around in some kernel driver).

I’d expect that you have the topology right. Given their diagram, the L3$ is clearly on die, so I’d expect not to be going off chip, which would significantly increase the latency (I see a ~5x slowdown to go to an L1 or L2 cache on another die).

So, I agree there’s probably nothing you can do at the hwloc level. It seemed worth mentioning it, though!

On 11 Sep 2024, at 13:09, Brice Goglin wrote:

> Hello Jim. I would guess it's a 1024 vs 1000 (114x1024 is close to 117x1000). That's what's exposed in the firmware and then by Linux, the bug wouldn't be on our side. It's somehow like people claiming they have 192GB of RAM while the BIOS actually only gives 185GB to the OS, we usually just ignore the issue.
>
> What bothers me more is how this L3 is shared. The wording in the doc makes me wonder whether it's actually 234 per dual-package superchip instead of 117 per package. Somebody once reported that their experiment results indeed point to shared per dual-package. But we couldn't get a precise answer from NVIDIA. However, this would have to be fixed in their firmware as well (or worked around in some kernel driver).


-- Jim
James Cownie
Mob: +44 780 637 7146