a-strong-python opened this issue 9 months ago
From your lscpu output, we can get the following information: hyper-threading is enabled, which can hurt performance for computation-intensive tasks. You can search blogs about NUMA for more information.
If you do want to use all of the cores, I guess you may want to tune the OMP_NUM_THREADS config or use numactl:
export OMP_NUM_THREADS=80
numactl -C 0-79 your_program
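A minimal Python check you can run inside the notebook (a sketch assuming a Linux host; os.sched_getaffinity and os.environ are standard-library calls) to confirm whether the binding and the environment variable actually reached your process:

import os

# Total logical CPUs visible on the machine (includes hyper-threads).
print("os.cpu_count():", os.cpu_count())

# CPUs this process is actually allowed to run on
# (reflects any numactl/taskset/cgroup binding).
allowed = sorted(os.sched_getaffinity(0))
print("allowed CPUs:", allowed, "->", len(allowed), "cores")

# OMP_NUM_THREADS as seen by the process (None if the export did not propagate).
print("OMP_NUM_THREADS:", os.environ.get("OMP_NUM_THREADS"))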
If this does not work, can you provide the instructions for reproducing the issue?
Here is the output after I executed the command under jupyter lab:
!numactl --show
policy: default
preferred node: current
physcpubind: 10 11 12 13 14 15 16 17 18 19 50 51 52 53 54 55 56 57 58 59
cpubind: 0
nodebind: 0
membind: 0 1
!numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
node 0 size: 127598 MB
node 0 free: 124906 MB
node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
node 1 size: 128960 MB
node 1 free: 127766 MB
node distances:
node 0 1
0: 10 21
1: 21 10
!export OMP_NUM_THREADS=80
!numactl -C 0-79 ./main.ipynb
libnuma: Warning: cpu argument 0-79 is out of range
<0-79> is invalid
usage: numactl [--all | -a] [--interleave= | -i <nodes>] [--preferred= | -p <node>]
[--physcpubind= | -C <cpus>] [--cpunodebind= | -N <nodes>]
[--membind= | -m <nodes>] [--localalloc | -l] command args ...
numactl [--show | -s]
numactl [--hardware | -H]
numactl [--length | -l <length>] [--offset | -o <offset>] [--shmmode | -M <shmmode>]
[--strict | -t]
[--shmid | -I <id>] --shm | -S <shmkeyfile>
[--shmid | -I <id>] --file | -f <tmpfsfile>
[--huge | -u] [--touch | -T]
memory policy | --dump | -d | --dump-nodes | -D
memory policy is --interleave | -i, --preferred | -p, --membind | -m, --localalloc | -l
<nodes> is a comma delimited list of node numbers or A-B ranges or all.
Instead of a number a node can also be:
netdev:DEV the node connected to network device DEV
file:PATH the node the block device of path is connected to
ip:HOST the node of the network device host routes through
block:PATH the node of block device path
pci:[seg:]bus:dev[:func] The node of a PCI device
<cpus> is a comma delimited list of cpu numbers or A-B ranges or all
all ranges can be inverted with !
all numbers and ranges can be made cpuset-relative with +
the old --cpubind argument is deprecated.
use --cpunodebind or --physcpubind instead
<length> can have g (GB), m (MB) or k (KB) suffixes
No matter what CPU numbers I pass to !numactl -C ... ./main.ipynb, it always displays the error message:
libnuma: Warning: cpu argument xxx is out of range
Here is the output after I executed the command under jupyter lab:
!numactl --show
policy: default preferred node: current physcpubind: 10 11 12 13 14 15 16 17 18 19 50 51 52 53 54 55 56 57 58 59 cpubind: 0 nodebind: 0 membind: 0 1
This numactl output indicates that some of your NUMA nodes aren't populated with any memory; the memory seems to be all installed for node 0. This explains why numactl -C 0-79 ./main.ipynb complains that the core numbers are out of range. Please check the memory installation on that server.
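As a possible workaround, here is a rough sketch (Linux only; run_main.py is just a placeholder for however you execute the notebook's code as a script) that builds the -C list from the CPUs the kernel actually permits instead of hard-coding 0-79:

import os
import subprocess

# CPUs the current process is permitted to use; in your environment this
# should come out as 10-19 and 50-59 rather than 0-79.
allowed = sorted(os.sched_getaffinity(0))
cpu_list = ",".join(str(c) for c in allowed)

# Bind the workload only to those CPUs. "run_main.py" is a placeholder.
cmd = ["numactl", "-C", cpu_list, "python", "run_main.py"]
print("running:", " ".join(cmd))
subprocess.run(cmd, check=True)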
I am using the free server resources of Intel Developer Cloud for the Edge, so I cannot check the actual physical memory installation on the server; I can only inspect it through lscpu and similar commands. What I know so far is that the server has 256 GB of memory, and that the 80 CPU cores and 256 GB of memory are evenly split across the two NUMA nodes 0 and 1, as follows:
!numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
node 0 size: 127598 MB
node 0 free: 124906 MB
node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
node 1 size: 128960 MB
node 1 free: 127766 MB
node distances:
node 0 1
0: 10 21
1: 21 10
If necessary, you can also reproduce the problem quickly on the Intel platform: Developer Cloud for the Edge
Oh, you are in a Jupyter notebook provided by Developer Cloud for the Edge.
I tried some commands on a free Jupyter notebook from Developer Cloud for the Edge. It seems the Jupyter process is running inside a container or VM.
That means although we can see all cores and memory with lscpu and numactl --hardware, we can only use the resources actually assigned to us (i.e., what numactl --show reports, or whatever was assigned when the container/VM was created). In your environment that is physcpubind: 10 11 12 13 14 15 16 17 18 19 50 51 52 53 54 55 56 57 58 59, i.e., 20 cores.
Another possible reason is that core binding (i.e., numactl) was applied when the Jupyter notebook was launched.
Please contact Developer Cloud support for resource-related issues.
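If you want the OpenMP thread count to match what the container really gives you, here is a small sketch (set the variable before importing NumPy/PyTorch/etc. so their OpenMP runtimes pick it up; the exact effect depends on the libraries involved):

import os

# CPUs actually assigned to this container/VM
# (expected here: 10-19 and 50-59, i.e. 20 cores).
usable = sorted(os.sched_getaffinity(0))
print("usable CPUs:", usable)

# Size the OpenMP thread pool to what is really available; this must happen
# before the libraries that use OpenMP are imported.
os.environ["OMP_NUM_THREADS"] = str(len(usable))
print("OMP_NUM_THREADS =", os.environ["OMP_NUM_THREADS"])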
root@s099-n016:~$ lscpu
Architecture:            x86_64
CPU op-mode(s):          32-bit, 64-bit
Address sizes:           46 bits physical, 48 bits virtual
Byte Order:              Little Endian
CPU(s):                  80
On-line CPU(s) list:     0-79
Vendor ID:               GenuineIntel
Model name:              Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
CPU family:              6
Model:                   85
Thread(s) per core:      2
Core(s) per socket:      20
Socket(s):               2
Stepping:                4
NUMA:
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-19,40-59
  NUMA node1 CPU(s):     20-39,60-79