intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

My Intel CPU server has 80 CPU cores in total, but my program in Jupyter can only use 20 of them; how can I use all the cores? #10184

Open a-strong-python opened 8 months ago

a-strong-python commented 8 months ago

root@s099-n016:~$ lscpu
Architecture:           x86_64
CPU op-mode(s):         32-bit, 64-bit
Address sizes:          46 bits physical, 48 bits virtual
Byte Order:             Little Endian
CPU(s):                 80
On-line CPU(s) list:    0-79
Vendor ID:              GenuineIntel
Model name:             Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
CPU family:             6
Model:                  85
Thread(s) per core:     2
Core(s) per socket:     20
Socket(s):              2
Stepping:               4
NUMA:
  NUMA node(s):         2
  NUMA node0 CPU(s):    0-19,40-59
  NUMA node1 CPU(s):    20-39,60-79

gc-fu commented 8 months ago

From your lscpu output, we can get the following information:

  1. Hyper-threading is enabled, which can hurt performance for computation-intensive tasks.
  2. One socket has 20 physical cores, so the current best practice is to use the 20 cores of a single socket.

You can search for blog posts about NUMA for more information.

If you do want to use all of the cores, you may want to tune the OMP_NUM_THREADS setting or use numactl:

export OMP_NUM_THREADS=80
numactl -C 0-79 your_program

If this does not work, could you provide instructions for reproducing the issue?
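
Note that in a Jupyter notebook, !export OMP_NUM_THREADS=80 runs in a throwaway subshell and does not change the kernel's environment. A minimal sketch for setting the thread count from inside the notebook itself, assuming a PyTorch-based workload (adjust the count to your actual core budget):

import os
os.environ["OMP_NUM_THREADS"] = "80"  # must be set before the compute library is imported

import torch
torch.set_num_threads(80)  # align PyTorch's intra-op thread pool with OMP_NUM_THREADS
print(torch.get_num_threads())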

a-strong-python commented 8 months ago

Here is the output after I executed the command in JupyterLab:

!numactl --show

policy: default
preferred node: current
physcpubind: 10 11 12 13 14 15 16 17 18 19 50 51 52 53 54 55 56 57 58 59 
cpubind: 0 
nodebind: 0 
membind: 0 1 
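
The same restriction can be cross-checked from inside the notebook kernel; a minimal sketch using only the Python standard library (Linux only):

import os
# Affinity mask of the current process; this should mirror the physcpubind line above.
allowed = sorted(os.sched_getaffinity(0))
print(len(allowed), "CPUs usable by this process:", allowed)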

!numactl --hardware

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
node 0 size: 127598 MB
node 0 free: 124906 MB
node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
node 1 size: 128960 MB
node 1 free: 127766 MB
node distances:
node   0   1 
  0:  10  21 
  1:  21  10 

!export OMP_NUM_THREADS=80
!numactl -C 0-79 ./main.ipynb

libnuma: Warning: cpu argument 0-79 is out of range

<0-79> is invalid
usage: numactl [--all | -a] [--interleave= | -i <nodes>] [--preferred= | -p <node>]
               [--physcpubind= | -C <cpus>] [--cpunodebind= | -N <nodes>]
               [--membind= | -m <nodes>] [--localalloc | -l] command args ...
       numactl [--show | -s]
       numactl [--hardware | -H]
       numactl [--length | -l <length>] [--offset | -o <offset>] [--shmmode | -M <shmmode>]
               [--strict | -t]
               [--shmid | -I <id>] --shm | -S <shmkeyfile>
               [--shmid | -I <id>] --file | -f <tmpfsfile>
               [--huge | -u] [--touch | -T] 
               memory policy | --dump | -d | --dump-nodes | -D

memory policy is --interleave | -i, --preferred | -p, --membind | -m, --localalloc | -l
<nodes> is a comma delimited list of node numbers or A-B ranges or all.
Instead of a number a node can also be:
  netdev:DEV the node connected to network device DEV
  file:PATH  the node the block device of path is connected to
  ip:HOST    the node of the network device host routes through
  block:PATH the node of block device path
  pci:[seg:]bus:dev[:func] The node of a PCI device
<cpus> is a comma delimited list of cpu numbers or A-B ranges or all
all ranges can be inverted with !
all numbers and ranges can be made cpuset-relative with +
the old --cpubind argument is deprecated.
use --cpunodebind or --physcpubind instead
<length> can have g (GB), m (MB) or k (KB) suffixes

a-strong-python commented 8 months ago

No matter how I specify the cores for !numactl -C 0-79 ./main.ipynb, the error message is displayed:

libnuma: Warning: cpu argument xxx is out of range

<xxx> is invalid

This is very confusing to me!

qiyuangong commented 8 months ago

> Here is the output after I executed the command in JupyterLab:
>
> !numactl --show
>
> policy: default
> preferred node: current
> physcpubind: 10 11 12 13 14 15 16 17 18 19 50 51 52 53 54 55 56 57 58 59
> cpubind: 0
> nodebind: 0
> membind: 0 1

This numactl output indicates that some of your NUMA nodes aren't populated with any memory; all of the memory seems to be installed on node 0.

This explains why numactl -C 0-79 ./main.ipynb complains that the core numbers are out of range.

Please check the memory installation on that server.
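
As an untested sketch, you could also derive the permitted CPU list from the process's own affinity mask and pass that to numactl instead of hard-coding 0-79 (the script name below is a placeholder):

import os
import subprocess

# Build a numactl CPU list from the CPUs this process is actually allowed to use.
allowed = sorted(os.sched_getaffinity(0))
cpu_arg = ",".join(str(c) for c in allowed)  # e.g. "10,11,...,59"
subprocess.run(["numactl", "-C", cpu_arg, "python", "your_script.py"])  # placeholder script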

a-strong-python commented 8 months ago

> This numactl output indicates that some of your NUMA nodes aren't populated with any memory; all of the memory seems to be installed on node 0.
>
> This explains why numactl -C 0-79 ./main.ipynb complains that the core numbers are out of range.
>
> Please check the memory installation on that server.

I am using the free server resources of Intel Developer Cloud for the Edge, so I cannot check how memory is physically installed in the server; I can only inspect it through commands such as lscpu. What I know so far is that the server has 256 GB of memory, and that the 80 CPU cores and 256 GB of memory are evenly allocated across the two NUMA nodes 0 and 1, as follows:

!numactl --hardware

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
node 0 size: 127598 MB
node 0 free: 124906 MB
node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
node 1 size: 128960 MB
node 1 free: 127766 MB
node distances:
node   0   1 
  0:  10  21 
  1:  21  10 

a-strong-python commented 8 months ago

If necessary, you can also reproduce the problem quickly on the Intel platform: Intel Developer Cloud for the Edge.

qiyuangong commented 8 months ago

> If necessary, you can also reproduce the problem quickly on the Intel platform: Intel Developer Cloud for the Edge.

Oh, you are in a Jupyter notebook provided by Intel Developer Cloud for the Edge.

I tried some commands in a free Jupyter notebook from Intel Developer Cloud for the Edge. It seems the Jupyter process is running inside a container or VM.

That means that although we can see all cores and memory with lscpu and numactl --hardware, we can only use the resources actually assigned to us (i.e., what numactl --show reports, which was set when the container/VM was created). In your environment, that is physcpubind: 10 11 12 13 14 15 16 17 18 19 50 51 52 53 54 55 56 57 58 59, i.e., 20 cores.
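
A quick way to confirm a container-level restriction is to read the cgroup cpuset; a sketch (paths differ between cgroup v1 and v2, so both are tried):

from pathlib import Path

# The CPU set enforced by the container runtime, if any, is exposed via cgroups.
for p in ("/sys/fs/cgroup/cpuset.cpus.effective",  # cgroup v2
          "/sys/fs/cgroup/cpuset/cpuset.cpus"):    # cgroup v1
    f = Path(p)
    if f.exists():
        print(p, "->", f.read_text().strip())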

Another possible reason is that core binding (e.g., via numactl) was applied when the Jupyter notebook was launched.

Please contact Intel Developer Cloud support for resource-related issues.