sgl-project / sglang

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
Apache License 2.0

missing 1 required positional argument: 'page_size' when using --enable-flashinfer #565

Closed · keepitsane closed this 4 days ago

keepitsane commented 5 days ago

I am able to get it running properly when not using flashinfer, and I am currently running on 4x NVIDIA A10G. Please let me know what other information might be helpful from my end. The launch command:

python -m sglang.launch_server --model-path /mnt/ebs_volume/models/llama/llama-3-8b-instruct --tp-size=4 --mem-fraction-static=0.75 --port 30000 --enable-flashinfer

sglang==0.1.17
triton==2.3.0
transformers==4.41.2
torch==2.3.0
vllm==0.4.3
vllm-flash-attn==2.5.8.post2
flashinfer==0.0.5 (I also tested 0.0.6)
nvcc==12.1, V12.1.105
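The versions above can be double-checked from inside the same environment with a quick snippet; this is only a convenience check, and it assumes the distribution names match the pip package names listed above (flashinfer in particular may be installed from a custom wheel under a different name):

# Hypothetical version check for the packages listed above.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("sglang", "flashinfer", "vllm", "torch", "triton", "transformers"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")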

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[gpu_id=0] Set cuda device.
[gpu_id=1] Set cuda device.
[gpu_id=2] Set cuda device.
[gpu_id=3] Set cuda device.
[gpu_id=0] Init nccl begin.
[gpu_id=1] Init nccl begin.
[gpu_id=2] Init nccl begin.
[gpu_id=3] Init nccl begin.
WARNING 06-25 19:03:47 custom_all_reduce.py:158] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
WARNING 06-25 19:03:48 custom_all_reduce.py:158] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
WARNING 06-25 19:03:48 custom_all_reduce.py:158] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
WARNING 06-25 19:03:48 custom_all_reduce.py:158] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
[gpu_id=2] Load weight begin. avail mem=21.52 GB
[gpu_id=0] Load weight begin. avail mem=21.52 GB
[gpu_id=1] Load weight begin. avail mem=21.52 GB
[gpu_id=3] Load weight begin. avail mem=21.52 GB
 61%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                                                                                 | 179/292 [00:01<00:00, 130.83it/s][gpu_id=3] Load weight end. type=LlamaForCausalLM, avail mem=17.76 GB
 66%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                                                                        | 193/292 [00:01<00:01, 74.92it/s][gpu_id=2] Load weight end. type=LlamaForCausalLM, avail mem=17.76 GB
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎| 291/292 [00:02<00:00, 135.25it/s]
[gpu_id=0] Load weight end. type=LlamaForCausalLM, avail mem=17.76 GB
[gpu_id=1] Load weight end. type=LlamaForCausalLM, avail mem=17.76 GB
[gpu_id=2] Memory pool end. avail mem=4.94 GB
[gpu_id=0] Memory pool end. avail mem=4.94 GB
[gpu_id=1] Memory pool end. avail mem=4.94 GB
[gpu_id=3] Memory pool end. avail mem=4.94 GB
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[gpu_id=3] max_total_num_tokens=405784, max_prefill_tokens=65536, context_len=8192,
[gpu_id=2] max_total_num_tokens=405784, max_prefill_tokens=65536, context_len=8192,
[gpu_id=0] max_total_num_tokens=405784, max_prefill_tokens=65536, context_len=8192,
[gpu_id=0] server_args: enable_flashinfer=True, attention_reduce_in_fp32=False, disable_radix_cache=False, disable_regex_jump_forward=False, disable_disk_cache=False,
[gpu_id=1] max_total_num_tokens=405784, max_prefill_tokens=65536, context_len=8192,
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO:     Started server process [23287]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:30000 (Press CTRL+C to quit)
INFO:     127.0.0.1:58096 - "GET /get_model_info HTTP/1.1" 200 OK
[gpu_id=0] Prefil batch. #new-seq: 1, #new-token: 7, #cached-token: 0, cache hit rate: 0.00%, #running-req: 0, #queue-req: 0
Exception in ModelTpServer:
Traceback (most recent call last):
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/tp_worker.py", line 188, in exposed_step
    self.forward_step()
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/tp_worker.py", line 204, in forward_step
    self.forward_fill_batch(new_batch)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/tp_worker.py", line 443, in forward_fill_batch
    ) = self.model_runner.forward(batch, ForwardMode.EXTEND)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/model_runner.py", line 426, in forward
    return self.forward_extend(batch)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/model_runner.py", line 361, in forward_extend
    input_metadata = InputMetadata.create(
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/model_runner.py", line 209, in create
    ret.init_flashinfer_args(tp_size)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/model_runner.py", line 115, in init_flashinfer_args
    self.prefill_wrapper.begin_forward(*args)
TypeError: BatchPrefillWithPagedKVCacheWrapper.begin_forward() missing 1 required positional argument: 'page_size'

...

Exception in ControllerSingle:
Traceback (most recent call last):
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/manager_single.py", line 93, in start_controller_process
    loop.run_until_complete(controller.loop_for_forward())
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/manager_single.py", line 44, in loop_for_forward
    out_pyobjs = await self.model_client.step(next_step_input)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/tp_worker.py", line 787, in _func
    return obtain(tasks[0].value)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/rpyc/core/async_.py", line 111, in value
    raise self._obj
_get_exception_class.<locals>.Derived: BatchPrefillWithPagedKVCacheWrapper.begin_forward() missing 1 required positional argument: 'page_size'

========= Remote Traceback (1) =========
Traceback (most recent call last):
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/rpyc/core/protocol.py", line 369, in _dispatch_request
    res = self._HANDLERS[handler](self, *args)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/rpyc/core/protocol.py", line 863, in _handle_call
    return obj(*args, **dict(kwargs))
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/tp_worker.py", line 188, in exposed_step
    self.forward_step()
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/tp_worker.py", line 204, in forward_step
    self.forward_fill_batch(new_batch)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/tp_worker.py", line 443, in forward_fill_batch
    ) = self.model_runner.forward(batch, ForwardMode.EXTEND)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/model_runner.py", line 426, in forward
    return self.forward_extend(batch)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/model_runner.py", line 361, in forward_extend
    input_metadata = InputMetadata.create(
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/model_runner.py", line 209, in create
    ret.init_flashinfer_args(tp_size)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/model_runner.py", line 115, in init_flashinfer_args
    self.prefill_wrapper.begin_forward(*args)
TypeError: BatchPrefillWithPagedKVCacheWrapper.begin_forward() missing 1 required positional argument: 'page_size'

./start-llama-3.sh: line 1: 23287 Killed                  python -m sglang.launch_server --model-path /mnt/ebs_volume/models/llama/llama-3-8b-instruct --tp-size=4 --mem-fraction-static=0.75 --port 30000 --enable-flashinfer
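Reading the traceback, the begin_forward() of the installed flashinfer build expects one more trailing positional argument, page_size, than sglang 0.1.17's init_flashinfer_args passes, so this looks like a version mismatch between the two packages rather than a setup problem. A minimal, self-contained sketch of that mismatch (the class name and the missing argument are taken from the error message; the other argument names are assumptions, not the actual sglang/flashinfer code):

# Illustration of the failure mode only; the real classes live in flashinfer
# and sglang, and the argument names below are assumptions.
class BatchPrefillWithPagedKVCacheWrapper:
    # Newer flashinfer builds require a trailing page_size argument.
    def begin_forward(self, qo_indptr, kv_indptr, kv_indices,
                      kv_last_page_len, num_qo_heads, num_kv_heads,
                      head_dim, page_size):
        pass

# sglang 0.1.17's init_flashinfer_args assembles arguments for the older
# API (no page_size) and then calls begin_forward(*args).
args = ("qo_indptr", "kv_indptr", "kv_indices",
        "kv_last_page_len", 32, 8, 128)  # 7 positional values, no page_size

try:
    BatchPrefillWithPagedKVCacheWrapper().begin_forward(*args)
except TypeError as e:
    print(e)  # missing 1 required positional argument: 'page_size'

If that reading is right, pinning flashinfer to the version the installed sglang release was built against, or upgrading sglang to a release that passes page_size, should make the two signatures agree again.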
keepitsane commented 5 days ago

Here is a more detailed log dump. Could this also be related to #531?

❯ ./start-llama-3.sh
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[gpu_id=1] Set cuda device.
[gpu_id=3] Set cuda device.
[gpu_id=0] Set cuda device.
[gpu_id=2] Set cuda device.
[gpu_id=0] Init nccl begin.
DEBUG 06-25 19:22:41 parallel_state.py:87] world_size=4 rank=0 local_rank=0 distributed_init_method=tcp://127.0.0.1:30014 backend=nccl
[gpu_id=3] Init nccl begin.
DEBUG 06-25 19:22:41 parallel_state.py:87] world_size=4 rank=3 local_rank=3 distributed_init_method=tcp://127.0.0.1:30014 backend=nccl
[gpu_id=1] Init nccl begin.
DEBUG 06-25 19:22:41 parallel_state.py:87] world_size=4 rank=1 local_rank=1 distributed_init_method=tcp://127.0.0.1:30014 backend=nccl
[gpu_id=2] Init nccl begin.
DEBUG 06-25 19:22:41 parallel_state.py:87] world_size=4 rank=2 local_rank=2 distributed_init_method=tcp://127.0.0.1:30014 backend=nccl
ip-172-16-1-166:32493:32556 [0] NCCL INFO Bootstrap : Using eth0:172.16.1.166<0>
ip-172-16-1-166:32493:32556 [0] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v8 symbol.
ip-172-16-1-166:32493:32556 [0] NCCL INFO NET/Plugin: Loaded net plugin AWS Libfabric (v6)
ip-172-16-1-166:32493:32556 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v8 symbol.
ip-172-16-1-166:32493:32556 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (>= v5). ncclCollNetPlugin symbols v4 and lower are not supported.
ip-172-16-1-166:32493:32556 [0] NCCL INFO cudaDriverVersion 12020
NCCL version 2.20.5+cuda12.4
ip-172-16-1-166:32496:32559 [3] NCCL INFO cudaDriverVersion 12020
ip-172-16-1-166:32496:32559 [3] NCCL INFO Bootstrap : Using eth0:172.16.1.166<0>
ip-172-16-1-166:32496:32559 [3] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v8 symbol.
ip-172-16-1-166:32496:32559 [3] NCCL INFO NET/Plugin: Loaded net plugin AWS Libfabric (v6)
ip-172-16-1-166:32496:32559 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v8 symbol.
ip-172-16-1-166:32496:32559 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (>= v5). ncclCollNetPlugin symbols v4 and lower are not supported.
ip-172-16-1-166:32495:32558 [2] NCCL INFO cudaDriverVersion 12020
ip-172-16-1-166:32495:32558 [2] NCCL INFO Bootstrap : Using eth0:172.16.1.166<0>
ip-172-16-1-166:32494:32557 [1] NCCL INFO cudaDriverVersion 12020
ip-172-16-1-166:32494:32557 [1] NCCL INFO Bootstrap : Using eth0:172.16.1.166<0>
ip-172-16-1-166:32495:32558 [2] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v8 symbol.
ip-172-16-1-166:32495:32558 [2] NCCL INFO NET/Plugin: Loaded net plugin AWS Libfabric (v6)
ip-172-16-1-166:32495:32558 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v8 symbol.
ip-172-16-1-166:32495:32558 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (>= v5). ncclCollNetPlugin symbols v4 and lower are not supported.
ip-172-16-1-166:32494:32557 [1] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v8 symbol.
ip-172-16-1-166:32494:32557 [1] NCCL INFO NET/Plugin: Loaded net plugin AWS Libfabric (v6)
ip-172-16-1-166:32494:32557 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v8 symbol.
ip-172-16-1-166:32494:32557 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (>= v5). ncclCollNetPlugin symbols v4 and lower are not supported.
ip-172-16-1-166:32493:32644 [0] NCCL INFO NET/OFI Initializing aws-ofi-nccl 1.7.3-aws
ip-172-16-1-166:32493:32644 [0] NCCL INFO NET/OFI Using CUDA runtime version 12010
ip-172-16-1-166:32493:32644 [0] NCCL INFO NET/OFI Configuring AWS-specific options
ip-172-16-1-166:32493:32644 [0] 159.415628 get_platform_type:117 NCCL TRACE NET/OFI EC2 platform type is g5.12xlarge
ip-172-16-1-166:32493:32644 [0] NCCL INFO NET/OFI Setting provider_filter to efa
ip-172-16-1-166:32493:32644 [0] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1

ip-172-16-1-166:32493:32644 [0] configure_nvls_option:287 NCCL WARN NET/OFI Could not find ncclGetVersion symbol
ip-172-16-1-166:32493:32644 [0] NCCL INFO NET/OFI Disabling NVLS support due to NCCL version 0
ip-172-16-1-166:32493:32644 [0] NCCL INFO NET/OFI Internode latency set at 150.0 us
ip-172-16-1-166:32496:32645 [3] NCCL INFO NET/OFI Initializing aws-ofi-nccl 1.7.3-aws
ip-172-16-1-166:32496:32645 [3] NCCL INFO NET/OFI Using CUDA runtime version 12010
ip-172-16-1-166:32496:32645 [3] NCCL INFO NET/OFI Configuring AWS-specific options
ip-172-16-1-166:32496:32645 [3] 155.422315 get_platform_type:117 NCCL TRACE NET/OFI EC2 platform type is g5.12xlarge
ip-172-16-1-166:32496:32645 [3] NCCL INFO NET/OFI Setting provider_filter to efa
ip-172-16-1-166:32496:32645 [3] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1

ip-172-16-1-166:32496:32645 [3] configure_nvls_option:287 NCCL WARN NET/OFI Could not find ncclGetVersion symbol
ip-172-16-1-166:32496:32645 [3] NCCL INFO NET/OFI Disabling NVLS support due to NCCL version 0
ip-172-16-1-166:32496:32645 [3] NCCL INFO NET/OFI Internode latency set at 150.0 us
ip-172-16-1-166:32495:32646 [2] NCCL INFO NET/OFI Initializing aws-ofi-nccl 1.7.3-aws
ip-172-16-1-166:32495:32646 [2] NCCL INFO NET/OFI Using CUDA runtime version 12010
ip-172-16-1-166:32495:32646 [2] NCCL INFO NET/OFI Configuring AWS-specific options
ip-172-16-1-166:32495:32646 [2] 158.809657 get_platform_type:117 NCCL TRACE NET/OFI EC2 platform type is g5.12xlarge
ip-172-16-1-166:32495:32646 [2] NCCL INFO NET/OFI Setting provider_filter to efa
ip-172-16-1-166:32495:32646 [2] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1

ip-172-16-1-166:32495:32646 [2] configure_nvls_option:287 NCCL WARN NET/OFI Could not find ncclGetVersion symbol
ip-172-16-1-166:32495:32646 [2] NCCL INFO NET/OFI Disabling NVLS support due to NCCL version 0
ip-172-16-1-166:32495:32646 [2] NCCL INFO NET/OFI Internode latency set at 150.0 us
ip-172-16-1-166:32494:32647 [1] NCCL INFO NET/OFI Initializing aws-ofi-nccl 1.7.3-aws
ip-172-16-1-166:32494:32647 [1] NCCL INFO NET/OFI Using CUDA runtime version 12010
ip-172-16-1-166:32494:32647 [1] NCCL INFO NET/OFI Configuring AWS-specific options
ip-172-16-1-166:32494:32647 [1] 160.018489 get_platform_type:117 NCCL TRACE NET/OFI EC2 platform type is g5.12xlarge
ip-172-16-1-166:32494:32647 [1] NCCL INFO NET/OFI Setting provider_filter to efa
ip-172-16-1-166:32494:32647 [1] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1

ip-172-16-1-166:32494:32647 [1] configure_nvls_option:287 NCCL WARN NET/OFI Could not find ncclGetVersion symbol
ip-172-16-1-166:32494:32647 [1] NCCL INFO NET/OFI Disabling NVLS support due to NCCL version 0
ip-172-16-1-166:32494:32647 [1] NCCL INFO NET/OFI Internode latency set at 150.0 us
ip-172-16-1-166:32493:32644 [0] 171.542188 find_ofi_provider:589 NCCL TRACE NET/OFI Could not find any optimal provider supporting GPUDirect RDMA
ip-172-16-1-166:32493:32644 [0] 171.947925 find_ofi_provider:601 NCCL TRACE NET/OFI Using Libfabric 1.18 API, without GPUDirect RDMA support

ip-172-16-1-166:32493:32644 [0] nccl_net_ofi_init:1237 NCCL WARN NET/OFI aws-ofi-nccl initialization failed
ip-172-16-1-166:32493:32644 [0] NCCL INFO net.cc:111 -> 2
ip-172-16-1-166:32493:32644 [0] NCCL INFO NET/IB : No device found.
ip-172-16-1-166:32493:32644 [0] NCCL INFO NET/Socket : Using [0]eth0:172.16.1.166<0>
ip-172-16-1-166:32493:32644 [0] NCCL INFO Using non-device net plugin version 0
ip-172-16-1-166:32493:32644 [0] NCCL INFO Using network Socket
ip-172-16-1-166:32496:32645 [3] 165.530858 find_ofi_provider:589 NCCL TRACE NET/OFI Could not find any optimal provider supporting GPUDirect RDMA
ip-172-16-1-166:32496:32645 [3] 165.922145 find_ofi_provider:601 NCCL TRACE NET/OFI Using Libfabric 1.18 API, without GPUDirect RDMA support

ip-172-16-1-166:32496:32645 [3] nccl_net_ofi_init:1237 NCCL WARN NET/OFI aws-ofi-nccl initialization failed
ip-172-16-1-166:32496:32645 [3] NCCL INFO net.cc:111 -> 2
ip-172-16-1-166:32496:32645 [3] NCCL INFO NET/IB : No device found.
ip-172-16-1-166:32496:32645 [3] NCCL INFO NET/Socket : Using [0]eth0:172.16.1.166<0>
ip-172-16-1-166:32496:32645 [3] NCCL INFO Using non-device net plugin version 0
ip-172-16-1-166:32496:32645 [3] NCCL INFO Using network Socket
ip-172-16-1-166:32495:32646 [2] 168.800378 find_ofi_provider:589 NCCL TRACE NET/OFI Could not find any optimal provider supporting GPUDirect RDMA
ip-172-16-1-166:32495:32646 [2] 169.205495 find_ofi_provider:601 NCCL TRACE NET/OFI Using Libfabric 1.18 API, without GPUDirect RDMA support

ip-172-16-1-166:32495:32646 [2] nccl_net_ofi_init:1237 NCCL WARN NET/OFI aws-ofi-nccl initialization failed
ip-172-16-1-166:32495:32646 [2] NCCL INFO net.cc:111 -> 2
ip-172-16-1-166:32495:32646 [2] NCCL INFO NET/IB : No device found.
ip-172-16-1-166:32495:32646 [2] NCCL INFO NET/Socket : Using [0]eth0:172.16.1.166<0>
ip-172-16-1-166:32495:32646 [2] NCCL INFO Using non-device net plugin version 0
ip-172-16-1-166:32495:32646 [2] NCCL INFO Using network Socket
ip-172-16-1-166:32494:32647 [1] 170.125622 find_ofi_provider:589 NCCL TRACE NET/OFI Could not find any optimal provider supporting GPUDirect RDMA
ip-172-16-1-166:32494:32647 [1] 170.529169 find_ofi_provider:601 NCCL TRACE NET/OFI Using Libfabric 1.18 API, without GPUDirect RDMA support

ip-172-16-1-166:32494:32647 [1] nccl_net_ofi_init:1237 NCCL WARN NET/OFI aws-ofi-nccl initialization failed
ip-172-16-1-166:32494:32647 [1] NCCL INFO net.cc:111 -> 2
ip-172-16-1-166:32494:32647 [1] NCCL INFO NET/IB : No device found.
ip-172-16-1-166:32494:32647 [1] NCCL INFO NET/Socket : Using [0]eth0:172.16.1.166<0>
ip-172-16-1-166:32494:32647 [1] NCCL INFO Using non-device net plugin version 0
ip-172-16-1-166:32494:32647 [1] NCCL INFO Using network Socket
ip-172-16-1-166:32495:32646 [2] NCCL INFO comm 0x7f10cd1979d0 rank 2 nranks 4 cudaDev 2 nvmlDev 2 busId 1d0 commId 0x5758ad46544f0c59 - Init START
ip-172-16-1-166:32496:32645 [3] NCCL INFO comm 0x7f10cd197960 rank 3 nranks 4 cudaDev 3 nvmlDev 3 busId 1e0 commId 0x5758ad46544f0c59 - Init START
ip-172-16-1-166:32494:32647 [1] NCCL INFO comm 0x7f10cd197ab0 rank 1 nranks 4 cudaDev 1 nvmlDev 1 busId 1c0 commId 0x5758ad46544f0c59 - Init START
ip-172-16-1-166:32493:32644 [0] NCCL INFO comm 0x7f10c51996b0 rank 0 nranks 4 cudaDev 0 nvmlDev 0 busId 1b0 commId 0x5758ad46544f0c59 - Init START
ip-172-16-1-166:32494:32647 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
... (the same "P2P is disabled" message repeats for every GPU pair on rank 1)
ip-172-16-1-166:32494:32647 [1] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
ip-172-16-1-166:32493:32644 [0] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
... (the same "P2P is disabled" message repeats for every GPU pair on rank 0)
ip-172-16-1-166:32493:32644 [0] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
ip-172-16-1-166:32495:32646 [2] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32645 [3] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
... (the same "P2P is disabled" message repeats for every GPU pair on ranks 2 and 3)
ip-172-16-1-166:32495:32646 [2] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
ip-172-16-1-166:32496:32645 [3] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
ip-172-16-1-166:32495:32646 [2] NCCL INFO comm 0x7f10cd1979d0 rank 2 nRanks 4 nNodes 1 localRanks 4 localRank 2 MNNVL 0
ip-172-16-1-166:32494:32647 [1] NCCL INFO comm 0x7f10cd197ab0 rank 1 nRanks 4 nNodes 1 localRanks 4 localRank 1 MNNVL 0
ip-172-16-1-166:32496:32645 [3] NCCL INFO comm 0x7f10cd197960 rank 3 nRanks 4 nNodes 1 localRanks 4 localRank 3 MNNVL 0
ip-172-16-1-166:32493:32644 [0] NCCL INFO comm 0x7f10c51996b0 rank 0 nRanks 4 nNodes 1 localRanks 4 localRank 0 MNNVL 0
ip-172-16-1-166:32495:32646 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1
ip-172-16-1-166:32494:32647 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0
ip-172-16-1-166:32496:32645 [3] NCCL INFO Trees [0] -1/-1/-1->3->2 [1] -1/-1/-1->3->2 [2] -1/-1/-1->3->2 [3] -1/-1/-1->3->2
ip-172-16-1-166:32495:32646 [2] NCCL INFO P2P Chunksize set to 131072
ip-172-16-1-166:32494:32647 [1] NCCL INFO P2P Chunksize set to 131072
ip-172-16-1-166:32496:32645 [3] NCCL INFO P2P Chunksize set to 131072
ip-172-16-1-166:32493:32644 [0] NCCL INFO Channel 00/04 :    0   1   2   3
ip-172-16-1-166:32493:32644 [0] NCCL INFO Channel 01/04 :    0   1   2   3
ip-172-16-1-166:32493:32644 [0] NCCL INFO Channel 02/04 :    0   1   2   3
ip-172-16-1-166:32493:32644 [0] NCCL INFO Channel 03/04 :    0   1   2   3
ip-172-16-1-166:32493:32644 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1
ip-172-16-1-166:32493:32644 [0] NCCL INFO P2P Chunksize set to 131072
ip-172-16-1-166:32496:32645 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32647 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32646 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32644 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32646 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32647 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32645 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32644 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32644 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32647 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32646 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32645 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32644 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32647 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32645 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32646 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32644 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32647 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32645 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32646 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32644 [0] NCCL INFO Channel 00 : 0[0] -> 1[1] via SHM/direct/direct
ip-172-16-1-166:32493:32644 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32647 [1] NCCL INFO Channel 00 : 1[1] -> 2[2] via SHM/direct/direct
ip-172-16-1-166:32494:32647 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32645 [3] NCCL INFO Channel 00 : 3[3] -> 0[0] via SHM/direct/direct
ip-172-16-1-166:32496:32645 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32646 [2] NCCL INFO Channel 00 : 2[2] -> 3[3] via SHM/direct/direct
ip-172-16-1-166:32495:32646 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32644 [0] NCCL INFO Channel 01 : 0[0] -> 1[1] via SHM/direct/direct
ip-172-16-1-166:32493:32644 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32647 [1] NCCL INFO Channel 01 : 1[1] -> 2[2] via SHM/direct/direct
ip-172-16-1-166:32494:32647 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32645 [3] NCCL INFO Channel 01 : 3[3] -> 0[0] via SHM/direct/direct
ip-172-16-1-166:32496:32645 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32646 [2] NCCL INFO Channel 01 : 2[2] -> 3[3] via SHM/direct/direct
ip-172-16-1-166:32495:32646 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32644 [0] NCCL INFO Channel 02 : 0[0] -> 1[1] via SHM/direct/direct
ip-172-16-1-166:32493:32644 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32647 [1] NCCL INFO Channel 02 : 1[1] -> 2[2] via SHM/direct/direct
ip-172-16-1-166:32494:32647 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32645 [3] NCCL INFO Channel 02 : 3[3] -> 0[0] via SHM/direct/direct
ip-172-16-1-166:32496:32645 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32646 [2] NCCL INFO Channel 02 : 2[2] -> 3[3] via SHM/direct/direct
ip-172-16-1-166:32495:32646 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32644 [0] NCCL INFO Channel 03 : 0[0] -> 1[1] via SHM/direct/direct
ip-172-16-1-166:32494:32647 [1] NCCL INFO Channel 03 : 1[1] -> 2[2] via SHM/direct/direct
ip-172-16-1-166:32496:32645 [3] NCCL INFO Channel 03 : 3[3] -> 0[0] via SHM/direct/direct
ip-172-16-1-166:32495:32646 [2] NCCL INFO Channel 03 : 2[2] -> 3[3] via SHM/direct/direct
ip-172-16-1-166:32495:32646 [2] NCCL INFO Connected all rings
ip-172-16-1-166:32495:32646 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32647 [1] NCCL INFO Connected all rings
ip-172-16-1-166:32494:32647 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32645 [3] NCCL INFO Connected all rings
ip-172-16-1-166:32493:32644 [0] NCCL INFO Connected all rings
ip-172-16-1-166:32496:32645 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32644 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32645 [3] NCCL INFO Channel 00 : 3[3] -> 2[2] via SHM/direct/direct
ip-172-16-1-166:32496:32645 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32645 [3] NCCL INFO Channel 01 : 3[3] -> 2[2] via SHM/direct/direct
ip-172-16-1-166:32496:32645 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32645 [3] NCCL INFO Channel 02 : 3[3] -> 2[2] via SHM/direct/direct
ip-172-16-1-166:32496:32645 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32645 [3] NCCL INFO Channel 03 : 3[3] -> 2[2] via SHM/direct/direct
ip-172-16-1-166:32495:32646 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32647 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32644 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32646 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32644 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32647 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32646 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32644 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32647 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32646 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32646 [2] NCCL INFO Channel 00 : 2[2] -> 1[1] via SHM/direct/direct
ip-172-16-1-166:32495:32646 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32646 [2] NCCL INFO Channel 01 : 2[2] -> 1[1] via SHM/direct/direct
ip-172-16-1-166:32495:32646 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32646 [2] NCCL INFO Channel 02 : 2[2] -> 1[1] via SHM/direct/direct
ip-172-16-1-166:32495:32646 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32646 [2] NCCL INFO Channel 03 : 2[2] -> 1[1] via SHM/direct/direct
ip-172-16-1-166:32494:32647 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32647 [1] NCCL INFO Channel 00 : 1[1] -> 0[0] via SHM/direct/direct
ip-172-16-1-166:32494:32647 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32647 [1] NCCL INFO Channel 01 : 1[1] -> 0[0] via SHM/direct/direct
ip-172-16-1-166:32494:32647 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32647 [1] NCCL INFO Channel 02 : 1[1] -> 0[0] via SHM/direct/direct
ip-172-16-1-166:32494:32647 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32647 [1] NCCL INFO Channel 03 : 1[1] -> 0[0] via SHM/direct/direct
ip-172-16-1-166:32496:32645 [3] NCCL INFO Connected all trees
ip-172-16-1-166:32496:32645 [3] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
ip-172-16-1-166:32496:32645 [3] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
ip-172-16-1-166:32493:32644 [0] NCCL INFO Connected all trees
ip-172-16-1-166:32494:32647 [1] NCCL INFO Connected all trees
ip-172-16-1-166:32495:32646 [2] NCCL INFO Connected all trees
ip-172-16-1-166:32493:32644 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
ip-172-16-1-166:32493:32644 [0] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
ip-172-16-1-166:32494:32647 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
ip-172-16-1-166:32494:32647 [1] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
ip-172-16-1-166:32495:32646 [2] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
ip-172-16-1-166:32495:32646 [2] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
ip-172-16-1-166:32493:32644 [0] NCCL INFO comm 0x7f10c51996b0 rank 0 nranks 4 cudaDev 0 nvmlDev 0 busId 1b0 commId 0x5758ad46544f0c59 - Init COMPLETE
ip-172-16-1-166:32495:32646 [2] NCCL INFO comm 0x7f10cd1979d0 rank 2 nranks 4 cudaDev 2 nvmlDev 2 busId 1d0 commId 0x5758ad46544f0c59 - Init COMPLETE
ip-172-16-1-166:32494:32647 [1] NCCL INFO comm 0x7f10cd197ab0 rank 1 nranks 4 cudaDev 1 nvmlDev 1 busId 1c0 commId 0x5758ad46544f0c59 - Init COMPLETE
ip-172-16-1-166:32496:32645 [3] NCCL INFO comm 0x7f10cd197960 rank 3 nranks 4 cudaDev 3 nvmlDev 3 busId 1e0 commId 0x5758ad46544f0c59 - Init COMPLETE
ip-172-16-1-166:32493:32556 [0] NCCL INFO Using non-device net plugin version 0
ip-172-16-1-166:32495:32558 [2] NCCL INFO Using non-device net plugin version 0
ip-172-16-1-166:32493:32556 [0] NCCL INFO Using network Socket
ip-172-16-1-166:32495:32558 [2] NCCL INFO Using network Socket
ip-172-16-1-166:32494:32557 [1] NCCL INFO Using non-device net plugin version 0
ip-172-16-1-166:32494:32557 [1] NCCL INFO Using network Socket
ip-172-16-1-166:32496:32559 [3] NCCL INFO Using non-device net plugin version 0
ip-172-16-1-166:32496:32559 [3] NCCL INFO Using network Socket
ip-172-16-1-166:32494:32557 [1] NCCL INFO comm 0x7f10cd30d7e0 rank 1 nranks 4 cudaDev 1 nvmlDev 1 busId 1c0 commId 0x180347d61164acdd - Init START
ip-172-16-1-166:32493:32556 [0] NCCL INFO comm 0x7f10c530f0c0 rank 0 nranks 4 cudaDev 0 nvmlDev 0 busId 1b0 commId 0x180347d61164acdd - Init START
ip-172-16-1-166:32495:32558 [2] NCCL INFO comm 0x7f10cd30d8c0 rank 2 nranks 4 cudaDev 2 nvmlDev 2 busId 1d0 commId 0x180347d61164acdd - Init START
ip-172-16-1-166:32496:32559 [3] NCCL INFO comm 0x7f10cd30d6f0 rank 3 nranks 4 cudaDev 3 nvmlDev 3 busId 1e0 commId 0x180347d61164acdd - Init START
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 3 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 3 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 3 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 3 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 0 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 0 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 3 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 3 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 3 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 3 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 0 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 0 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 3 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 0 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 3 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 0 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 0 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 0 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 3 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 3 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 0 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 0 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 0 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 0 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO comm 0x7f10cd30d7e0 rank 1 nRanks 4 nNodes 1 localRanks 4 localRank 1 MNNVL 0
ip-172-16-1-166:32493:32556 [0] NCCL INFO comm 0x7f10c530f0c0 rank 0 nRanks 4 nNodes 1 localRanks 4 localRank 0 MNNVL 0
ip-172-16-1-166:32495:32558 [2] NCCL INFO comm 0x7f10cd30d8c0 rank 2 nRanks 4 nNodes 1 localRanks 4 localRank 2 MNNVL 0
ip-172-16-1-166:32496:32559 [3] NCCL INFO comm 0x7f10cd30d6f0 rank 3 nRanks 4 nNodes 1 localRanks 4 localRank 3 MNNVL 0
ip-172-16-1-166:32493:32556 [0] NCCL INFO Channel 00/04 :    0   1   2   3
ip-172-16-1-166:32494:32557 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0
ip-172-16-1-166:32493:32556 [0] NCCL INFO Channel 01/04 :    0   1   2   3
ip-172-16-1-166:32495:32558 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1
ip-172-16-1-166:32496:32559 [3] NCCL INFO Trees [0] -1/-1/-1->3->2 [1] -1/-1/-1->3->2 [2] -1/-1/-1->3->2 [3] -1/-1/-1->3->2
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P Chunksize set to 131072
ip-172-16-1-166:32493:32556 [0] NCCL INFO Channel 02/04 :    0   1   2   3
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P Chunksize set to 131072
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P Chunksize set to 131072
ip-172-16-1-166:32493:32556 [0] NCCL INFO Channel 03/04 :    0   1   2   3
ip-172-16-1-166:32493:32556 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P Chunksize set to 131072
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO Channel 00 : 1[1] -> 2[2] via SHM/direct/direct
ip-172-16-1-166:32493:32556 [0] NCCL INFO Channel 00 : 0[0] -> 1[1] via SHM/direct/direct
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO Channel 01 : 0[0] -> 1[1] via SHM/direct/direct
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO Channel 01 : 1[1] -> 2[2] via SHM/direct/direct
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO Channel 02 : 0[0] -> 1[1] via SHM/direct/direct
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO Channel 02 : 1[1] -> 2[2] via SHM/direct/direct
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO Channel 03 : 0[0] -> 1[1] via SHM/direct/direct
ip-172-16-1-166:32494:32557 [1] NCCL INFO Channel 03 : 1[1] -> 2[2] via SHM/direct/direct
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO Channel 00 : 3[3] -> 0[0] via SHM/direct/direct
ip-172-16-1-166:32495:32558 [2] NCCL INFO Channel 00 : 2[2] -> 3[3] via SHM/direct/direct
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO Channel 01 : 3[3] -> 0[0] via SHM/direct/direct
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO Channel 01 : 2[2] -> 3[3] via SHM/direct/direct
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO Channel 02 : 3[3] -> 0[0] via SHM/direct/direct
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO Channel 02 : 2[2] -> 3[3] via SHM/direct/direct
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO Channel 03 : 3[3] -> 0[0] via SHM/direct/direct
ip-172-16-1-166:32495:32558 [2] NCCL INFO Channel 03 : 2[2] -> 3[3] via SHM/direct/direct
ip-172-16-1-166:32493:32556 [0] NCCL INFO Connected all rings
ip-172-16-1-166:32494:32557 [1] NCCL INFO Connected all rings
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO Connected all rings
ip-172-16-1-166:32496:32559 [3] NCCL INFO Connected all rings
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO Channel 00 : 3[3] -> 2[2] via SHM/direct/direct
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO Channel 01 : 3[3] -> 2[2] via SHM/direct/direct
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO Channel 02 : 3[3] -> 2[2] via SHM/direct/direct
ip-172-16-1-166:32496:32559 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32496:32559 [3] NCCL INFO Channel 03 : 3[3] -> 2[2] via SHM/direct/direct
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32493:32556 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO Channel 00 : 2[2] -> 1[1] via SHM/direct/direct
ip-172-16-1-166:32494:32557 [1] NCCL INFO Channel 00 : 1[1] -> 0[0] via SHM/direct/direct
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO Channel 01 : 2[2] -> 1[1] via SHM/direct/direct
ip-172-16-1-166:32494:32557 [1] NCCL INFO Channel 01 : 1[1] -> 0[0] via SHM/direct/direct
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO Channel 02 : 2[2] -> 1[1] via SHM/direct/direct
ip-172-16-1-166:32495:32558 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32494:32557 [1] NCCL INFO Channel 02 : 1[1] -> 0[0] via SHM/direct/direct
ip-172-16-1-166:32494:32557 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
ip-172-16-1-166:32495:32558 [2] NCCL INFO Channel 03 : 2[2] -> 1[1] via SHM/direct/direct
ip-172-16-1-166:32494:32557 [1] NCCL INFO Channel 03 : 1[1] -> 0[0] via SHM/direct/direct
ip-172-16-1-166:32493:32556 [0] NCCL INFO Connected all trees
ip-172-16-1-166:32496:32559 [3] NCCL INFO Connected all trees
ip-172-16-1-166:32496:32559 [3] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
ip-172-16-1-166:32493:32556 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
ip-172-16-1-166:32496:32559 [3] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
ip-172-16-1-166:32494:32557 [1] NCCL INFO Connected all trees
ip-172-16-1-166:32493:32556 [0] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
ip-172-16-1-166:32494:32557 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
ip-172-16-1-166:32495:32558 [2] NCCL INFO Connected all trees
ip-172-16-1-166:32494:32557 [1] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
ip-172-16-1-166:32495:32558 [2] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
ip-172-16-1-166:32495:32558 [2] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
ip-172-16-1-166:32493:32556 [0] NCCL INFO comm 0x7f10c530f0c0 rank 0 nranks 4 cudaDev 0 nvmlDev 0 busId 1b0 commId 0x180347d61164acdd - Init COMPLETE
ip-172-16-1-166:32495:32558 [2] NCCL INFO comm 0x7f10cd30d8c0 rank 2 nranks 4 cudaDev 2 nvmlDev 2 busId 1d0 commId 0x180347d61164acdd - Init COMPLETE
ip-172-16-1-166:32496:32559 [3] NCCL INFO comm 0x7f10cd30d6f0 rank 3 nranks 4 cudaDev 3 nvmlDev 3 busId 1e0 commId 0x180347d61164acdd - Init COMPLETE
ip-172-16-1-166:32494:32557 [1] NCCL INFO comm 0x7f10cd30d7e0 rank 1 nranks 4 cudaDev 1 nvmlDev 1 busId 1c0 commId 0x180347d61164acdd - Init COMPLETE
WARNING 06-25 19:22:58 custom_all_reduce.py:158] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
WARNING 06-25 19:22:58 custom_all_reduce.py:158] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
WARNING 06-25 19:22:58 custom_all_reduce.py:158] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
WARNING 06-25 19:22:58 custom_all_reduce.py:158] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
[gpu_id=0] Load weight begin. avail mem=21.52 GB
[gpu_id=1] Load weight begin. avail mem=21.03 GB
[gpu_id=2] Load weight begin. avail mem=21.03 GB
[gpu_id=3] Load weight begin. avail mem=21.03 GB
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎| 291/292 [00:01<00:00, 185.27it/s]
[gpu_id=3] Load weight end. type=LlamaForCausalLM, avail mem=17.27 GB
[gpu_id=0] Load weight end. type=LlamaForCausalLM, avail mem=17.76 GB
[gpu_id=1] Load weight end. type=LlamaForCausalLM, avail mem=17.27 GB
[gpu_id=2] Load weight end. type=LlamaForCausalLM, avail mem=17.27 GB
[gpu_id=1] Memory pool end. avail mem=4.83 GB
[gpu_id=3] Memory pool end. avail mem=4.83 GB
[gpu_id=0] Memory pool end. avail mem=5.33 GB
[gpu_id=2] Memory pool end. avail mem=4.83 GB
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[gpu_id=0] max_total_num_tokens=393626, max_prefill_tokens=65536, context_len=8192,
[gpu_id=0] server_args: enable_flashinfer=True, attention_reduce_in_fp32=False, disable_radix_cache=False, disable_regex_jump_forward=False, disable_disk_cache=False,
[gpu_id=3] max_total_num_tokens=393626, max_prefill_tokens=65536, context_len=8192,
[gpu_id=1] max_total_num_tokens=393626, max_prefill_tokens=65536, context_len=8192,
[gpu_id=2] max_total_num_tokens=393626, max_prefill_tokens=65536, context_len=8192,
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO:     Started server process [32441]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:30000 (Press CTRL+C to quit)
INFO:     127.0.0.1:60980 - "GET /get_model_info HTTP/1.1" 200 OK
[gpu_id=0] Prefil batch. #new-seq: 1, #new-token: 7, #cached-token: 0, cache hit rate: 0.00%, #running-req: 0, #queue-req: 0
Exception in ModelTpServer:
Traceback (most recent call last):
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/tp_worker.py", line 188, in exposed_step
    self.forward_step()
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/tp_worker.py", line 204, in forward_step
    self.forward_fill_batch(new_batch)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/tp_worker.py", line 443, in forward_fill_batch
    ) = self.model_runner.forward(batch, ForwardMode.EXTEND)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/model_runner.py", line 426, in forward
    return self.forward_extend(batch)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/model_runner.py", line 361, in forward_extend
    input_metadata = InputMetadata.create(
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/model_runner.py", line 209, in create
    ret.init_flashinfer_args(tp_size)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/model_runner.py", line 115, in init_flashinfer_args
    self.prefill_wrapper.begin_forward(*args)
TypeError: BatchPrefillWithPagedKVCacheWrapper.begin_forward() missing 1 required positional argument: 'page_size'

(The same `ModelTpServer` traceback is printed three more times, once for each of the remaining tensor-parallel workers.)
Exception in ControllerSingle:
Traceback (most recent call last):
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/manager_single.py", line 93, in start_controller_process
    loop.run_until_complete(controller.loop_for_forward())
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/manager_single.py", line 44, in loop_for_forward
    out_pyobjs = await self.model_client.step(next_step_input)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/tp_worker.py", line 787, in _func
    return obtain(tasks[0].value)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/rpyc/core/async_.py", line 111, in value
    raise self._obj
_get_exception_class.<locals>.Derived: BatchPrefillWithPagedKVCacheWrapper.begin_forward() missing 1 required positional argument: 'page_size'

========= Remote Traceback (1) =========
Traceback (most recent call last):
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/rpyc/core/protocol.py", line 369, in _dispatch_request
    res = self._HANDLERS[handler](self, *args)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/rpyc/core/protocol.py", line 863, in _handle_call
    return obj(*args, **dict(kwargs))
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/tp_worker.py", line 188, in exposed_step
    self.forward_step()
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/tp_worker.py", line 204, in forward_step
    self.forward_fill_batch(new_batch)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/tp_worker.py", line 443, in forward_fill_batch
    ) = self.model_runner.forward(batch, ForwardMode.EXTEND)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/model_runner.py", line 426, in forward
    return self.forward_extend(batch)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/model_runner.py", line 361, in forward_extend
    input_metadata = InputMetadata.create(
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/model_runner.py", line 209, in create
    ret.init_flashinfer_args(tp_size)
  File "/home/ec2-user/miniconda3/envs/pytorch_p310/lib/python3.10/site-packages/sglang/srt/managers/controller/model_runner.py", line 115, in init_flashinfer_args
    self.prefill_wrapper.begin_forward(*args)
TypeError: BatchPrefillWithPagedKVCacheWrapper.begin_forward() missing 1 required positional argument: 'page_size'

./start-llama-3.sh: line 1: 32441 Killed                  python -m sglang.launch_server --model-path /mnt/ebs_volume/models/llama/llama-3-8b-instruct --tp-size=4 --mem-fraction-static=0.75 --port 30000 --enable-flashinfer
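For context, the `TypeError` above points to an API mismatch: the installed sglang 0.1.17 calls `begin_forward()` without the `page_size` argument that newer flashinfer releases appear to require. A quick, self-contained way to confirm what the flashinfer build in your environment actually expects (a diagnostic sketch only, not SGLang code) is to inspect the method's signature:

```python
# Diagnostic sketch: print the parameters that the installed flashinfer's
# BatchPrefillWithPagedKVCacheWrapper.begin_forward() requires. If 'page_size'
# shows up as a required positional parameter, the installed sglang version
# is calling it with one argument too few.
import inspect

from flashinfer import BatchPrefillWithPagedKVCacheWrapper

sig = inspect.signature(BatchPrefillWithPagedKVCacheWrapper.begin_forward)
print(sig)
```

Upgrading sglang to the latest main branch, which is what the maintainer suggests below, resolves the mismatch.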
lss15151161 commented 5 days ago

I also ran into this problem. Did you manage to solve it?

hnyls2002 commented 4 days ago

@lss15151161 @keepitsane Have you tried the latest main branch?

keepitsane commented 4 days ago

I thought I had, but I was accidentally on a different branch that was slightly behind 😄

Pulling from the main branch solved my issue. Thanks for the support!