### Expected behavior
I expect tritonserver to start up successfully.
### Actual behavior
tritonserver stops printing logs and the processes just run indefinitely. I'm unable to reach tritonserver via gRPC or HTTP, so I don't believe it is actually serving. The output I get is:

```
root@keith-a100-dev4:/home/tensorrtllm_backend# I0515 21:22:16.209472 7394 pinned_memory_manager.cc:275] Pinned memory pool is created at '0x75e75e000000' with size 268435456
I0515 21:22:16.214944 7394 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0515 21:22:16.214957 7394 cuda_memory_manager.cc:107] CUDA memory pool is created on device 1 with size 67108864
I0515 21:22:16.497971 7394 model_lifecycle.cc:469] loading: smaug34b:1
[TensorRT-LLM][WARNING] gpu_device_ids is not specified, will be automatically set
[TensorRT-LLM][WARNING] max_beam_width is not specified, will use default value of 1
[TensorRT-LLM][WARNING] iter_stats_max_iterations is not specified, will use default value of 1000
[TensorRT-LLM][WARNING] request_stats_max_iterations is not specified, will use default value of 0
[TensorRT-LLM][WARNING] normalize_log_probs is not specified, will be set to true
[TensorRT-LLM][WARNING] max_tokens_in_paged_kv_cache is not specified, will use default value
[TensorRT-LLM][WARNING] kv_cache_free_gpu_mem_fraction is not specified, will use default value of 0.9 or max_tokens_in_paged_kv_cache
[TensorRT-LLM][WARNING] kv_cache_host_memory_bytes not set, defaulting to 0
[TensorRT-LLM][WARNING] kv_cache_onboard_blocks not set, defaulting to true
[TensorRT-LLM][WARNING] max_attention_window_size is not specified, will use default value (i.e. max_sequence_length)
[TensorRT-LLM][WARNING] sink_token_length is not specified, will use default value
[TensorRT-LLM][WARNING] enable_kv_cache_reuse is not specified, will be set to false
[TensorRT-LLM][WARNING] enable_chunked_context is not specified, will be set to false.
[TensorRT-LLM][WARNING] lora_cache_max_adapter_size not set, defaulting to 64
[TensorRT-LLM][WARNING] lora_cache_optimal_adapter_size not set, defaulting to 8
[TensorRT-LLM][WARNING] lora_cache_gpu_memory_fraction not set, defaulting to 0.05
[TensorRT-LLM][WARNING] lora_cache_host_memory_bytes not set, defaulting to 1GB
[TensorRT-LLM][WARNING] decoding_mode parameter is invalid or not specified(must be one of the {top_k, top_p, top_k_top_p, beam_search}).Using default: top_k_top_p if max_beam_width == 1, beam_search otherwise
[TensorRT-LLM][WARNING] medusa_choices parameter is not specified. Will be using default mc_sim_7b_63 choices instead
[TensorRT-LLM][INFO] Engine version 0.11.0.dev2024051400 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'cross_attention' not found
[TensorRT-LLM][WARNING] Optional value for parameter cross_attention will not be set.
[TensorRT-LLM][WARNING] Parameter layer_types cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'layer_types' not found
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found
[TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found
[TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set.
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][WARNING] gpu_device_ids is not specified, will be automatically set
[TensorRT-LLM][WARNING] max_beam_width is not specified, will use default value of 1
[TensorRT-LLM][WARNING] iter_stats_max_iterations is not specified, will use default value of 1000
[TensorRT-LLM][WARNING] request_stats_max_iterations is not specified, will use default value of 0
[TensorRT-LLM][WARNING] normalize_log_probs is not specified, will be set to true
[TensorRT-LLM][WARNING] max_tokens_in_paged_kv_cache is not specified, will use default value
[TensorRT-LLM][WARNING] kv_cache_free_gpu_mem_fraction is not specified, will use default value of 0.9 or max_tokens_in_paged_kv_cache
[TensorRT-LLM][WARNING] kv_cache_host_memory_bytes not set, defaulting to 0
[TensorRT-LLM][WARNING] kv_cache_onboard_blocks not set, defaulting to true
[TensorRT-LLM][WARNING] max_attention_window_size is not specified, will use default value (i.e. max_sequence_length)
[TensorRT-LLM][WARNING] sink_token_length is not specified, will use default value
[TensorRT-LLM][WARNING] enable_kv_cache_reuse is not specified, will be set to false
[TensorRT-LLM][WARNING] enable_chunked_context is not specified, will be set to false.
[TensorRT-LLM][WARNING] lora_cache_max_adapter_size not set, defaulting to 64
[TensorRT-LLM][WARNING] lora_cache_optimal_adapter_size not set, defaulting to 8
[TensorRT-LLM][WARNING] lora_cache_gpu_memory_fraction not set, defaulting to 0.05
[TensorRT-LLM][WARNING] lora_cache_host_memory_bytes not set, defaulting to 1GB
[TensorRT-LLM][WARNING] decoding_mode parameter is invalid or not specified(must be one of the {top_k, top_p, top_k_top_p, beam_search}).Using default: top_k_top_p if max_beam_width == 1, beam_search otherwise
[TensorRT-LLM][WARNING] medusa_choices parameter is not specified. Will be using default mc_sim_7b_63 choices instead
[TensorRT-LLM][INFO] Engine version 0.11.0.dev2024051400 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'cross_attention' not found
[TensorRT-LLM][WARNING] Optional value for parameter cross_attention will not be set.
[TensorRT-LLM][WARNING] Parameter layer_types cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'layer_types' not found
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found
[TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found
[TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set.
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] MPI size: 2, rank: 0
[TensorRT-LLM][INFO] MPI size: 2, rank: 1
```

When running the `summarize.py` script, I get the following output:

```
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024051400
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024051400
[05/15/2024-21:14:21] [TRT-LLM] [I] Load tokenizer takes: 0.08764123916625977 sec
[05/15/2024-21:14:21] [TRT-LLM] [I] Load tokenizer takes: 0.08883118629455566 sec
[TensorRT-LLM][INFO] Engine version 0.11.0.dev2024051400 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'cross_attention' not found
[TensorRT-LLM][WARNING] Optional value for parameter cross_attention will not be set.
[TensorRT-LLM][WARNING] Parameter layer_types cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'layer_types' not found
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found
[TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found
[TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set.
[TensorRT-LLM][INFO] Engine version 0.11.0.dev2024051400 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'cross_attention' not found
[TensorRT-LLM][WARNING] Optional value for parameter cross_attention will not be set.
[TensorRT-LLM][WARNING] Parameter layer_types cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'layer_types' not found
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found
[TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found
[TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set.
[TensorRT-LLM][INFO] MPI size: 2, rank: 0
[TensorRT-LLM][INFO] Engine version 0.11.0.dev2024051400 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'cross_attention' not found
[TensorRT-LLM][WARNING] Optional value for parameter cross_attention will not be set.
[TensorRT-LLM][WARNING] Parameter layer_types cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'layer_types' not found
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found
[TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found
[TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set.
[TensorRT-LLM][INFO] Engine version 0.11.0.dev2024051400 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'cross_attention' not found
[TensorRT-LLM][WARNING] Optional value for parameter cross_attention will not be set.
[TensorRT-LLM][WARNING] Parameter layer_types cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'layer_types' not found
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found
[TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found
[TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set.
[TensorRT-LLM][INFO] Engine version 0.11.0.dev2024051400 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'cross_attention' not found
[TensorRT-LLM][WARNING] Optional value for parameter cross_attention will not be set.
[TensorRT-LLM][WARNING] Parameter layer_types cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'layer_types' not found
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found
[TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found
[TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set.
[TensorRT-LLM][INFO] MPI size: 2, rank: 1
[TensorRT-LLM][INFO] Engine version 0.11.0.dev2024051400 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'cross_attention' not found
[TensorRT-LLM][WARNING] Optional value for parameter cross_attention will not be set.
[TensorRT-LLM][WARNING] Parameter layer_types cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'layer_types' not found
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found
[TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found
[TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set.
[TensorRT-LLM][INFO] MPI size: 2, rank: 1
[TensorRT-LLM][INFO] MPI size: 2, rank: 0
[TensorRT-LLM][INFO] Rank 0 is using GPU 0
[TensorRT-LLM][INFO] TRTGptModel maxNumSequences: 32
[TensorRT-LLM][INFO] TRTGptModel maxBatchSize: 32
[TensorRT-LLM][INFO] TRTGptModel mMaxAttentionWindowSize: 2048
[TensorRT-LLM][INFO] TRTGptModel enableTrtOverlap: 0
[TensorRT-LLM][INFO] TRTGptModel normalizeLogProbs: 1
[TensorRT-LLM][INFO] Loaded engine size: 33338 MiB
[TensorRT-LLM][INFO] Rank 1 is using GPU 1
[TensorRT-LLM][INFO] TRTGptModel maxNumSequences: 32
[TensorRT-LLM][INFO] TRTGptModel maxBatchSize: 32
[TensorRT-LLM][INFO] TRTGptModel mMaxAttentionWindowSize: 2048
[TensorRT-LLM][INFO] TRTGptModel enableTrtOverlap: 0
[TensorRT-LLM][INFO] TRTGptModel normalizeLogProbs: 1
[TensorRT-LLM][INFO] Loaded engine size: 33338 MiB
keith-a100-dev4:7179:7179 [0] NCCL INFO Bootstrap : Using eth0:10.5.0.15<0>
keith-a100-dev4:7179:7179 [0] NCCL INFO cudaDriverVersion 12040
NCCL version 2.19.3+cuda12.3
keith-a100-dev4:7179:7179 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
keith-a100-dev4:7179:7179 [0] NCCL INFO P2P plugin IBext_v7
keith-a100-dev4:7179:7179 [0] NCCL INFO NET/IB : No device found.
keith-a100-dev4:7179:7179 [0] NCCL INFO NET/IB : No device found.
keith-a100-dev4:7179:7179 [0] NCCL INFO NET/Socket : Using [0]eth0:10.5.0.15<0> [1]enP4801s1:fe80::7e1e:52ff:fe22:721d%enP4801s1<0>
keith-a100-dev4:7179:7179 [0] NCCL INFO Using non-device net plugin version 0
keith-a100-dev4:7179:7179 [0] NCCL INFO Using network Socket
keith-a100-dev4:7180:7180 [1] NCCL INFO cudaDriverVersion 12040
keith-a100-dev4:7180:7180 [1] NCCL INFO Bootstrap : Using eth0:10.5.0.15<0>
keith-a100-dev4:7180:7180 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
keith-a100-dev4:7180:7180 [1] NCCL INFO P2P plugin IBext_v7
keith-a100-dev4:7180:7180 [1] NCCL INFO NET/IB : No device found.
keith-a100-dev4:7180:7180 [1] NCCL INFO NET/IB : No device found.
keith-a100-dev4:7180:7180 [1] NCCL INFO NET/Socket : Using [0]eth0:10.5.0.15<0> [1]enP4801s1:fe80::7e1e:52ff:fe22:721d%enP4801s1<0>
keith-a100-dev4:7180:7180 [1] NCCL INFO Using non-device net plugin version 0
keith-a100-dev4:7180:7180 [1] NCCL INFO Using network Socket
keith-a100-dev4:7180:7180 [1] NCCL INFO comm 0x646782dbed30 rank 1 nranks 2 cudaDev 1 nvmlDev 1 busId 200000 commId 0xe418f9626149e33e - Init START
keith-a100-dev4:7179:7179 [0] NCCL INFO comm 0x559af8995ca0 rank 0 nranks 2 cudaDev 0 nvmlDev 0 busId 100000 commId 0xe418f9626149e33e - Init START
keith-a100-dev4:7180:7180 [1] graph/xml.h:85 NCCL WARN Attribute busid of node nic not found
keith-a100-dev4:7180:7180 [1] NCCL INFO graph/xml.cc:589 -> 3
keith-a100-dev4:7180:7180 [1] NCCL INFO graph/xml.cc:806 -> 3
keith-a100-dev4:7180:7180 [1] NCCL INFO graph/topo.cc:689 -> 3
keith-a100-dev4:7180:7180 [1] NCCL INFO init.cc:881 -> 3
keith-a100-dev4:7180:7180 [1] NCCL INFO init.cc:1396 -> 3
keith-a100-dev4:7180:7180 [1] NCCL INFO init.cc:1641 -> 3
keith-a100-dev4:7179:7179 [0] graph/xml.h:85 NCCL WARN Attribute busid of node nic not found
keith-a100-dev4:7179:7179 [0] NCCL INFO graph/xml.cc:589 -> 3
keith-a100-dev4:7179:7179 [0] NCCL INFO graph/xml.cc:806 -> 3
keith-a100-dev4:7179:7179 [0] NCCL INFO graph/topo.cc:689 -> 3
keith-a100-dev4:7179:7179 [0] NCCL INFO init.cc:881 -> 3
keith-a100-dev4:7179:7179 [0] NCCL INFO init.cc:1396 -> 3
keith-a100-dev4:7179:7179 [0] NCCL INFO init.cc:1641 -> 3
keith-a100-dev4:7180:7180 [1] NCCL INFO init.cc:1679 -> 3
Failed, NCCL error /app/tensorrt_llm/cpp/tensorrt_llm/plugins/common/plugin.cpp:86 'internal error - please report this issue to the NCCL developers'
keith-a100-dev4:7179:7179 [0] NCCL INFO init.cc:1679 -> 3
Failed, NCCL error /app/tensorrt_llm/cpp/tensorrt_llm/plugins/common/plugin.cpp:86 'internal error - please report this issue to the NCCL developers'
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[12525,1],0]
Exit code: 1
```
### Additional notes
I tried the same with Gemma 7B (built with world_size=2) and the same hang occurs. I'm wondering if it's related to running tritonserver with 2 A100s on a single machine.
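Since the NCCL warning `Attribute busid of node nic not found` appears during topology detection, one way to isolate the problem from TRT-LLM (a suggestion on my part, using NVIDIA's standard `nccl-tests` tool rather than anything TRT-LLM-specific) would be to check whether plain NCCL can initialize across both GPUs on this VM:

```shell
# Show how the two A100s are connected on this host (NVLink / PIX / PHB / SYS).
nvidia-smi topo -m

# Build and run NVIDIA's nccl-tests to exercise NCCL init and all-reduce
# across both GPUs, independent of tritonserver / TRT-LLM.
git clone https://github.com/NVIDIA/nccl-tests.git
cd nccl-tests && make MPI=1 MPI_HOME=/opt/hpcx/ompi
NCCL_DEBUG=INFO mpirun --allow-run-as-root -n 2 ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 1
```

If this fails with the same topology error, the issue would be in NCCL's topology detection on this VM rather than in TRT-LLM or the Triton backend.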
### System info
- CPU architecture: AMD EPYC 7V13 64-Core Processor
- CPU/host memory size: 440 GB
- GPU: 2x NVIDIA A100 80GB
- TensorRT-LLM branch or tag: main
- TensorRT-LLM commit: ae52bce3ed8ecea468a16483e0dacd3d156ae4fe
- TensorRT / CUDA versions: 10.0.1 / 12.4
- Container: custom, built from the tensorrtllm_backend main branch using dockerfile/Dockerfile.trt_llm_backend
- NVIDIA driver version: 535.161.07
- OS: Ubuntu 22.04.4 LTS
### Reproduction
1. I built the TRT-LLM container by running
2. I launched the container with this command
3. I built the smaug checkpoint and TRT engine following the guide here.
   a. Instead of tp_size 8, I used tp_size 2 as I'm running with 2 GPUs.
4. I launched triton server with
5. I then tried running the sample script from the above link with
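For context, the Triton launch in step 4 followed the usual tensorrtllm_backend pattern. This is a hypothetical reconstruction, not my exact command; the model repo path is a placeholder:

```shell
# Hypothetical: launch tritonserver via the backend's helper script with
# world_size matching the tp_size=2 engine build. Path is a placeholder.
python3 scripts/launch_triton_server.py --world_size 2 --model_repo /path/to/triton_model_repo
```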
root@keith-a100-dev4:/home/tensorrtllm_backend# I0515 21:22:16.209472 7394 pinned_memory_manager.cc:275] Pinned memory pool is created at '0x75e75e000000' with size 268435456 I0515 21:22:16.214944 7394 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864 I0515 21:22:16.214957 7394 cuda_memory_manager.cc:107] CUDA memory pool is created on device 1 with size 67108864 I0515 21:22:16.497971 7394 model_lifecycle.cc:469] loading: smaug34b:1 [TensorRT-LLM][WARNING] gpu_device_ids is not specified, will be automatically set [TensorRT-LLM][WARNING] max_beam_width is not specified, will use default value of 1 [TensorRT-LLM][WARNING] iter_stats_max_iterations is not specified, will use default value of 1000 [TensorRT-LLM][WARNING] request_stats_max_iterations is not specified, will use default value of 0 [TensorRT-LLM][WARNING] normalize_log_probs is not specified, will be set to true [TensorRT-LLM][WARNING] max_tokens_in_paged_kv_cache is not specified, will use default value [TensorRT-LLM][WARNING] kv_cache_free_gpu_mem_fraction is not specified, will use default value of 0.9 or max_tokens_in_paged_kv_cache [TensorRT-LLM][WARNING] kv_cache_host_memory_bytes not set, defaulting to 0 [TensorRT-LLM][WARNING] kv_cache_onboard_blocks not set, defaulting to true [TensorRT-LLM][WARNING] max_attention_window_size is not specified, will use default value (i.e. max_sequence_length) [TensorRT-LLM][WARNING] sink_token_length is not specified, will use default value [TensorRT-LLM][WARNING] enable_kv_cache_reuse is not specified, will be set to false [TensorRT-LLM][WARNING] enable_chunked_context is not specified, will be set to false. 
[TensorRT-LLM][WARNING] lora_cache_max_adapter_size not set, defaulting to 64 [TensorRT-LLM][WARNING] lora_cache_optimal_adapter_size not set, defaulting to 8 [TensorRT-LLM][WARNING] lora_cache_gpu_memory_fraction not set, defaulting to 0.05 [TensorRT-LLM][WARNING] lora_cache_host_memory_bytes not set, defaulting to 1GB [TensorRT-LLM][WARNING] decoding_mode parameter is invalid or not specified(must be one of the {top_k, top_p, top_k_top_p, beam_search}).Using default: top_k_top_p if max_beam_width == 1, beam_search otherwise [TensorRT-LLM][WARNING] medusa_choices parameter is not specified. Will be using default mc_sim_7b_63 choices instead [TensorRT-LLM][INFO] Engine version 0.11.0.dev2024051400 found in the config file, assuming engine(s) built by new builder API. [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'cross_attention' not found [TensorRT-LLM][WARNING] Optional value for parameter cross_attention will not be set. [TensorRT-LLM][WARNING] Parameter layer_types cannot be read from json: [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'layer_types' not found [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set. [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set. [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found [TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set. [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found [TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set. 
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3 [TensorRT-LLM][WARNING] gpu_device_ids is not specified, will be automatically set [TensorRT-LLM][WARNING] max_beam_width is not specified, will use default value of 1 [TensorRT-LLM][WARNING] iter_stats_max_iterations is not specified, will use default value of 1000 [TensorRT-LLM][WARNING] request_stats_max_iterations is not specified, will use default value of 0 [TensorRT-LLM][WARNING] normalize_log_probs is not specified, will be set to true [TensorRT-LLM][WARNING] max_tokens_in_paged_kv_cache is not specified, will use default value [TensorRT-LLM][WARNING] kv_cache_free_gpu_mem_fraction is not specified, will use default value of 0.9 or max_tokens_in_paged_kv_cache [TensorRT-LLM][WARNING] kv_cache_host_memory_bytes not set, defaulting to 0 [TensorRT-LLM][WARNING] kv_cache_onboard_blocks not set, defaulting to true [TensorRT-LLM][WARNING] max_attention_window_size is not specified, will use default value (i.e. max_sequence_length) [TensorRT-LLM][WARNING] sink_token_length is not specified, will use default value [TensorRT-LLM][WARNING] enable_kv_cache_reuse is not specified, will be set to false [TensorRT-LLM][WARNING] enable_chunked_context is not specified, will be set to false. [TensorRT-LLM][WARNING] lora_cache_max_adapter_size not set, defaulting to 64 [TensorRT-LLM][WARNING] lora_cache_optimal_adapter_size not set, defaulting to 8 [TensorRT-LLM][WARNING] lora_cache_gpu_memory_fraction not set, defaulting to 0.05 [TensorRT-LLM][WARNING] lora_cache_host_memory_bytes not set, defaulting to 1GB [TensorRT-LLM][WARNING] decoding_mode parameter is invalid or not specified(must be one of the {top_k, top_p, top_k_top_p, beam_search}).Using default: top_k_top_p if max_beam_width == 1, beam_search otherwise [TensorRT-LLM][WARNING] medusa_choices parameter is not specified. 
Will be using default mc_sim_7b_63 choices instead [TensorRT-LLM][INFO] Engine version 0.11.0.dev2024051400 found in the config file, assuming engine(s) built by new builder API. [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'cross_attention' not found [TensorRT-LLM][WARNING] Optional value for parameter cross_attention will not be set. [TensorRT-LLM][WARNING] Parameter layer_types cannot be read from json: [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'layer_types' not found [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set. [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set. [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found [TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set. [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found [TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set. [TensorRT-LLM][INFO] Initializing MPI with thread mode 3 [TensorRT-LLM][INFO] MPI size: 2, rank: 0 [TensorRT-LLM][INFO] MPI size: 2, rank: 1
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024051400 [TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024051400 [05/15/2024-21:14:21] [TRT-LLM] [I] Load tokenizer takes: 0.08764123916625977 sec [05/15/2024-21:14:21] [TRT-LLM] [I] Load tokenizer takes: 0.08883118629455566 sec [TensorRT-LLM][INFO] Engine version 0.11.0.dev2024051400 found in the config file, assuming engine(s) built by new builder API. [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'cross_attention' not found [TensorRT-LLM][WARNING] Optional value for parameter cross_attention will not be set. [TensorRT-LLM][WARNING] Parameter layer_types cannot be read from json: [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'layer_types' not found [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set. [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set. [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found [TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set. [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found [TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set. [TensorRT-LLM][INFO] Engine version 0.11.0.dev2024051400 found in the config file, assuming engine(s) built by new builder API. [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'cross_attention' not found [TensorRT-LLM][WARNING] Optional value for parameter cross_attention will not be set. 
[TensorRT-LLM][WARNING] Parameter layer_types cannot be read from json: [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'layer_types' not found [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set. [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set. [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found [TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set. [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found [TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set. [TensorRT-LLM][INFO] MPI size: 2, rank: 0 [TensorRT-LLM][INFO] Engine version 0.11.0.dev2024051400 found in the config file, assuming engine(s) built by new builder API. [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'cross_attention' not found [TensorRT-LLM][WARNING] Optional value for parameter cross_attention will not be set. [TensorRT-LLM][WARNING] Parameter layer_types cannot be read from json: [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'layer_types' not found [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set. [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set. [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found [TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set. 
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found [TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set. [TensorRT-LLM][INFO] Engine version 0.11.0.dev2024051400 found in the config file, assuming engine(s) built by new builder API. [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'cross_attention' not found [TensorRT-LLM][WARNING] Optional value for parameter cross_attention will not be set. [TensorRT-LLM][WARNING] Parameter layer_types cannot be read from json: [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'layer_types' not found [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set. [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set. [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found [TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set. [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found [TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set. [TensorRT-LLM][INFO] Engine version 0.11.0.dev2024051400 found in the config file, assuming engine(s) built by new builder API. [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'cross_attention' not found [TensorRT-LLM][WARNING] Optional value for parameter cross_attention will not be set. [TensorRT-LLM][WARNING] Parameter layer_types cannot be read from json: [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'layer_types' not found [TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null [TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set. 
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found
[TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found
[TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set.
[TensorRT-LLM][INFO] MPI size: 2, rank: 1
[TensorRT-LLM][INFO] Engine version 0.11.0.dev2024051400 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'cross_attention' not found
[TensorRT-LLM][WARNING] Optional value for parameter cross_attention will not be set.
[TensorRT-LLM][WARNING] Parameter layer_types cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'layer_types' not found
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found
[TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found
[TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set.
[TensorRT-LLM][INFO] MPI size: 2, rank: 1
[TensorRT-LLM][INFO] MPI size: 2, rank: 0
[TensorRT-LLM][INFO] Rank 0 is using GPU 0
[TensorRT-LLM][INFO] TRTGptModel maxNumSequences: 32
[TensorRT-LLM][INFO] TRTGptModel maxBatchSize: 32
[TensorRT-LLM][INFO] TRTGptModel mMaxAttentionWindowSize: 2048
[TensorRT-LLM][INFO] TRTGptModel enableTrtOverlap: 0
[TensorRT-LLM][INFO] TRTGptModel normalizeLogProbs: 1
[TensorRT-LLM][INFO] Loaded engine size: 33338 MiB
[TensorRT-LLM][INFO] Rank 1 is using GPU 1
[TensorRT-LLM][INFO] TRTGptModel maxNumSequences: 32
[TensorRT-LLM][INFO] TRTGptModel maxBatchSize: 32
[TensorRT-LLM][INFO] TRTGptModel mMaxAttentionWindowSize: 2048
[TensorRT-LLM][INFO] TRTGptModel enableTrtOverlap: 0
[TensorRT-LLM][INFO] TRTGptModel normalizeLogProbs: 1
[TensorRT-LLM][INFO] Loaded engine size: 33338 MiB
keith-a100-dev4:7179:7179 [0] NCCL INFO Bootstrap : Using eth0:10.5.0.15<0>
keith-a100-dev4:7179:7179 [0] NCCL INFO cudaDriverVersion 12040
NCCL version 2.19.3+cuda12.3
keith-a100-dev4:7179:7179 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
keith-a100-dev4:7179:7179 [0] NCCL INFO P2P plugin IBext_v7
keith-a100-dev4:7179:7179 [0] NCCL INFO NET/IB : No device found.
keith-a100-dev4:7179:7179 [0] NCCL INFO NET/IB : No device found.
keith-a100-dev4:7179:7179 [0] NCCL INFO NET/Socket : Using [0]eth0:10.5.0.15<0> [1]enP4801s1:fe80::7e1e:52ff:fe22:721d%enP4801s1<0>
keith-a100-dev4:7179:7179 [0] NCCL INFO Using non-device net plugin version 0
keith-a100-dev4:7179:7179 [0] NCCL INFO Using network Socket
keith-a100-dev4:7180:7180 [1] NCCL INFO cudaDriverVersion 12040
keith-a100-dev4:7180:7180 [1] NCCL INFO Bootstrap : Using eth0:10.5.0.15<0>
keith-a100-dev4:7180:7180 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
keith-a100-dev4:7180:7180 [1] NCCL INFO P2P plugin IBext_v7
keith-a100-dev4:7180:7180 [1] NCCL INFO NET/IB : No device found.
keith-a100-dev4:7180:7180 [1] NCCL INFO NET/IB : No device found.
keith-a100-dev4:7180:7180 [1] NCCL INFO NET/Socket : Using [0]eth0:10.5.0.15<0> [1]enP4801s1:fe80::7e1e:52ff:fe22:721d%enP4801s1<0>
keith-a100-dev4:7180:7180 [1] NCCL INFO Using non-device net plugin version 0
keith-a100-dev4:7180:7180 [1] NCCL INFO Using network Socket
keith-a100-dev4:7180:7180 [1] NCCL INFO comm 0x646782dbed30 rank 1 nranks 2 cudaDev 1 nvmlDev 1 busId 200000 commId 0xe418f9626149e33e - Init START
keith-a100-dev4:7179:7179 [0] NCCL INFO comm 0x559af8995ca0 rank 0 nranks 2 cudaDev 0 nvmlDev 0 busId 100000 commId 0xe418f9626149e33e - Init START
keith-a100-dev4:7180:7180 [1] graph/xml.h:85 NCCL WARN Attribute busid of node nic not found
keith-a100-dev4:7180:7180 [1] NCCL INFO graph/xml.cc:589 -> 3
keith-a100-dev4:7180:7180 [1] NCCL INFO graph/xml.cc:806 -> 3
keith-a100-dev4:7180:7180 [1] NCCL INFO graph/topo.cc:689 -> 3
keith-a100-dev4:7180:7180 [1] NCCL INFO init.cc:881 -> 3
keith-a100-dev4:7180:7180 [1] NCCL INFO init.cc:1396 -> 3
keith-a100-dev4:7180:7180 [1] NCCL INFO init.cc:1641 -> 3
keith-a100-dev4:7179:7179 [0] graph/xml.h:85 NCCL WARN Attribute busid of node nic not found
keith-a100-dev4:7179:7179 [0] NCCL INFO graph/xml.cc:589 -> 3
keith-a100-dev4:7179:7179 [0] NCCL INFO graph/xml.cc:806 -> 3
keith-a100-dev4:7179:7179 [0] NCCL INFO graph/topo.cc:689 -> 3
keith-a100-dev4:7179:7179 [0] NCCL INFO init.cc:881 -> 3
keith-a100-dev4:7179:7179 [0] NCCL INFO init.cc:1396 -> 3
keith-a100-dev4:7179:7179 [0] NCCL INFO init.cc:1641 -> 3
keith-a100-dev4:7180:7180 [1] NCCL INFO init.cc:1679 -> 3
Failed, NCCL error /app/tensorrt_llm/cpp/tensorrt_llm/plugins/common/plugin.cpp:86 'internal error - please report this issue to the NCCL developers'
keith-a100-dev4:7179:7179 [0] NCCL INFO init.cc:1679 -> 3
Failed, NCCL error /app/tensorrt_llm/cpp/tensorrt_llm/plugins/common/plugin.cpp:86 'internal error - please report this issue to the NCCL developers'
Primary job terminated normally, but 1 process returned a non-zero exit code.
Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:
Process name: [[12525,1],0]
Exit code: 1