triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Unable to find shared memory region: 'encoder_input0' #6112

Open busishengui opened 1 year ago

busishengui commented 1 year ago

Description: When I use shared memory, the request fails with the error `Unable to find shared memory region: 'encoder_input0'`.

Triton Information: version 22.12, running in the Triton Docker container.



jbkyang-nvi commented 1 year ago

Hi @busishengui can you explain a little bit about how you are setting the shared memory region in your client and how you are sending this information to the server?

busishengui commented 1 year ago

> Hi @busishengui can you explain a little bit about how you are setting the shared memory region in your client and how you are sending this information to the server?

```cpp
std::string encoder_input_shm_key = "/encoder_input" + std::to_string(threadidx);
int encoder_shm_fd_ip = threadidx * 100 + 2;
void* encoder_input_shm;
size_t encoder_input_byte_size = feats.size() * sizeof(float) + 2 * sizeof(int64_t);
tc::CreateSharedMemoryRegion(encoder_input_shm_key, encoder_input_byte_size, &encoder_shm_fd_ip);
tc::MapSharedMemory(encoder_shm_fd_ip, 0, encoder_input_byte_size, (void**)&encoder_input_shm);
tc::CloseSharedMemory(encoder_shm_fd_ip);
memcpy(encoder_input_shm, feats.data(), feats.size() * sizeof(float));
memcpy((char*)encoder_input_shm + feats.size() * sizeof(float), &chunklens, sizeof(int64_t));
memcpy((char*)encoder_input_shm + feats.size() * sizeof(float) + sizeof(int64_t), &required_cache_size, sizeof(int64_t));
// LOG(INFO) << "memcpy cost time is " << asa.Elapsed();
std::string shm_input_name = "encoder_input" + std::to_string(threadidx);
client->RegisterSystemSharedMemory(shm_input_name, encoder_input_shm_key, encoder_input_byte_size);
encoder_input_ptr->SetSharedMemory(shm_input_name, feats.size() * sizeof(float), 0);
chunk_lens_ptr->SetSharedMemory(shm_input_name, sizeof(int64_t), feats.size() * sizeof(float));
required_cache_size_ptr->SetSharedMemory(shm_input_name, sizeof(int64_t), feats.size() * sizeof(float) + sizeof(int64_t));
```

```cpp
std::vector<tc::InferInput*> encoder_input_list = {
    encoder_input_ptr.get(), chunk_lens_ptr.get(), required_cache_size_ptr.get()};
tc::InferRequestedOutput* encoder_output;
tc::InferRequestedOutput::Create(&encoder_output, "output");
std::shared_ptr<tc::InferRequestedOutput> encoder_output_ptr;
encoder_output_ptr.reset(encoder_output);
std::string encoder_output_shm_key = "/encoder_output" + std::to_string(threadidx);
int encoder_shm_fd_op = threadidx * 100 + 100;
float* encoder_output_shm;
size_t encoder_output_byte_size = sizeof(float) * chunk_size * attentiondim;
tc::CreateSharedMemoryRegion(encoder_output_shm_key, encoder_output_byte_size, &encoder_shm_fd_op);
tc::MapSharedMemory(encoder_shm_fd_op, 0, encoder_output_byte_size, (void**)&encoder_output_shm);
tc::CloseSharedMemory(encoder_shm_fd_op);
std::string shm_output_name = "encoder_output" + std::to_string(threadidx);
client->RegisterSystemSharedMemory(shm_output_name, encoder_output_shm_key, encoder_output_byte_size);
encoder_output_ptr->SetSharedMemory(shm_output_name, encoder_output_byte_size, 0 /* offset */);
std::vector<const tc::InferRequestedOutput*> encoder_outputs = {encoder_output_ptr.get()};
tc::InferResult* encoder_results;
tc::InferOptions encoder_options("encoder_model");
encoder_options.sequence_id_ = threadidx + 10086;
encoder_options.sequence_start_ = !start_ ? true : false;
encoder_options.sequence_end_ = state == DecodeState::kEndFeats ? true : false;
encoder_options.request_id_ = std::to_string(threadidx);

client->Infer(&encoder_results, encoder_options, encoder_input_list, encoder_outputs,
              http_headers, tc::Parameters(), request_compression_algorithm,
              response_compression_algorithm);
```