Open hygxy opened 2 years ago
Use tv::from_blob to create a tv::Tensor from TensorRT pointers.
// assume you record tensor metas in configure function
auto out1_ten = tv::from_blob(outputs[0], out1_shape, out1_dtype, 0 /*GPU*/);
You need to use data_ptr to get pointers instead of a C-style cast:
auto ptr = out1_ten.data_ptr<const __half>(); // if ptr in from_blob is const, you must use const dtype to get const pointer
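For reference, a minimal sketch of wrapping a (const) TensorRT input pointer the same way; the in_shape variable, the inputs index, and the float16 dtype are assumptions for illustration:
// sketch: inputs comes from enqueue(..., void const* const* inputs, ...)
auto in_ten = tv::from_blob(inputs[0], in_shape, tv::float16, 0 /*GPU*/);
auto in_ptr = in_ten.data_ptr<const __half>(); // const source pointer, so const dtype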
out_inds should be an input of a conv layer. Since the enqueue API has the interface int enqueue(..., void* const *outputs, ...), if I implement it as follows:
outputs[0] = out_features.data_ptr<const float>();
outputs[1] = out_inds.data_ptr<const int32_t>();
I get the following compilation error, even with const float and const int32_t:
error: assignment of read-only location ‘* outputs’ outputs[0] = out_features.data_ptr<const float>();
error: assignment of read-only location ‘*(outputs + 8)’ outputs[1] = out_inds.data_ptr<const int32_t>();
TensorRT outputs are allocated once when the engine is created, so you don't need to assign a new pointer; just use outputs to create a tv::Tensor via tv::from_blob.
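A minimal sketch of that pattern inside enqueue, assuming the output shapes and dtypes were recorded in the configure function (all names here are placeholders, not the actual plugin code):
// wrap the TensorRT-owned output buffers; no assignment to outputs[i] needed
auto out_features_ten = tv::from_blob(outputs[0], {out_inds_num_limit, num_out_features}, tv::float32, 0 /*GPU*/);
auto out_inds_ten = tv::from_blob(outputs[1], {out_inds_num_limit, 4}, tv::int32, 0 /*GPU*/);
// pass these tensors to the spconv ops so results land in place, or copy
// already-computed results into them:
cudaMemcpyAsync(out_features_ten.data_ptr<float>(), out_features.data_ptr<const float>(),
                size_t(out_inds_num_limit) * num_out_features * sizeof(float),
                cudaMemcpyDeviceToDevice, stream);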
You should check a TensorRT plugin tutorial or example first (IPluginV2DynamicExt); you can write a simple plugin to test all plugin functions.
I now have another question: should the pointer passed to from_blob() as the first parameter be aligned? If yes, to how many bytes?
I calculated the workspace size (originally in main.cu: int workspace_size = SpconvOps::get_indice_gen_workspace_size(...)) in the getWorkspaceSize API that every TensorRT plugin must implement. Then I replaced every tv::empty() with tv::from_blob() and advanced the workspace pointer myself. For example:
//tv::Tensor pair_fwd_padded = tv::empty({KV, pair_fwd_size_padded}, tv::int32, 0);
tv::Tensor pair_fwd_padded = tv::from_blob(workspace, {KV, pair_fwd_size_padded}, tv::int32, 0);
workspace += KV * pair_fwd_size_padded * sizeof(tv::int32);
However, I get the following error:
error: 1: [defaultAllocator.cpp::deallocate::35] Error Code 1: Cuda Runtime (misaligned address)
error: 1: [cudaResources.cpp::~ScopedCudaStream::47] Error Code 1: Cuda Runtime (misaligned address)
error: 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (misaligned address)
error: 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (misaligned address)
error: 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (misaligned address)
error: 1: [defaultAllocator.cpp::deallocate::35] Error Code 1: Cuda Runtime (misaligned address)
Use get_indice_gen_tensors_from_workspace instead of managing the workspace yourself. I changed
hash_size = int({SPCONV_DIRECT_TABLE_HASH_SIZE_SCALE} * max_act_out_in_theory);
in get_indice_gen_workspace_size and get_indice_gen_tensors_from_workspace to
hash_size = tv::align_up(int({SPCONV_DIRECT_TABLE_HASH_SIZE_SCALE} * max_act_out_in_theory), 2);
to avoid this problem.
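If you do slice the workspace yourself, the usual fix is to align every sub-allocation; below is a minimal sketch with a hypothetical helper that is not part of libspconv (16-byte alignment is a safe choice for CUDA, and getWorkspaceSize must include the same padding):
#include <cstdint>
// hypothetical helper: round a raw workspace pointer up to a power-of-two boundary
inline uint8_t* align_ptr(uint8_t* p, uintptr_t align = 16) {
  auto v = reinterpret_cast<uintptr_t>(p);
  return reinterpret_cast<uint8_t*>((v + align - 1) & ~(align - 1));
}

uint8_t* ws = align_ptr(static_cast<uint8_t*>(workspace));
tv::Tensor pair_fwd_padded = tv::from_blob(ws, {KV, pair_fwd_size_padded}, tv::int32, 0);
ws = align_ptr(ws + size_t(KV) * pair_fwd_size_padded * sizeof(int32_t));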
Thanks for your advice! I noticed that convolution with kernel shape 1*1*1 is not supported in libspconv yet (while the Python version already implements it with torch.mm); is that correct? If yes, which implementation do you suggest for "torch.mm" in C++/CUDA?
If you use TensorRT, you can use the matmul layer in TensorRT (skip the sparse conv when constructing the TensorRT network); there is no need to implement a new one. By the way, torch uses cuBLAS to implement torch.mm. If you need mm in C++, you can implement it with cuBLASLt; here is the official example.
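In case cuBLASLt feels heavyweight, a minimal FP32 sketch with plain cuBLAS is shown below; a 1*1*1 sparse conv is just feats[N, Cin] times weight[Cin, Cout], and all names here are illustrative:
#include <cublas_v2.h>
#include <cuda_runtime.h>

// out[N, Cout] = feats[N, Cin] * weight[Cin, Cout], all row-major on the GPU.
// cuBLAS is column-major, so compute out^T = weight^T * feats^T instead.
void mm_1x1x1(cublasHandle_t handle, const float* feats, const float* weight,
              float* out, int N, int Cin, int Cout, cudaStream_t stream) {
  cublasSetStream(handle, stream);
  const float alpha = 1.0f, beta = 0.0f;
  cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
              /*m=*/Cout, /*n=*/N, /*k=*/Cin, &alpha,
              weight, /*lda=*/Cout,      // weight seen column-major: [Cout, Cin]
              feats, /*ldb=*/Cin,        // feats seen column-major: [Cin, N]
              &beta, out, /*ldc=*/Cout); // out seen column-major: [Cout, N]
}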
Hi, may I ask how you convert plugin inputs to a tv tensor? Do I need to cast them in any way?
// static_num_act_in is just out_inds_num_limit of the previous conv layer.
// for regular conv, the input tensor has static shape, we should save a CPU
// variable of real num_act_out. here we just use num_act_in.
int real_num_act_in = real_num_voxels;
@FindDefinition how can I pass the CPU variable real_num_act_in from one TensorRT layer to another?
I am trying to implement a TensorRT plugin for a single sparse convolution layer and need to write the convolution result back to the output pointer defined in the enqueue() interface from TensorRT. Right now, I am doing it this way (the outputs[0]/outputs[1] assignment snippet shown earlier in this thread), where outputs is defined in the interface of the enqueue API from TensorRT, and out_features and out_inds are just variables defined in main.cu. It turns out both out1 and out2 contain all zero values, even if I copy them back to the CPU using cudaMemcpyAsync() followed by cudaStreamSynchronize(). What's the correct way of converting results in tv::Tensor format to the TensorRT plugin format?