Many thanks for this great repo! It's amazing and useful work that I'm learning quite a bit from.
I wanted to check whether there's an optimization reason why you chose to call cudaMemcpyAsync directly in mode.cpp rather than context->enqueueV2 as described in the documentation.
I'm still relatively new to deploying models in C++. Was this an optimization choice, or just personal coding style?
If I understand correctly, enqueueV2 wraps the CUDA memcpy calls, so wouldn't using enqueueV2 or executeV2 be more maintainable in the long term? As TensorRT changes, they could alter the implementation while keeping the same method signature.
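For context, here is the pattern I'm picturing, based on my (possibly wrong) reading of the TensorRT docs: enqueueV2 launches the inference kernels on a stream, while the cudaMemcpyAsync calls move data to and from the GPU, so the two would be used together. This is just a sketch, not runnable as-is (it assumes a deserialized engine, pre-allocated device/host buffers, and hypothetical names like deviceInput/hostOutput):

```cpp
#include <cuda_runtime_api.h>
#include <NvInfer.h>

// Sketch only: `context`, the device buffers, and the host buffers are
// assumed to be set up elsewhere (engine deserialization, cudaMalloc, ...).
void infer(nvinfer1::IExecutionContext* context,
           void* deviceInput, void* deviceOutput,
           const float* hostInput, float* hostOutput,
           size_t inputBytes, size_t outputBytes,
           cudaStream_t stream)
{
    // Stage the input on the GPU (host -> device), asynchronously.
    cudaMemcpyAsync(deviceInput, hostInput, inputBytes,
                    cudaMemcpyHostToDevice, stream);

    // Enqueue inference on the same stream; enqueueV2 takes the array
    // of binding pointers in engine binding-index order.
    void* bindings[] = {deviceInput, deviceOutput};
    context->enqueueV2(bindings, stream, nullptr);

    // Copy the result back (device -> host), still asynchronously.
    cudaMemcpyAsync(hostOutput, deviceOutput, outputBytes,
                    cudaMemcpyDeviceToHost, stream);

    // Block until all enqueued work on the stream has finished.
    cudaStreamSynchronize(stream);
}
```

Is this roughly the structure your code follows, or are you replacing one of these pieces for a reason I'm missing?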