Many thanks for this great repo! It's amazing and useful work that I'm learning quite a bit from.
I wanted to check whether there's an optimization reason why you chose to call cudaMemcpyAsync directly in mode.cpp rather than context->enqueueV2 as described in the documentation.
I'm still relatively new to deploying models in C++. Was this an optimization choice, or just personal coding style?
If I understand correctly, enqueueV2 wraps the CUDA memcpy calls, so wouldn't using enqueueV2 or executeV2 be more maintainable in the long term? As TensorRT changes, they could alter the implementation while keeping the same method signature.
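For context, here is the pattern I'm picturing, based on my (possibly wrong) reading of the TensorRT docs: enqueueV2 launches the inference kernels on a stream, while the cudaMemcpyAsync calls move data to and from the GPU, so the two would be used together. This is just a sketch, not runnable as-is (it assumes a deserialized engine, pre-allocated device/host buffers, and hypothetical names like deviceInput/hostOutput):

```cpp
#include <cuda_runtime_api.h>
#include <NvInfer.h>

// Sketch only: `context`, the device buffers, and the host buffers are
// assumed to be set up elsewhere (engine deserialization, cudaMalloc, ...).
void infer(nvinfer1::IExecutionContext* context,
           void* deviceInput, void* deviceOutput,
           const float* hostInput, float* hostOutput,
           size_t inputBytes, size_t outputBytes,
           cudaStream_t stream)
{
    // Stage the input on the GPU (host -> device), asynchronously.
    cudaMemcpyAsync(deviceInput, hostInput, inputBytes,
                    cudaMemcpyHostToDevice, stream);

    // Enqueue inference on the same stream; enqueueV2 takes the array
    // of binding pointers in engine binding-index order.
    void* bindings[] = {deviceInput, deviceOutput};
    context->enqueueV2(bindings, stream, nullptr);

    // Copy the result back (device -> host), still asynchronously.
    cudaMemcpyAsync(hostOutput, deviceOutput, outputBytes,
                    cudaMemcpyDeviceToHost, stream);

    // Block until all enqueued work on the stream has finished.
    cudaStreamSynchronize(stream);
}
```

Is this roughly the structure your code follows, or are you replacing one of these pieces for a reason I'm missing?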