Open kelvinwang139 opened 1 year ago
@kelvinwang139 Can you provide more detail on how you are using MMDeploy?
Our real requirement is to speed up copying an image from CPU to GPU. We found that when MMDeploy converts a cv::Mat to its own Mat, one 600 × 600 image takes 10 ms (RTX 3090). We have about 80 images, so that adds up to 0.8 s. Is there any way to reduce this time? We have a CT time limit and are trying to cut every possible cost.
We checked the OpenCvSharp Bitmap-to-cv::Mat conversion: for a 600 × 600 image it needs only 1 ms.
In addition, is the MMDeploy Mat the same as the GpuMat in OpenCvSharp?
Or, how can we convert a Bitmap to a Mat directly?
Currently we run:
1. Bitmap to cv::Mat (6 ms)
2. cv::Mat to Mat (10 ms)
Take a look at the definition of mmdeploy_mat_t
typedef struct mmdeploy_mat_t {
uint8_t* data;
int height;
int width;
int channel;
mmdeploy_pixel_format_t format;
mmdeploy_data_type_t type;
mmdeploy_device_t device;
} mmdeploy_mat_t;
It is just a reference to the underlying image data, with metadata such as height, width, and format set. If your data is located in pinned memory, or even on the GPU, MMDeploy can use it directly without copying.
- For pinned memory, set device to nullptr (which is short for "CPU device").
- For CUDA memory, set device to an mmdeploy_device_t object created by mmdeploy_device_create.
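As a rough illustration of the "reference, not copy" idea, here is a minimal C sketch. It deliberately does not include the MMDeploy headers: `sketch_mat_t`, the `SKETCH_*` enums, `sketch_device_t`, `wrap_bgr_u8`, and `mat_byte_size` are all hypothetical stand-ins that mirror the fields quoted above so the snippet compiles on its own. Since the struct carries no stride field, the sketch also assumes rows are tightly packed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins so the sketch builds without the SDK.
 * In real code, use mmdeploy_mat_t and its related types instead. */
typedef enum { SKETCH_FORMAT_BGR, SKETCH_FORMAT_GRAYSCALE } sketch_pixel_format_t;
typedef enum { SKETCH_TYPE_UINT8 } sketch_data_type_t;
typedef struct sketch_device* sketch_device_t; /* opaque handle; NULL = CPU */

/* Same fields, in the same order, as the mmdeploy_mat_t quoted above. */
typedef struct sketch_mat_t {
  uint8_t* data;
  int height;
  int width;
  int channel;
  sketch_pixel_format_t format;
  sketch_data_type_t type;
  sketch_device_t device;
} sketch_mat_t;

/* Describe an existing 8-bit BGR buffer. Nothing is allocated or copied:
 * the returned struct only points at `pixels`, which the caller still owns.
 * Pass device = NULL for host memory (pinned or not), or a device handle
 * when `pixels` is a CUDA device pointer. */
sketch_mat_t wrap_bgr_u8(uint8_t* pixels, int height, int width,
                         sketch_device_t device) {
  assert(pixels != NULL && height > 0 && width > 0);
  sketch_mat_t mat;
  mat.data = pixels;
  mat.height = height;
  mat.width = width;
  mat.channel = 3;
  mat.format = SKETCH_FORMAT_BGR;
  mat.type = SKETCH_TYPE_UINT8;
  mat.device = device;
  return mat;
}

/* Bytes the buffer must hold, assuming tightly packed uint8 pixels. */
size_t mat_byte_size(const sketch_mat_t* mat) {
  return (size_t)mat->height * (size_t)mat->width * (size_t)mat->channel;
}
```

With the real SDK the same pattern would apply to mmdeploy_mat_t: allocate the image buffer once (for example with cudaMallocHost for pinned memory), fill it, and pass a struct that points at it, leaving device as nullptr for host memory or setting it to the handle from mmdeploy_device_create for CUDA memory.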
Thanks for your response. However, we are not sure what the descriptions "for pinned memory ..." and "for CUDA memory ..." mean in practice. We are using the C# SDK (segmentation and classification). Could you provide instructions or an example showing how to set them?
@irexyc could you help with an example?
Motivation
We want to improve the image copy rate from CPU to GPU. How can we use the feature defined in dlpack.h?
/*! \brief Pinned CUDA CPU memory by cudaMallocHost */
kDLCUDAHost = 3,