[Feature] how to use kDLCUDAHost in dlpack.h

open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework

https://mmdeploy.readthedocs.io/en/latest/

Apache License 2.0

[Feature] how to use kDLCUDAHost in dlpack.h #1592

Open kelvinwang139 opened 1 year ago

kelvinwang139 commented 1 year ago

Motivation

We want to improve the image copy rate from CPU to GPU. How can we use the following feature defined in dlpack.h?

/*! \brief Pinned CUDA CPU memory by cudaMallocHost */
kDLCUDAHost = 3,

Related resources

No response

Additional context

No response

lzhangzz commented 1 year ago

@kelvinwang139 Can you provide more detail on how you are using MMDeploy?

kelvinwang139 commented 1 year ago

Our real requirement is to speed up copying an image from CPU to GPU. Currently we find that when MMDeploy converts a cv::Mat to its own mat, one 600 × 600 image takes 10 ms (RTX 3090). We have about 80 images, so that adds up to 0.8 s. Is there any way to reduce this time? We have a CT time limitation and are trying to reduce every possible time cost.

We checked the OpenCvSharp bitmap-to-cvmat conversion: for a 600 × 600 image it needs only 1 ms.

kelvinwang139 commented 1 year ago

In addition, is the MMDeploy mat the same as the GpuMat of OpenCvSharp?

kelvinwang139 commented 1 year ago

Or how can we convert a bitmap to mat directly?
Currently we run: 1. bitmap to cvmat (6 ms) 2. cvmat to mat (10 ms)

lzhangzz commented 1 year ago

Take a look at the definition of mmdeploy_mat_t

typedef struct mmdeploy_mat_t {
  uint8_t* data;
  int height;
  int width;
  int channel;
  mmdeploy_pixel_format_t format;
  mmdeploy_data_type_t type;
  mmdeploy_device_t device;
} mmdeploy_mat_t;

It is just a reference to the underlying image data with information like height/width/format set. If your data is located in pinned memory or even on the GPU, MMDeploy can use it directly without copying.

For pinned memory, set device to nullptr (which is short for "CPU device").
For CUDA memory, set device to mmdeploy_device_t object created by mmdeploy_device_create.
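
To make the "reference without copying" idea concrete, here is a minimal sketch in C. It is not taken from the MMDeploy sources: the `sketch_*` types and the `wrap_host_bgr` helper are hypothetical stand-ins that mirror the fields of mmdeploy_mat_t shown above, so that the snippet compiles on its own. In a real build you would presumably include the MMDeploy C API header and use mmdeploy_mat_t and its enums instead, and the pixel buffer would come from cudaMallocHost rather than an ordinary allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the MMDeploy types (assumption, for illustration
   only); the field layout follows the struct quoted in this thread. */
typedef enum { SKETCH_PIXEL_FORMAT_BGR = 0 } sketch_pixel_format_t;
typedef enum { SKETCH_DATA_TYPE_UINT8 = 0 } sketch_data_type_t;
typedef struct sketch_device* sketch_device_t; /* opaque handle; NULL = CPU */

typedef struct sketch_mat_t {
  uint8_t* data;
  int height;
  int width;
  int channel;
  sketch_pixel_format_t format;
  sketch_data_type_t type;
  sketch_device_t device;
} sketch_mat_t;

/* Describe an existing 8-bit BGR host buffer. No pixels are copied: the
   returned struct only stores the pointer and the image metadata, so the
   caller must keep `pixels` alive while the mat is in use. */
static sketch_mat_t wrap_host_bgr(uint8_t* pixels, int height, int width) {
  assert(pixels != NULL && height > 0 && width > 0);
  sketch_mat_t mat;
  mat.data = pixels;   /* borrowed pointer, e.g. a pinned host buffer */
  mat.height = height;
  mat.width = width;
  mat.channel = 3;     /* B, G, R */
  mat.format = SKETCH_PIXEL_FORMAT_BGR;
  mat.type = SKETCH_DATA_TYPE_UINT8;
  mat.device = NULL;   /* NULL device: data lives in host (CPU) memory */
  return mat;
}
```

For a 600 × 600 image, calling `wrap_host_bgr(buffer, 600, 600)` only fills in seven fields, so the cost of building the mat itself should be negligible compared with the 10 ms reported above. For data already in CUDA memory, the same struct would carry a device handle instead of NULL, as described above.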

kelvinwang139 commented 1 year ago

Thanks for your response, but I am not sure what the description "for pinned memory ... and for CUDA memory ..." means. We are using the C# SDK (segmentation and classification). Could you provide instructions or an example showing how to set them?

lvhan028 commented 1 year ago

@irexyc could you help with an example?