open-mmlab / mmengine

OpenMMLab Foundational Library for Training Deep Learning Models
https://mmengine.readthedocs.io/
Apache License 2.0

[Feature] Enhancement Request: Expand Multi-GPU Inference Functionality #1167

Open 330205812 opened 1 year ago

330205812 commented 1 year ago

What is the feature?

Hello MMEngine Community,

I hope this message finds you well. I am writing to propose an enhancement to MMEngine that I believe would significantly improve its usability and performance, particularly for multi-GPU inference.

As increasingly complex models and larger datasets come into use, the demand for distributed and parallel computing resources is higher than ever. Being able to fully utilize multi-GPU setups for inference would undoubtedly boost MMEngine's performance and scalability.

I would be grateful if the team could consider this request and provide any feedback or updates regarding potential implementation. I am also curious to know if other users in the community have had similar experiences or thoughts on this.

Thank you in advance for your time and consideration.

Best regards, Tree

Any other context?

No response

HAOCHENYE commented 1 year ago

Thank you for your suggestion. Distributed inference would indeed be a cool feature for MMEngine! Before proceeding, however, I would like to clarify whether you are referring to multi-GPU inference with distributed data parallelism (DDP, FSDP) to speed up inference, or to large-model inference using pipeline (model) parallelism. Since the two serve different purposes, their implementation approaches would also differ. Could you please share which functionality you would like to see supported and discuss possible design approaches?

Additionally, note that MMEngine already implements BaseInferencer, so the design should be built around it.

330205812 commented 1 year ago

Hello @HAOCHENYE ,

Thank you for your response. To clarify my issue: I am referring to using Distributed Data Parallel (DDP) and Fully Sharded Data Parallel (FSDP) to accelerate multi-GPU inference. My suggestion is that if the `device` parameter of `BaseInferencer.__init__` receives a value like 'cuda:0,1' (or a similar format), the inferencer could enable multi-GPU inference.
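
To make the idea concrete, here is a rough sketch of the kind of device parsing and model wrapping I have in mind. The helper names are purely illustrative and not part of the current BaseInferencer API; the sketch assumes the process group has already been initialized (for example with `mmengine.dist.init_dist`):

```python
# Illustrative sketch only -- not the actual BaseInferencer implementation.
import torch
from torch.nn.parallel import DistributedDataParallel as DDP


def parse_devices(device: str):
    """Split a string such as 'cuda:0,1' into a list of torch.device objects."""
    if device.startswith('cuda:') and ',' in device:
        ids = device.split(':', 1)[1].split(',')
        return [torch.device(f'cuda:{int(i)}') for i in ids]
    return [torch.device(device)]


def wrap_for_inference(model: torch.nn.Module, device: str) -> torch.nn.Module:
    """Move the model to this rank's GPU and wrap it with DDP when more than
    one GPU is requested. Assumes torch.distributed is already initialized."""
    devices = parse_devices(device)
    if len(devices) > 1 and torch.distributed.is_initialized():
        local_device = devices[torch.distributed.get_rank() % len(devices)]
        model = model.to(local_device)
        return DDP(model, device_ids=[local_device.index])
    return model.to(devices[0])
```

FSDP could be plugged into the same place by swapping the wrapper class.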

I hope this suggestion is of some help. If you have any questions or would like to discuss further, I am more than happy to assist. Thank you again for your time and effort.

Best regards, Tree

HAOCHENYE commented 1 year ago

Got it. So, the tasks at hand are how to build a model wrapper within the Inferencer and how to perform distributed sampling on the processed data, right? If possible, it would be great if you could share your design on this issue.
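
As a starting point, the per-rank data splitting and result collection might look roughly like this. This is a minimal sketch under the assumption that `torch.distributed` is already initialized; `shard_inputs` and `distributed_infer` are hypothetical names, while `collect_results` is the existing gather utility in `mmengine.dist`:

```python
# Minimal sketch of rank-based input sharding plus result gathering.
import torch.distributed as dist
from mmengine.dist import collect_results  # gathers and reorders per-rank results


def shard_inputs(inputs):
    """Give each rank an interleaved slice of the inputs
    (rank 0 gets items 0, world_size, 2 * world_size, ...)."""
    if not dist.is_initialized():
        return inputs
    rank, world_size = dist.get_rank(), dist.get_world_size()
    return inputs[rank::world_size]


def distributed_infer(inferencer, inputs):
    """Run inference on this rank's shard and collect the full, ordered
    result list on rank 0 (other ranks receive None)."""
    shard = shard_inputs(inputs)
    results = [inferencer(item) for item in shard]
    if dist.is_initialized():
        return collect_results(results, len(inputs), device='cpu')
    return results
```

The interleaved sharding matches the ordering that `collect_results` expects, so the gathered list comes back in the original input order.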