jamesdolezal / slideflow

Deep learning library for digital pathology, with both Tensorflow and PyTorch support.
https://slideflow.dev
GNU General Public License v3.0

Multiple GPU Acceleration #354

Closed NaokiThread closed 3 months ago

NaokiThread commented 3 months ago

Feature

I would fervently like to request multiple GPU acceleration.

Pitch

Is it possible to use multiple GPUs when doing MIL? In short, can train_mil be sped up with multiple GPUs? It seems that only one GPU is used during the process.

jamesdolezal commented 3 months ago

Hi! This feature was added a few months back. You can accelerate feature bag generation across multiple GPUs with the num_gpus argument. E.g.:

P.generate_feature_bags(..., num_gpus=4)

I'm realizing now that this hasn't been added to the documentation, so we'll be sure to update accordingly!
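For anyone finding this thread later, a fuller sketch of the call above might look like the following. This is a hedged illustration, not verified against the current slideflow API: the project path, the extractor name ('ctranspath'), and the output directory are all illustrative assumptions, not values taken from this thread. It requires slideflow, a configured project, and multiple available GPUs.

```python
import slideflow as sf

# Load an existing slideflow project (path is illustrative).
P = sf.Project('/path/to/project')

# Generate feature bags for MIL training, distributing slide
# processing across 4 GPUs via the num_gpus argument.
P.generate_feature_bags(
    'ctranspath',            # feature extractor name; illustrative
    outdir='/path/to/bags',  # output directory; illustrative
    num_gpus=4,
)
```

The resulting bags directory can then be passed to MIL training as usual; per the discussion below, the training step itself does not use multiple GPUs.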

NaokiThread commented 3 months ago

Thank you! Is it possible to use multiple GPUs with train_mil?

jamesdolezal commented 3 months ago

We haven't found a use case for multi-GPU training for MIL models, as MIL model training is very fast (typically only a few seconds per epoch). The marginal benefits of distributed training in this setting probably wouldn't offset the added overhead of spinning up multiple processes.

However, if you have identified a scenario in which you are significantly bottlenecked by GPU utilization when training MIL models, let us know! If so, it would be helpful to know the model architecture, the dataset size, and what metrics you've investigated to determine that the bottleneck is GPU utilization.

NaokiThread commented 3 months ago

Thank you so much!!!