Open · shanzha9 opened this issue 2 months ago
There is multi-GPU support. However, your calculation looks vastly off, and I don't think your dataset is large enough to benefit much from multi-GPU training. For example, on the human CELLxGENE Census dataset of 35 million cells, training for 100 epochs took less than 2 days (we actually got strikingly similar results after a couple of hours and 20 epochs). I would recommend increasing `batch_size` to 1024 (the speed-up from larger batches is roughly linear) and reducing the number of training epochs to 50 (what I usually use). You should then be able to train it in about 2 hours. Let me know if it takes more than 8 hours, because that would suggest the installation is wrong or your object is not correctly formatted.
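To see why the suggested settings shrink the job so much, here is a back-of-envelope count of optimizer updates in plain Python. This is a minimal sketch, not part of scvi-tools. The function name `optimizer_steps` is hypothetical. The 0.9 training fraction is an assumed default. The baseline of 128 cells per batch and 400 epochs is a hypothetical starting point for comparison, not the original poster's actual configuration.

```python
import math


def optimizer_steps(n_cells: int, batch_size: int, max_epochs: int,
                    train_size: float = 0.9) -> int:
    """Approximate number of minibatch updates for one training run.

    Assumption: a fraction `train_size` of the cells goes into the training
    split, and the last (partial) batch of each epoch is kept.
    """
    n_train = round(n_cells * train_size)           # cells seen per epoch
    steps_per_epoch = math.ceil(n_train / batch_size)
    return steps_per_epoch * max_epochs


if __name__ == "__main__":
    # Hypothetical baseline vs. the settings suggested above, for 2.6M cells.
    baseline = optimizer_steps(2_600_000, batch_size=128, max_epochs=400)
    suggested = optimizer_steps(2_600_000, batch_size=1024, max_epochs=50)
    print(baseline, suggested, round(baseline / suggested, 1))
    # prints: 7312800 114300 64.0
```

This only counts updates. Wall-clock time also depends on how long each step takes on the GPU, so treat the roughly 64x ratio as an upper-bound intuition rather than a prediction.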
@canergen Hi, thanks for your reply.
Could you please link the official multi-GPU tutorial? The training data are raw counts. I set:
```python
import scvi

scvi.settings.dl_num_workers = 30
scvi.settings.num_threads = 30
scvi.settings.batch_size = 2048
```
but CPU usage stays under 20% per process. Could something be wrong with the data loader?
Hi, developers,
The dataset used in my study consists of 2.6M cells, so training will take about a week. Does scvi-tools support multi-GPU training, and is there an official tutorial?
Thank you!