scverse / scvi-tools

Deep probabilistic analysis of single-cell and spatial omics data
http://scvi-tools.org/
BSD 3-Clause "New" or "Revised" License

How to increase num_workers in pytorch DataLoader? #2933

Closed winglet0996 closed 3 months ago

winglet0996 commented 3 months ago

Hi, I am following the scRNA-seq tutorial.

After running

model = scvi.model.SCVI(adata, n_layers=2, n_latent=30, gene_likelihood="nb")
model.train()

I got

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/winglet/Data/apps/micromamba/envs/scanpy/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument to `num_workers=87` in the `DataLoader` to improve performance.

When I set this at the very beginning

scvi.settings.dl_num_workers = 87

I got tons of

/home/winglet/Data/apps/micromamba/envs/scanpy/lib/python3.9/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
self.pid = os.fork()

and the training speed was much slower than before (time per iteration increased about 5x).

Versions:

scvi-tools-1.1.5

canergen commented 3 months ago

Hi, indeed we usually see a slowdown when using multiple workers. In the next release (or the current main branch), we will add a persistent-worker flag. We haven't benchmarked in depth how training speed compares with that setting (if you try it out, please report back). As for the JAX warning, you can safely ignore it: JAX is not called during training, so it shouldn't affect your runtime. To sum up, PyTorch Lightning's suggestions for maximizing performance don't necessarily improve performance in practice.
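
For what it's worth, a minimal sketch of how one might experiment with this, assuming the data-loader options can be forwarded through `datasplitter_kwargs` in `model.train()`; the `persistent_workers` keyword in particular is an assumption about the upcoming flag and may differ in the actual release:

import scvi

# A few workers is usually enough; very high counts (e.g. 87) often add
# more inter-process overhead than they save.
scvi.settings.dl_num_workers = 4

model = scvi.model.SCVI(adata, n_layers=2, n_latent=30, gene_likelihood="nb")

# Assumed: extra keyword args are forwarded to the underlying DataLoader,
# so workers can be kept alive between epochs instead of being re-forked.
model.train(datasplitter_kwargs={"persistent_workers": True})

If you benchmark different worker counts this way, comparing the time per iteration against the single-worker default would be the most useful number to report.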