ashim-mahara closed this issue 3 years ago.
Did you set "num_processes" to 2 in your config? I noticed that when this parameter is set to 1, only one GPU is used.
My working configuration is this:
{
    "distributed_type": "MULTI_GPU",
    "fp16": true,
    "machine_rank": 0,
    "main_process_ip": null,
    "main_process_port": null,
    "main_training_function": "main",
    "num_machines": 1,
    "num_processes": 2
}
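For reference, a config like this is what accelerate config writes out, and a saved copy can be passed explicitly at launch time. A minimal sketch (the file name and script name here are placeholders, not from the thread):

accelerate config    # interactive prompts; produces a config like the one above
accelerate launch --config_file ./accelerate_config.json train.py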
Could you please share the command you are using to launch your script? That would help debug your problem. Thanks!
I am using this with JupyterLab. Does it only work when training with scripts launched via the CLI tool?
Yes, the accelerate library currently only supports launching training scripts. Notebook launchers are on the roadmap, but not implemented yet.
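For context, a notebook launcher of this kind is typically invoked roughly as follows. A minimal sketch, assuming a self-contained training_function defined in the notebook and the notebook_launcher entry point that Accelerate later added:

from accelerate import notebook_launcher

def training_function():
    # Build the Accelerator, dataloaders, model, etc. inside this function,
    # since each spawned process runs it from scratch.
    ...

# Spawns one process per GPU and runs training_function in each of them.
notebook_launcher(training_function, args=(), num_processes=2)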
Hi! Thanks for the great library, Sylvain! Not to hijack the thread, but I am having the same problem. (Happy to create a new issue, though.)
The config file looks as follows:
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
fp16: true
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
num_machines: 1
num_processes: 2
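As a sanity check that the CLI is actually reading this file, the following should echo the same values (assuming the env subcommand is available in the installed version of accelerate):

accelerate env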
The relevant part of the code is as follows:
from accelerate import Accelerator
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import Wav2Vec2ForCTC

# config, MAX_GPU_BATCH_SIZE, EVAL_BATCH_SIZE, the datasets, and collate_fn
# are defined earlier in the script (not shown here).
accelerator = Accelerator(fp16=config['fp16'], cpu=config['cpu'])
print(accelerator.device)

# Sample hyperparameters for learning rate, batch size, seed, and a few other HPs
lr = config["lr"]
num_epochs = int(config["num_epochs"])
seed = int(config["seed"])
batch_size = int(config["batch_size"])

# If the batch size is too big, we use gradient accumulation
gradient_accumulation_steps = 1
if batch_size > MAX_GPU_BATCH_SIZE:
    gradient_accumulation_steps = batch_size // MAX_GPU_BATCH_SIZE
    batch_size = MAX_GPU_BATCH_SIZE

# Instantiate dataloaders.
train_dataloader = DataLoader(
    train_dataset, shuffle=True, collate_fn=collate_fn, batch_size=batch_size
)
valid_dataloader = DataLoader(
    validation_dataset, shuffle=False, collate_fn=collate_fn, batch_size=EVAL_BATCH_SIZE
)
test_dataloader = DataLoader(
    test_dataset, shuffle=False, collate_fn=collate_fn, batch_size=EVAL_BATCH_SIZE
)

# Instantiate the model (we build the model here so that the seed also controls new weight initialization)
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Instantiate the optimizer
optimizer = AdamW(params=model.parameters(), lr=lr)

# Prepare everything. There is no specific order to remember; we just need to
# unpack the objects in the same order we gave them to the prepare method.
prepared = accelerator.prepare(
    model, optimizer, train_dataloader, valid_dataloader, test_dataloader
)
model, optimizer, train_dataloader, valid_dataloader, test_dataloader = prepared

# Now we train the model
for epoch in range(num_epochs):
    model.train()
    for step, batch in enumerate(train_dataloader):
        # No need to move the batch to the device manually: the accelerator
        # was created with `device_placement=True` (the default), so the
        # prepared dataloader already puts batches on the right device.
        # batch.to(accelerator.device)
        outputs = model(**batch)
        loss = outputs.loss
        loss = loss / gradient_accumulation_steps
        accelerator.backward(loss)
        if step % gradient_accumulation_steps == 0:
            optimizer.step()
            lr_scheduler.step()  # lr_scheduler is defined outside this excerpt
            optimizer.zero_grad()
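Note that lr_scheduler is used in the loop but never instantiated in the excerpt. A minimal sketch of a typical definition, assuming get_linear_schedule_with_warmup from transformers and a warmup of 100 steps (both assumptions, not the poster's actual code):

from transformers import get_linear_schedule_with_warmup

# One optimizer update happens every gradient_accumulation_steps batches.
num_update_steps = (len(train_dataloader) // gradient_accumulation_steps) * num_epochs
lr_scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=100, num_training_steps=num_update_steps
)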
The script utilizes a single GPU, though there are 2 GPUs:
>>> torch.cuda.device_count()
2
Launching the script from the command line:
accelerate launch training.py
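If the saved config is not being picked up, the relevant settings can also be forced on the command line; a sketch, assuming these flags are available in the installed version:

accelerate launch --multi_gpu --num_processes 2 training.py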
Any help is appreciated. Thank you!
Yes, a new issue would be cleaner, as this one is about accelerate not working out of the box in a Jupyter environment :-)
Could you report in your issue what gets printed (since you have a print(accelerator.device) in your script)? Thanks!
Should I close this issue then?
I'll work on a notebook launcher in the coming days. You can close the issue now or when it's ready, as you prefer :-)
While using Accelerate, only 1 of the 2 GPUs present is being utilized. I am training using the general instructions in the repository. The architecture is an autoencoder.
I am transferring the samples in the batch to the device using the code below:
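(The transfer snippet itself did not survive in the report; a typical pattern, assuming the batch is a dict of tensors, would be:)

# move every tensor in the batch onto this process's device
batch = {k: v.to(device) for k, v in batch.items()}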
The device is being determined by using:
device = accelerator.device
Both devices are visible, which can be confirmed with
torch.cuda.device_count()
which returns 2. The devices are RTX 2080s with CUDA version 11.2 and driver version 460.67. The distro is Pop!_OS.
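A quick way to confirm that both processes are actually spawned under accelerate (as opposed to both GPUs merely being visible to torch) is to print the process info right after creating the accelerator; a minimal sketch:

from accelerate import Accelerator

accelerator = Accelerator()
# Under a correct 2-GPU launch this prints two lines, one per process,
# e.g. "process 0/2 on cuda:0" and "process 1/2 on cuda:1".
print(f"process {accelerator.process_index}/{accelerator.num_processes} on {accelerator.device}")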