Tips

For the CIFAR10 example, training on a GPU requires redefining the optimizer after net.to(device); in general, construct the optimizer after the model's .cuda() / .to(device) call:
import torch
import torch.nn as nn
import torch.optim as optim

# net and trainloader are defined earlier in the CIFAR10 tutorial
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

net = net.to(device)
criterion = nn.CrossEntropyLoss()
# Recreate the optimizer after the move so it references the GPU parameters
optimizer = optim.SGD(net.parameters(), lr=0.0001, momentum=0.9)

for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs and move them to the same device as the model
        inputs, labels = data[0].to(device), data[1].to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0
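The reason for this ordering is spelled out in the torch.optim docs: parameters of a model after .cuda() may be different objects from the ones before the call, so an optimizer built earlier can end up holding stale references. A minimal sketch of the wrong and the right order (the nn.Linear model here is just a hypothetical stand-in):

import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = nn.Linear(4, 2)  # hypothetical stand-in model

# Wrong order: the optimizer is built from the CPU parameters and may
# keep referencing them after the model has moved to the GPU.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
model = model.to(device)

# Right order: move the model first, then build (or rebuild) the optimizer.
model = nn.Linear(4, 2).to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)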
Reference

torch.optim

Brief

torch.optim provides the Optimizer base class and the following concrete optimizers: Adadelta, Adagrad, Adam, AdamW, SparseAdam, Adamax, ASGD, LBFGS, RMSprop, Rprop, SGD.
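All of these share the Optimizer interface: construct one with the model's parameters plus hyperparameters, then call zero_grad(), backward(), and step() each iteration. A minimal sketch using Adam (the nn.Linear model and the random batch are placeholders, not part of the original notes):

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)  # placeholder model
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(32, 10)   # dummy input batch
y = torch.randn(32, 1)    # dummy targets

optimizer.zero_grad()              # clear old gradients
loss = criterion(model(x), y)
loss.backward()                    # compute gradients into .grad
optimizer.step()                   # update the parameters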
Learning Rate Scheduler

torch.optim.lr_scheduler provides schedulers that adjust an optimizer's learning rate over the course of training; see the sketch below.
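A scheduler wraps an existing optimizer and updates its learning rate on a schedule, typically once per epoch after optimizer.step(). A sketch with StepLR (the model and the training-loop body are placeholders):

import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(10, 1)  # placeholder model
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# Multiply the learning rate by gamma=0.1 every step_size=30 epochs.
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... the usual zero_grad / forward / backward here ...
    optimizer.step()      # stands in for the real parameter update
    scheduler.step()      # advance the schedule once per epoch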