qiskit-community / qiskit-machine-learning

Quantum Machine Learning
https://qiskit-community.github.io/qiskit-machine-learning/
Apache License 2.0

Error training on GPU backends using Torch Connector #310

Closed ktasha45 closed 2 years ago

ktasha45 commented 2 years ago

Environment

What is happening?

TorchConnector seems to expect all data to be on the CPU and fails when the model is run on a GPU. Using the qiskit-aer-gpu backends does not change the outcome. Since the conversion happens inside the library function, we cannot control it from user code.

RuntimeError                              Traceback (most recent call last)
<ipython-input-19-b69affc5324d> in <module>()
      5     for batch_idx, (data, target) in enumerate(test_loader):
      6         data = data[:, 0, :, :].reshape(-1, 1, 28, 28)
----> 7         output = model4(data)
      8         if len(output.shape) == 1:
      9             output = output.reshape(1, *output.shape)

/usr/local/lib/python3.7/dist-packages/qiskit_machine_learning/connectors/torch_connector.py in forward(ctx, input_data, weights, neural_network, sparse) 

    102             ctx.sparse = sparse
    103             ctx.save_for_backward(input_data, weights)
--> 104             result = neural_network.forward(input_data.numpy(), weights.numpy())
    105             if neural_network.sparse and sparse:
    106                 if not _HAS_SPARSE:

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
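
The root cause is visible in isolation: Tensor.numpy() is only defined for host tensors. A minimal demonstration, assuming a CUDA device is available:

import torch

t = torch.ones(3, device="cuda")
# t.numpy()       # raises the TypeError above
t.cpu().numpy()   # works: copy to host memory first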

How can we reproduce the issue?

Imports and set GPU

import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torch.nn import Module, Conv2d, Linear, Dropout2d
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

from qiskit import Aer, QuantumCircuit
from qiskit.utils import QuantumInstance, algorithm_globals
from qiskit.opflow import AerPauliExpectation
from qiskit.circuit.library import RealAmplitudes, ZZFeatureMap
from qiskit_machine_learning.neural_networks import CircuitQNN, TwoLayerQNN
from qiskit_machine_learning.connectors import TorchConnector

from qiskit.providers.aer import AerError
if torch.cuda.is_available():
    DEVICE = torch.device('cuda')
else:
    DEVICE = torch.device('cpu')

Set Simulator

simulator_gpu = Aer.get_backend('aer_simulator_statevector')
simulator_gpu.set_options(device=DEVICE)

Set Dataset

train_dataset = torchvision.datasets.ImageFolder(root="Enter path", transform=torchvision.transforms.ToTensor())
test_dataset = torchvision.datasets.ImageFolder(root="Enter path", transform=torchvision.transforms.ToTensor())

train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=128, shuffle=True)

train_iter = iter(train_loader)
images, labels = next(train_iter)  # iterator.next() was removed in recent PyTorch
labels = F.one_hot(labels % 30)

Set QNN

qi = QuantumInstance(simulator_gpu)
feature_map = ZZFeatureMap(feature_dimension=2, reps=3, entanglement='circular')
ansatz = RealAmplitudes(2, reps=1)
# REMEMBER TO SET input_gradients=True FOR ENABLING HYBRID GRADIENT BACKPROP
qnn4 = TwoLayerQNN(
    2, feature_map, ansatz, input_gradients=True, exp_val=AerPauliExpectation(), quantum_instance=qi
)

Sample random Net function

class Net(Module):
    def __init__(self):
        super().__init__()
        self.conv1 = Conv2d(1, 5, kernel_size=5)   # [N, 5, 24, 24]: 28x28 -> 24x24
        self.conv2 = Conv2d(5, 16, kernel_size=5)  # [N, 16, 8, 8]
        self.dropout = Dropout2d()
        self.fc1 = nn.Linear(256, 64)  # 16 * 4 * 4 = 256
        self.fc2 = nn.Linear(64, 2)
        self.qnn = TorchConnector(qnn4)  # wrap the QNN as a torch Module
        self.fc3 = Linear(1, 30)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2)  # [N, 5, 12, 12]
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)  # [N, 16, 4, 4]
        x = self.dropout(x)
        x = x.view(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        x = self.qnn(x)  # apply QNN
        x = F.softmax(self.fc3(x), dim=1)
        return x

model4 = Net().to(DEVICE)

Training Start [Error in this block]

epochs = 300  # Set number of epochs
loss_list = []  # Store loss history
model4.train()  # Set model to training mode

loss_func = nn.CrossEntropyLoss()
# optimizer
optimizer = optim.Adam(model4.parameters(), lr=1e-3)
#scheduler
scheduler = optim.lr_scheduler.LambdaLR(optimizer=optimizer,
                                        lr_lambda=lambda epoch: 0.95 ** epoch,
                                        last_epoch=-1,
                                        verbose=True)

for epoch in range(epochs):
    total_loss = []
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad(set_to_none=True)  # Initialize gradient
        data = data[:, 0, :, :].reshape(-1, 1, 28, 28).to(DEVICE)
        output = model4(data)  # Forward pass
        target = F.one_hot(target, num_classes=30).float().to(DEVICE)
        loss = loss_func(output, target)  # Calculate loss
        loss.backward()  # Backward pass
        optimizer.step()  # Optimize weights
        total_loss.append(loss.item())  # Store loss
    scheduler.step()
    loss_list.append(sum(total_loss) / len(total_loss))
    print("Training [{:.0f}%]\tLoss: {:.4f}".format(100.0 * (epoch + 1) / epochs, loss_list[-1]))

What should happen?

It would be great if GPU simulators could be used for training and testing models. The function could be modified to convert tensors according to the device being used. If there is any error in our way of using the GPU, please do let us know! A possible user-side stopgap is sketched below.
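
One such stopgap, sketched here under the assumption that the connector's trainable weights are also kept on the CPU, is to cross the device boundary explicitly around the quantum layer (the CPUBoundQNN name is purely illustrative):

import torch
import torch.nn as nn

class CPUBoundQNN(nn.Module):
    # Illustrative wrapper: run a TorchConnector layer on the CPU while the
    # classical layers stay on the GPU.
    def __init__(self, qnn_layer: nn.Module):
        super().__init__()
        self.qnn_layer = qnn_layer  # this sub-module must stay on the CPU

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        device = x.device
        out = self.qnn_layer(x.cpu())  # TorchConnector calls .numpy(), which needs a host tensor
        return out.to(device)          # rejoin the rest of the model on its original device

In Net, self.qnn = CPUBoundQNN(TorchConnector(qnn4)) would replace the direct assignment; after model4 = Net().to(DEVICE), the wrapped layer has to be moved back with model4.qnn.to('cpu') so its weights remain host tensors.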

We used the suggestions from this issue, but they didn't seem to work: https://github.com/Qiskit/qiskit-machine-learning/issues/286

https://github.com/Qiskit/qiskit-machine-learning/blob/26a5a69580d4f05cb4f0aa1525fc2144fa4413fa/qiskit_machine_learning/connectors/torch_connector.py#L104

Any suggestions?

No response

AndrewCesc commented 2 years ago

I can reproduce the issue too. Currently all data is processed on the CPU rather than the GPU, making it impossible to accelerate training with GPU devices.

adekusar-drl commented 2 years ago

@ktasha45 These lines are wrong:

from qiskit.providers.aer import AerError
if torch.cuda.is_available():
    DEVICE = torch.device('cuda')
else:
    DEVICE = torch.device('cpu')

simulator_gpu = Aer.get_backend('aer_simulator_statevector')
simulator_gpu.set_options(device=DEVICE)

If you want to make use of GPU support in Qiskit Aer you have to write:

backend = Aer.get_backend("aer_simulator")
backend.set_options(device='GPU')
qi = QuantumInstance(backend)

Take a look here: https://qiskit.org/documentation/tutorials/simulators/1_aer_provider.html#GPU-Simulation
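
For completeness, here is a sketch of the corrected setup applied to the QNN from the report. It follows the try/except pattern from that tutorial; set_options raises an AerError when qiskit-aer-gpu is not installed, so the sketch falls back to the CPU in that case:

from qiskit import Aer
from qiskit.utils import QuantumInstance
from qiskit.opflow import AerPauliExpectation
from qiskit.circuit.library import RealAmplitudes, ZZFeatureMap
from qiskit.providers.aer import AerError
from qiskit_machine_learning.neural_networks import TwoLayerQNN

backend = Aer.get_backend("aer_simulator")
try:
    backend.set_options(device="GPU")  # a plain string, not a torch.device
except AerError:
    backend.set_options(device="CPU")  # fall back when GPU support is absent

qi = QuantumInstance(backend)
feature_map = ZZFeatureMap(feature_dimension=2, reps=3, entanglement="circular")
ansatz = RealAmplitudes(2, reps=1)
qnn4 = TwoLayerQNN(
    2, feature_map, ansatz, input_gradients=True,
    exp_val=AerPauliExpectation(), quantum_instance=qi,
)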

The issue is fixed in #335; hopefully the PR will be merged soon.