Closed: DanieleCucurachi closed this issue 3 months ago.
hi @DanieleCucurachi, we will double-check but afaik the loss at iteration 0 should correspond to the loss before applying the first optimization step
Hi @dominikandreasseitz, as far as I see in train(), write_tensorboard() is called after optimize_step(). Furthermore, when I retrieve the values of the loss during training for identical models (same circuit and initial weights) trained using different optimizers, the values at the first iteration 0 are not consistent.
I was about to fix it myself, should I wait for someone to check first?
# outer epoch loop
for iteration in progress.track(range(init_iter, init_iter + config.max_iter)):
    try:
        # in case there is not data needed by the model
        # this is the case, for example, of quantum models
        # which do not have classical input data (e.g. chemistry)
        if dataloader is None:
            loss, metrics = optimize_step(
                model=model,
                optimizer=optimizer,
                loss_fn=loss_fn,
                xs=None,
                device=device,
                dtype=data_dtype,
            )
            loss = loss.item()
        elif isinstance(dataloader, (DictDataLoader, DataLoader)):
            loss, metrics = optimize_step(
                model=model,
                optimizer=optimizer,
                loss_fn=loss_fn,
                xs=next(dl_iter),  # type: ignore[arg-type]
                device=device,
                dtype=data_dtype,
            )
        else:
            raise NotImplementedError(
                f"Unsupported dataloader type: {type(dataloader)}. "
                "You can use e.g. `qadence.ml_tools.to_dataloader` to build a dataloader."
            )

        if iteration % config.print_every == 0 and config.verbose:
            print_metrics(loss, metrics, iteration)

        if iteration % config.write_every == 0:
            write_tensorboard(writer, loss, metrics, iteration)

        if config.folder:
            if iteration % config.checkpoint_every == 0:
                write_checkpoint(config.folder, model, optimizer, iteration)
Sure, but the loss returned by the first optimize_step call should be the initial one (before having applied any optimization). I can try to have a look today, but if you want to have a crack at it yourself, you're very welcome.
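For reference, here is a minimal sketch of the generic PyTorch pattern such a step helper usually follows; this illustrates why the loss returned on the first call would reflect the initial parameters, and is not necessarily the exact qadence implementation:

def optimize_step_sketch(model, optimizer, loss_fn, xs):
    # Evaluate the loss on the *current* parameters.
    optimizer.zero_grad()
    loss, metrics = loss_fn(model, xs)
    loss.backward()
    # Parameters are updated only after the loss has been computed, so the
    # returned loss is the pre-update value for this iteration.
    optimizer.step()
    return loss, metrics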
@chMoussa can you have a look at this? cc @Roland-djee
Can we have your code example to work on, @DanieleCucurachi?
Here is a simple snippet that exposes the problem:
from pathlib import Path
import torch
from functools import reduce
from operator import add
from itertools import count
import matplotlib.pyplot as plt

from qadence import Parameter, QuantumCircuit, Z
from qadence import hamiltonian_factory, hea, feature_map, chain
from qadence.models import QNN
from qadence.ml_tools import TrainConfig, train_with_grad, to_dataloader, DictDataLoader

DEVICE = torch.device('cpu')
DTYPE = torch.complex64
SEED = 42
torch.manual_seed(SEED)

n_qubits = 4
fm = feature_map(n_qubits)
ansatz = hea(n_qubits=n_qubits, depth=3)
observable = hamiltonian_factory(n_qubits, detuning=Z)
circuit = QuantumCircuit(n_qubits, fm, ansatz)
model = QNN(circuit, observable, backend="pyqtorch", diff_mode="ad")

batch_size = 100
input_values = {"phi": torch.rand(batch_size, requires_grad=True)}
pred = model(input_values)

cnt = count()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

def loss_fn(model: torch.nn.Module, data: torch.Tensor) -> tuple[torch.Tensor, dict]:
    next(cnt)
    x, y = data[0], data[1]
    out = model(x)
    loss = criterion(out, y)
    return loss, {}

def validation_criterion(
    current_validation_loss: float, current_best_validation_loss: float, val_epsilon: float
) -> bool:
    return current_validation_loss <= current_best_validation_loss - val_epsilon

n_epochs = 10
config = TrainConfig(
    max_iter=n_epochs,
    print_every=1,
    batch_size=batch_size,
    checkpoint_best_only=True,
    val_every=1,  # The model will be run on the validation data after every `val_every` epochs.
    validation_criterion=validation_criterion,
    folder='./tmp/exqnn',
)

fn = lambda x, degree: .05 * reduce(add, (torch.cos(i * x) + torch.sin(i * x) for i in range(degree)), 0.)
x = torch.linspace(0, 10, batch_size, dtype=torch.float32).reshape(-1, 1)
y = fn(x, 5)
data = DictDataLoader(
    {
        "train": to_dataloader(x, y, batch_size=batch_size, infinite=True),
        "val": to_dataloader(x, y, batch_size=batch_size, infinite=True),
    }
)

print("Initial loss", loss_fn(model, (x, y))[0].item())
train_with_grad(model, data, optimizer, config, loss_fn=loss_fn, device=DEVICE, dtype=DTYPE)

plt.clf()
plt.plot(x.numpy(), y.numpy(), label='truth')
plt.plot(x.numpy(), model(x).detach().numpy(), "--", label="final", linewidth=3)
plt.legend()
Once it has run, check the logged values with tensorboard --logdir tmp
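If it helps, here is a small sketch of how one could read back the iteration-0 scalar from the event files and compare it with the printed initial loss. The scalar tag "loss" and the exact run directory are assumptions about what write_tensorboard produces, so inspect ea.Tags() and the folder layout first:

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Point this at the directory that actually contains the events.* file
# (created under `folder`, here ./tmp/exqnn).
ea = EventAccumulator("./tmp/exqnn")
ea.Reload()
print(ea.Tags())  # inspect which scalar tags were written
# Assumption: the loss is logged under the tag "loss"; change if Tags() says otherwise.
events = ea.Scalars("loss")
print("step", events[0].step, "logged loss", events[0].value)
# Compare with the "Initial loss" printed before train_with_grad(); with the
# current ordering the step-0 value is logged after the first optimization.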
Thanks @DanieleCucurachi. From the function definition, we actually log the values after optimization, so we should increment init_iter in this function to be consistent with other ML frameworks. I understand though that it would be useful to log the pre-training evaluations in TensorBoard. This can be done by adding an extra argument to TrainConfig and quickly modifying the train function accordingly. I will add this feature if you agree, @dominikandreasseitz and @DanieleCucurachi?
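In case it helps the discussion, a rough sketch of what such a change inside train() could look like; the flag name log_initial_state and the exact call sites are assumptions for illustration, not part of the current API or of any planned PR:

# Hypothetical sketch only: `config.log_initial_state` does not exist in
# TrainConfig today and is used purely for illustration.
if config.log_initial_state:
    with torch.no_grad():
        xs = None if dataloader is None else next(dl_iter)
        initial_loss, initial_metrics = loss_fn(model, xs)
    # Log the pre-optimization loss as iteration 0.
    write_tensorboard(writer, initial_loss.item(), initial_metrics, init_iter)

for iteration in progress.track(range(init_iter, init_iter + config.max_iter)):
    loss, metrics = optimize_step(...)  # unchanged optimization step
    # Shift the logged step by one so iteration 0 stays the pre-training value.
    write_tensorboard(writer, loss, metrics, iteration + 1)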
@chMoussa sounds good, tag me on the PR, thank you!
@chMoussa great, please tag me as well. Thank you!
Hey, eventually, by looking more carefully at the train function, it turns out that:

The train() function within ml_tools/train_grad.py exhibits unexpected behavior, as it initiates the TensorBoard logging only after the first training iteration. Consequently, we do not have access to the value of the loss before the first optimization step. When examining the training logs, the earliest loss value available is from after the optimization has started, and it is tagged as iteration 0. It would be more appropriate to log the pre-optimization loss as iteration 0.

I plan to solve these inconsistencies for the new feature.