Currently, the library expects the dataloader to provide, and the model to consume, `(x, y)` pairs. This isn't appropriate for, e.g., autoregressive tasks like language modeling.
See, for example, HuggingFace's `Trainer._prepare_input` and the snippet below for how to handle this (we probably want to allow the user to return other intermediate results from `evaluate`, which they may want to use for estimation):
```python
data = _prepare_input(data, device)
results = evaluate(model, data)
if isinstance(results, dict):
    # Pull out the loss; anything left in the dict is extra results.
    loss = results.pop("loss")
elif isinstance(results, tuple):
    loss = results[0]
    results = results[1:] if len(results) > 1 else None
elif isinstance(results, torch.Tensor):
    loss = results
    results = None
elif hasattr(results, "loss"):
    # e.g. a HuggingFace ModelOutput
    loss = results.loss
else:
    raise ValueError("evaluate must return a dict, tuple, or torch.Tensor")
```
I'll file a PR, but probably not until after NeurIPS.