Closed (aalbersk closed this issue 1 year ago)
Hi! set_transform applies a custom formatting transform to the entire batch, not to a single example, so the fixed version of your transform would look as follows:
from datasets import load_dataset
import torch
dataset = load_dataset("lambdalabs/pokemon-blip-captions", split='train')
def t(batch):
    # One value per example: the length matches the size of the incoming batch.
    return {"test": torch.tensor([1] * len(batch[next(iter(batch))]))}
dataset.set_transform(t)
d_0 = dataset[0]
Still, the formatter's error message should mention that a dict of sequences is expected as the returned value (not just a dict) to make debugging easier.
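To illustrate the "dict of sequences" contract, here is a minimal sketch in plain Python. It is not the library's formatter code: check_batch_output, scalar_transform, and batched_transform are hypothetical names made up for this example, and plain lists stand in for tensors (a real check would also need to accept tensors and arrays, which this one does not).

```python
from collections.abc import Mapping, Sequence


def check_batch_output(batch, output):
    """Hypothetical check: `output` must be a dict of per-example sequences."""
    # Batch size is taken from the first column of the incoming batch.
    batch_size = len(next(iter(batch.values())))
    if not isinstance(output, Mapping):
        raise TypeError(
            f"expected a dict of sequences, got {type(output).__name__}"
        )
    for name, column in output.items():
        # Strings are sequences too, but they are not valid columns here.
        if isinstance(column, str) or not isinstance(column, Sequence):
            raise TypeError(
                f"column {name!r} must be a sequence with one item per "
                f"example, got {type(column).__name__}"
            )
        if len(column) != batch_size:
            raise ValueError(
                f"column {name!r} has {len(column)} items, "
                f"expected {batch_size}"
            )
    return output


def scalar_transform(batch):
    # Returns a single value instead of one value per example.
    return {"test": 1}


def batched_transform(batch):
    # Returns one value per example in the batch.
    n = len(next(iter(batch.values())))
    return {"test": [1] * n}


batch = {"text": ["a", "b", "c"]}
ok = check_batch_output(batch, batched_transform(batch))

try:
    check_batch_output(batch, scalar_transform(batch))
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

The scalar transform fails the check with a message that names the offending column and says a sequence is expected, which is roughly the kind of hint the suggestion above asks for.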
I can take this
Fixed in #5553
OK, I will change it according to the suggestion. Thanks for the reply!
Describe the bug
When a dataset contains a 0-dim tensor, formatting.py raises the following error and fails.
Steps to reproduce the bug
Load any dataset and set a transform that adds a 0-dim tensor, or create/find a dataset containing a 0-dim tensor. E.g.
Expected behavior
The extractor should correctly get a row from the dataset, even if it contains a 0-dim tensor.
Environment info
datasets==2.8.0, but it looks like this also applies to the main branch version (as of 16th February)