pytorch / ignite

High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.
https://pytorch-ignite.ai
BSD 3-Clause "New" or "Revised" License

Exception when using torchdata.datapipes with two pass evaluation #2825

Closed · david-waterworth closed this issue 10 months ago

david-waterworth commented 1 year ago

❓ Questions/Help/Support

I'm having issues with the quickstart. The problem seems to be that I'm using torchdata.datapipes to construct my dataloaders:

from torch.utils.data import DataLoader

def get_data_loaders(args):
    # DATASETS and batch_transform are defined elsewhere in my project
    train_datapipe, test_datapipe = DATASETS[args.dataset](root=args.data_dir)
    train_datapipe = train_datapipe.shuffle().batch(args.batch_size).rows2columnar(["text", "label"])
    train_datapipe = train_datapipe.map(batch_transform)
    test_datapipe = test_datapipe.batch(args.batch_size).rows2columnar(["text", "label"])
    test_datapipe = test_datapipe.map(batch_transform)

    # batching is already done inside the datapipes, so the loaders get batch_size=None
    train_loader = DataLoader(train_datapipe, batch_size=None)
    val_loader = DataLoader(test_datapipe, batch_size=None)

    return train_loader, val_loader

So train_datapipe and test_datapipe are of type IterDataPipe

The problem is that this results in the following error:

Engine run is terminating due to exception: This iterator has been invalidated because another iterator has been created from the same IterDataPipe: IterDataPipeSerializationWrapper() This may be caused multiple references to the same IterDataPipe. We recommend using .fork() if that is necessary. For feedback regarding this single iterator per IterDataPipe constraint, feel free to comment on this issue: https://github.com/pytorch/data/issues/45.

The problem appears to be that the evaluator runs on the training dataset as part of the training loop, which I assume means the training iterator is accessed twice, and that isn't supported (https://github.com/pytorch/data/issues/45).
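
To make the conflict concrete, this is roughly the quickstart pattern involved (simplified; trainer, evaluator and train_loader are set up as in the quickstart): the trainer still holds its own iterator over train_loader, and the handler creates a second iterator over the same IterDataPipe, which invalidates the first one.

from ignite.engine import Events

@trainer.on(Events.EPOCH_COMPLETED)
def log_training_results(engine):
    # this creates a second iterator over the datapipe backing train_loader
    evaluator.run(train_loader)
    print(engine.state.epoch, evaluator.state.metrics)

trainer.run(train_loader, max_epochs=10)  # the trainer's own iterator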

I've worked around this by only performing evaluation on the validation dataset, using the pattern from the footnote (https://pytorch.org/ignite/quickstart.html#id1). I thought I should raise it, though, because of the two supported patterns, only the one that is discouraged actually appears to work with the new torchdata.datapipes.

vfdev-5 commented 1 year ago

@david-waterworth thanks for reporting! Yes, it makes sense that we can't reuse the training iterator twice. If we want to see model generalization, i.e. compute metrics on both validation and training data, we can create 3 datapipes: 1) a training-only datapipe, 2) a validation datapipe and 3) another training datapipe used only for evaluation, filtered to have roughly the same number of samples as the validation datapipe (if possible). That way we can construct 3 dataloaders (like here) and I think we should be able to get around the datapipe limitation... What do you think?
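
Roughly something like this (a sketch only, all names are placeholders; header is torchdata's Header datapipe, used here to truncate the extra training pipe to about the validation size):

from torch.utils.data import DataLoader
from ignite.engine import Events

# three independent datapipes: train, a second copy of train used only for
# evaluation, and validation
train_dp, train_eval_dp, val_dp = build_datapipes()  # placeholder factory

# keep the evaluation-on-train pipe about as large as the validation pipe
train_eval_dp = train_eval_dp.header(approx_val_size)

train_loader = DataLoader(train_dp, batch_size=None)
train_eval_loader = DataLoader(train_eval_dp, batch_size=None)
val_loader = DataLoader(val_dp, batch_size=None)

# the evaluator never touches the iterator the trainer is currently consuming
@trainer.on(Events.EPOCH_COMPLETED)
def log_metrics(engine):
    evaluator.run(train_eval_loader)
    print("train metrics:", evaluator.state.metrics)
    evaluator.run(val_loader)
    print("val metrics:", evaluator.state.metrics)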

david-waterworth commented 1 year ago

@vfdev-5 yes, that works. I wasn't sure at first how to implement it using datapipes, but I noticed that you can request the same split multiple times, i.e.

train_datapipe, train_val_datapipe, test_datapipe = DATASETS[args.dataset](root=args.data_dir, split=('train', 'train', 'test'))

For completeness, the code that constructs the datapipe is below; the _wrap_split_argument decorator enables the function to be called with a tuple of split names.

import os
from typing import Tuple, Union

from torchdata.datapipes.iter import FileOpener, IterableWrapper
from torchtext.data.datasets_utils import _create_dataset_directory, _wrap_split_argument


def _filepath_fn(root, split, _=None):
    return os.path.join(root, split + ".csv")


def _parse_fields(t):
    return dict(text=t[1].strip(), label=t[2])


@_create_dataset_directory(dataset_name="mydataset")
@_wrap_split_argument(("train", "test"))
def datapipe(root: str, split: Union[Tuple[str], str]):
    """
    Args:
        root: Directory where the datasets are saved. Default: os.path.expanduser('~/.torchtext/cache')
        split: split or splits to be returned. Can be a string or tuple of strings. Default: (`train`, `test`)
    """
    filepath_dp = IterableWrapper([_filepath_fn(root, split)])
    data_dp = FileOpener(filepath_dp, encoding="utf-8") \
        .parse_csv(skip_lines=1) \
        .map(fn=_parse_fields) \
        .shuffle() \
        .sharding_filter()

    return data_dp
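
So with this, requesting the train split twice gives two independent pipes, e.g. (root is just a placeholder here):

train_dp, train_eval_dp, test_dp = datapipe(root="data", split=("train", "train", "test"))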

I need to look closer at the ignite code though. I would have assumed the end-of-epoch event is fired outside the training iteration, in which case I'm not sure why the iterator isn't reset.

vfdev-5 commented 1 year ago

Resetting iterators should in general be done manually. Here is a how-to guide covering the majority of cases with iterators.
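
For example, one pattern from that guide (paraphrased from memory, so treat it as a sketch) for a finite data iterator is to hand the engine a fresh iterator once the current one is exhausted:

from ignite.engine import Engine, Events

def finite_data_iter(size):
    for i in range(size):
        yield i

def train_step(engine, batch):
    pass  # placeholder training step

trainer = Engine(train_step)

@trainer.on(Events.DATALOADER_STOP_ITERATION)
def restart_iter():
    # re-create the iterator manually when it runs out
    trainer.state.dataloader = finite_data_iter(100)

trainer.run(finite_data_iter(100), max_epochs=5, epoch_length=100)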

vfdev-5 commented 10 months ago

Let me close this issue as solved; feel free to reopen if something is still unclear.