pytorch / data

A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
BSD 3-Clause "New" or "Revised" License

Modify `IterKeyZipper` to accept any number (2+) of `IterDataPipe` #334

Open NivekT opened 2 years ago

NivekT commented 2 years ago

🚀 The feature

Currently, IterKeyZipper can only zip two DataPipes together.

https://github.com/pytorch/data/blob/198cffe7e65a633509ca36ad744f7c3059ad1190/torchdata/datapipes/iter/util/combining.py#L13

The proposal is to modify its API so that users can pass in any number (two or more) of IterDataPipes, rather than strictly two. We will have to think carefully about how key_fn and ref_key_fn should change.
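
One possible shape for the generalized behavior, sketched in plain Python rather than as a real DataPipe. The function name `key_zip_many` and the `ref_key_fns` list (one key function per reference pipe) are assumptions for illustration only, not an existing torchdata API; buffer size limits and merge functions are left out.

```python
from typing import Any, Callable, Iterable, Iterator, Sequence, Tuple


def key_zip_many(
    source: Iterable[Any],
    refs: Sequence[Iterable[Any]],
    key_fn: Callable[[Any], Any],
    ref_key_fns: Sequence[Callable[[Any], Any]],
) -> Iterator[Tuple[Any, ...]]:
    """Hypothetical n-way key zip: match each source item against every ref by key."""
    if len(refs) != len(ref_key_fns):
        raise ValueError("need exactly one key function per reference iterable")

    ref_iters = [iter(r) for r in refs]
    # One buffer per reference: key -> item seen but not yet matched.
    buffers = [dict() for _ in refs]

    for item in source:
        key = key_fn(item)
        matched = [item]
        for ref_iter, ref_key_fn, buffer in zip(ref_iters, ref_key_fns, buffers):
            # Pull from this reference until the wanted key shows up.
            while key not in buffer:
                try:
                    ref_item = next(ref_iter)
                except StopIteration:
                    raise KeyError(f"no match for key {key!r} in a reference iterable") from None
                buffer[ref_key_fn(ref_item)] = ref_item
            matched.append(buffer.pop(key))
        yield tuple(matched)


# Toy data: (key, payload) pairs; the labels arrive in a different order.
images = [("a", "img_a"), ("b", "img_b")]
labels = [("b", 1), ("a", 0)]
boxes = [("a", [0, 0, 4, 4]), ("b", [1, 1, 2, 2])]

first = lambda pair: pair[0]
zipped = list(key_zip_many(images, [labels, boxes], first, [first, first]))
```

Each output is a flat tuple `(source_item, ref_item_1, ref_item_2, ...)`. Whether the real API should take a list of key functions, a single shared one, or something else is exactly the open design question.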

Motivation, pitch

There are situations where users may want to zip multiple DataPipes together.

Alternatives

Users will have to write their own custom DataPipes.

Additional context

Similar DataPipes already exist, and we should keep their APIs as consistent with one another as possible.

pmeier commented 2 years ago

Another alternative is to "stack" multiple IterKeyZippers. This is what torchvision is currently doing, and it is ugly.
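
A rough illustration of why stacking gets awkward, in plain Python. The two-way `key_zip` helper is a hypothetical stand-in, not the torchdata class, and this is not the actual torchvision code; the toy data is made up.

```python
from typing import Any, Iterator, Tuple


def key_zip(source, ref, key_fn, ref_key_fn) -> Iterator[Tuple[Any, Any]]:
    """Hypothetical two-way stand-in: pair each source item with the ref item sharing its key."""
    # Eager lookup for brevity; a real DataPipe would buffer lazily.
    lookup = {ref_key_fn(r): r for r in ref}
    for item in source:
        yield item, lookup[key_fn(item)]


images = [("a", "img_a"), ("b", "img_b")]
labels = [("b", 1), ("a", 0)]
boxes = [("b", [1, 1, 2, 2]), ("a", [0, 0, 4, 4])]

first = lambda pair: pair[0]

# Stage 1: images + labels -> (image, label)
stage1 = key_zip(images, labels, key_fn=first, ref_key_fn=first)

# Stage 2: the source items are now tuples, so the key function has to reach
# into the nested structure produced by stage 1.
stage2 = key_zip(stage1, boxes, key_fn=lambda pair: pair[0][0], ref_key_fn=first)

# Results come out nested as ((image, label), box) and need flattening by hand.
flat = [(image, label, box) for (image, label), box in stage2]
```

Every additional DataPipe adds another nesting level and another key function that depends on the output layout of the previous stage.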

Given that we already have use cases for this, I could look into it.