pytorch / data

A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
BSD 3-Clause "New" or "Revised" License

Roadmap for mixed chain of multithread and multiprocessing pipelines? #1184

Open npuichigo opened 1 year ago

npuichigo commented 1 year ago

🚀 The feature

pypeln has a nice feature for chaining pipeline stages that run on different kinds of workers: processes, threads, or asyncio.

import pypeln as pl

# slow_add1 and slow_gt3 are user-defined functions (not shown here).
data = (
    range(10)
    | pl.process.map(slow_add1, workers=3, maxsize=4)
    | pl.thread.filter(slow_gt3, workers=2)
    | pl.sync.map(lambda x: print(x))
    | list
)
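For comparison, here is a rough sketch of the same process-then-thread chain using only the standard library's `concurrent.futures`. It is not pypeln's implementation and not a TorchData API. The bodies of `slow_add1` and `slow_gt3` are hypothetical stand-ins (the issue does not define them), and `run_pipeline` with its `cpu_pool_cls` parameter is a name invented for illustration. Unlike pypeln's `maxsize`, `Executor.map` submits all work eagerly, so there is no backpressure between stages.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def slow_add1(x):
    # Hypothetical stand-in for a CPU-bound step; must be a top-level
    # function so it can be pickled and sent to worker processes.
    return x + 1


def slow_gt3(x):
    # Hypothetical stand-in for an I/O-bound predicate.
    return x > 3


def run_pipeline(items, cpu_pool_cls=ProcessPoolExecutor):
    """Map in one pool, filter in a thread pool, collect in the caller."""
    with cpu_pool_cls(max_workers=3) as cpu_pool, \
            ThreadPoolExecutor(max_workers=2) as io_pool:
        # Stage 1: results come back in input order.
        added = cpu_pool.map(slow_add1, items)
        # Stage 2: pair each value with its predicate result so the
        # caller can filter without recomputing anything.
        checked = io_pool.map(lambda x: (x, slow_gt3(x)), added)
        # Stage 3: runs synchronously in the calling thread.
        return [x for x, keep in checked if keep]
```

When the default process pool is used from a script, the call to `run_pipeline(range(10))` should sit under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module. Passing `cpu_pool_cls=ThreadPoolExecutor` runs both stages on threads, which is handy for quick experiments.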


I remember that the first proposal for pytorch/data claimed to support something similar. I'd like to ask whether this is still planned and, if so, what the concrete roadmap is.

Motivation, pitch

This was part of the initial proposal.

Alternatives

No response

Additional context

No response

npuichigo commented 1 year ago

@ejguan

ejguan commented 1 year ago

Sorry for the late response. TBH, this has been on our long-term roadmap since we created the TorchData project. Unfortunately, @NivekT and I are not working on TorchData anymore. Stay tuned for updates.