openclimatefix / ocf_datapipes

OCF's DataPipe based dataloader for training and inference
MIT License
13 stars 11 forks

Multiprocessing on batch #135

Open peterdudfield opened 1 year ago

peterdudfield commented 1 year ago

Instead of looping over the batch, could we use multiprocessing?

code is here

jacobbieker commented 1 year ago

Each worker is already the multiprocessing, though, and each example is emitted one by one by the datapipes upstream of this code, so I'm not sure multiprocessing that loop would speed it up at all. Adding more workers is probably the best solution, I would think.
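For illustration, the two options being compared can be sketched in plain Python. This is a minimal sketch and not the code linked above: `process_example` and `process_batch` are hypothetical names, and the per-example work is a placeholder. It only uses the standard library's `concurrent.futures`.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Optional


def process_example(example: int) -> int:
    """Hypothetical stand-in for the per-example work done inside the loop."""
    return example * example


def process_batch(batch: list[int], executor: Optional[Executor] = None) -> list[int]:
    """Process every example in a batch, serially or through an executor."""
    if executor is None:
        # Current approach: a plain loop over the batch.
        return [process_example(example) for example in batch]
    # Proposed approach: fan the examples out to the executor's workers.
    # Executor.map yields results in the same order as the inputs.
    return list(executor.map(process_example, batch))


if __name__ == "__main__":
    batch = list(range(8))
    with ProcessPoolExecutor(max_workers=2) as pool:
        # Both paths should agree; only the wall-clock time can differ.
        assert process_batch(batch, pool) == process_batch(batch)
```

If the surrounding loader already runs several worker processes, a pool like this inside each worker adds process start-up and pickling overhead on top, which is one reason it may not be faster in practice.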

jacobbieker commented 1 year ago

Unless we want to drop the rule that each datapipe is independent of the others. In that case we could maybe write a function or datapipe that loads multiple examples per step and then passes them all to the next datapipe. I'm still not sure that would be faster, though.
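A rough sketch of the "multiple examples per step" idea, written with plain Python generators rather than the actual datapipes API. `load_in_chunks` and `downstream` are hypothetical names, and the doubling step is a placeholder for whatever the next datapipe would do.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def load_in_chunks(source: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Pull up to chunk_size examples per step and emit them together."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(source)
    while True:
        # Take the next chunk_size examples; the final chunk may be shorter.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def downstream(chunks: Iterable[List[int]]) -> Iterator[List[int]]:
    """Hypothetical next stage that receives a whole chunk per step."""
    for chunk in chunks:
        yield [example * 2 for example in chunk]
```

The downstream stage now has to know it receives lists rather than single examples, which is the loss of independence between datapipes mentioned above.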