Hey @EvenOldridge, I stumbled upon your post about cudf dataloaders speeding up training a long time ago... and recently got around to actually trying my hand at it, so thanks for introducing me to the idea!
I'm just curious, but is there a specific reason you implemented the batch dataloaders the way you did in this repo instead of using a custom sampler + a regular DataLoader with 0 workers? I wrote out a quick sketch of the implementation I'm thinking of here: https://gist.github.com/NegatioN/1f63c3a79dfe13b183d413123d37d4fa
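To make the question concrete, here's a minimal sketch of the kind of setup I mean (the names `ContiguousBatchSampler` and `SliceDataset` are just placeholders I made up, and I'm assuming the data is already loaded into a single tensor up front):

```python
import torch
from torch.utils.data import Dataset, Sampler, DataLoader


class ContiguousBatchSampler(Sampler):
    """Yields (start, stop) index pairs so each batch is one contiguous slice."""

    def __init__(self, n_rows, batch_size, shuffle=True):
        self.n_rows = n_rows
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        starts = torch.arange(0, self.n_rows, self.batch_size)
        if self.shuffle:
            # Shuffle the order of the batches, not the rows inside them,
            # so every read stays contiguous.
            starts = starts[torch.randperm(len(starts))]
        for start in starts.tolist():
            yield (start, min(start + self.batch_size, self.n_rows))

    def __len__(self):
        return (self.n_rows + self.batch_size - 1) // self.batch_size


class SliceDataset(Dataset):
    """__getitem__ receives a (start, stop) pair and returns a full batch."""

    def __init__(self, features, labels):
        self.features = features  # e.g. one pre-loaded (possibly GPU) tensor
        self.labels = labels

    def __getitem__(self, bounds):
        start, stop = bounds
        # One contiguous read per batch instead of batch_size single-row reads.
        return self.features[start:stop], self.labels[start:stop]

    def __len__(self):
        return len(self.features)


features = torch.randn(10_000, 16)
labels = torch.randint(0, 2, (10_000,))
ds = SliceDataset(features, labels)

# batch_size=None disables PyTorch's automatic per-row batching, so each
# (start, stop) pair from the sampler is passed straight to __getitem__,
# and num_workers=0 keeps everything in the main process.
loader = DataLoader(
    ds,
    sampler=ContiguousBatchSampler(len(ds), batch_size=512),
    batch_size=None,
    num_workers=0,
)

for x, y in loader:
    pass  # x arrives as one (<=512, 16) slice per iteration
```

As far as I understand, the `num_workers=0` part is what you want anyway once the tensors live on the GPU, since there's no benefit to shipping them through worker processes.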
I understand that your implementation might already have changed significantly since you mentioned integrating it with fast.ai, but I'm curious whether you ruled this approach out for a specific reason that I can't see at the moment. I would think it has the same performance characteristics?
Edit: The biggest difference might be that we can grab each batch as a single read from contiguous memory? Did you test how large the impact of this was?

/Joakim