Open dudulightricks opened 2 months ago
@alanwaketan can you take this one?
Can you pad your tensors?
@alanwaketan Yes, I can, but I'm asking whether it would give me any benefit compared to just calling mark_sharding on each tensor.
The dataloader will prefetch the data onto the device. That's the most significant benefit you get from using any data loader.
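To make the padding suggestion concrete, here is a minimal sketch in plain PyTorch. It assumes the tensors differ only in their last dimension; `pad_batch` and its `pad_value` parameter are hypothetical names used for illustration, not part of torch_xla. The idea is to bring every tensor to one common shape so the batch has the single size that `input_sharding` expects. Whether this is worth it over per-tensor mark_sharding depends on how much the prefetching matters for your workload.

```python
import torch
import torch.nn.functional as F


def pad_batch(tensors, pad_value=0.0):
    """Right-pad tensors along the last dim to a common length and stack them.

    Hypothetical helper: assumes all tensors share every dimension
    except the last one.
    """
    max_len = max(t.shape[-1] for t in tensors)
    # F.pad takes (left, right) padding amounts for the last dimension.
    padded = [
        F.pad(t, (0, max_len - t.shape[-1]), value=pad_value) for t in tensors
    ]
    # Boolean mask marking real (non-padded) positions, so downstream
    # code can ignore the padding.
    lengths = torch.tensor([t.shape[-1] for t in tensors])
    mask = torch.arange(max_len).unsqueeze(0) < lengths.unsqueeze(1)
    return torch.stack(padded), mask


# Three tensors of different lengths become one (3, 5) batch.
batch, mask = pad_batch([torch.ones(3), torch.ones(5), torch.ones(2)])
```

The mask is returned alongside the batch because padded positions usually need to be excluded from the loss or attention; drop it if your model does not need it.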
❓ Questions and Help
If we have a few tensors in a batch with different sizes and we use mark_sharding on each of them, do we lose anything compared to passing input_sharding=xs.ShardingSpec to the MpDeviceLoader (which only works for a single size of tensor in the batch)? @JackCaoG