BatchedDataLoader with shuffling_queue_capacity=0 is very slow

With shuffling_queue_capacity=0, BatchedDataLoader uses BatchedNoopShufflingBuffer as the underlying shuffling buffer implementation. The current implementation is very slow when a large batch of data is added to it, because it immediately tries to split everything it holds into batches:

        while self._num_samples >= self.batch_size:
            self._make_batch()

We typically end up with _num_samples being very large. With a small batch size, the loop then makes a huge number of _make_batch calls (roughly _num_samples / batch_size) in one go.

A better solution is to produce one batch each time a batch is requested, rather than paying the whole cost upfront.
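To illustrate the idea, here is a minimal pure-Python sketch of a buffer that cuts batches lazily. It is not petastorm's code: the class name `LazyBatchBuffer` and its internals are hypothetical, and it works on plain lists rather than tensors. The method names (`add_many`, `can_retrieve`, `retrieve`) are chosen for illustration only.

```python
from collections import deque


class LazyBatchBuffer:
    """Hypothetical buffer that builds one batch per retrieve() call."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self._chunks = deque()  # chunks are stored exactly as they were added
        self._offset = 0        # read position inside the oldest chunk
        self._num_samples = 0   # samples added but not yet retrieved

    def add_many(self, samples):
        # Cheap: only keep a reference to the chunk, no batching happens here.
        if len(samples) > 0:
            self._chunks.append(samples)
            self._num_samples += len(samples)

    def can_retrieve(self):
        return self._num_samples >= self.batch_size

    def retrieve(self):
        if not self.can_retrieve():
            raise RuntimeError("not enough samples for a full batch")
        batch = []
        # Consume from the oldest chunks until one batch is filled.
        while len(batch) < self.batch_size:
            chunk = self._chunks[0]
            take = min(self.batch_size - len(batch), len(chunk) - self._offset)
            batch.extend(chunk[self._offset:self._offset + take])
            self._offset += take
            if self._offset == len(chunk):
                # Oldest chunk is exhausted, move on to the next one.
                self._chunks.popleft()
                self._offset = 0
        self._num_samples -= self.batch_size
        return batch


buf = LazyBatchBuffer(batch_size=3)
buf.add_many(list(range(7)))  # no batches are built at this point
first = buf.retrieve()        # the work for one batch is done here
```

With this layout the cost of adding data no longer depends on the batch size, and each `retrieve()` call only does the work needed for the batch it returns. Leftover samples that do not fill a batch stay in the buffer until more data arrives.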

uber/petastorm, issue #653