Closed chongxiaoc closed 3 years ago
Ok, with `pytest -s`, it shows that `_add_many` raises a runtime error as below:
```
retrieved_so_far = None
for idx in range(5):
>       batch = next(it)
test_pytorch_dataloader.py:257:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../pytorch.py:124: in __iter__
    for batch in self._iter_impl():
../pytorch.py:394: in _iter_impl
    for b in self._iter_impl_worker():
../pytorch.py:441: in _iter_impl_worker
    other_shuffling_buffer.add_many(batch.values())
../reader_impl/pytorch_shuffling_buffer.py:36: in add_many
    return self._add_many(items)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <petastorm.reader_impl.pytorch_shuffling_buffer.BatchedRandomShufflingBuffer object at 0x7f26794ad8d0>
items = [tensor([230000,  50000, 400000, 300000, 390000, 310000, 110000,      0, 180000,
         70000], dtype=torch.int32)]

    def _add_many(self, items):
        if self._done_adding:
            raise RuntimeError('Can not call add_many after done_adding() was called.')
        if not self.can_add():
>           raise RuntimeError('Can not enqueue. Check the return value of "can_enqueue()" to check if more '
                               'items can be added.')
E           RuntimeError: Can not enqueue. Check the return value of "can_enqueue()" to check if more items can be added.
../reader_impl/pytorch_shuffling_buffer.py:238: RuntimeError
--------------------- Captured log call ---------------------
ERROR petastorm.pytorch:pytorch.py:128 Iteration on Petastorm DataLoader raise error: RuntimeError('Can not enqueue. Check the return value of "can_enqueue()" to check if more items can be added.')
```
Found the root cause for this bug: when in-memory caching is enabled, a secondary shuffling queue is created: https://github.com/uber/petastorm/blob/15b35798d6140efe90f8467072dd55d12a8f79c1/petastorm/pytorch.py#L345

This secondary shuffling queue is given the same size as the normal shuffling queue. However, during file iteration the normal shuffling queue grows as rows are added and shrinks as shuffled batches are produced, while the secondary queue only grows, since it caches all the data. It therefore eventually overflows once it is fed more data than its capacity.
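The failure mode above can be reproduced with a minimal sketch (this is illustrative, not the actual petastorm implementation): a bounded buffer that is only ever filled, never drained, trips its `can_add()` check as soon as the total number of rows exceeds its capacity.

```python
class BoundedShufflingBuffer:
    """Toy stand-in for a fixed-capacity shuffling buffer that only grows."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._items = []

    def can_add(self):
        return len(self._items) < self._capacity

    def add_many(self, items):
        # Mirrors the error raised in pytorch_shuffling_buffer.py.
        if not self.can_add():
            raise RuntimeError('Can not enqueue. Check the return value of '
                               '"can_enqueue()" to check if more items can be added.')
        self._items.extend(items)


# Cache-style usage: every row is added, nothing is ever removed.
buf = BoundedShufflingBuffer(capacity=3)
overflowed = False
try:
    for row in range(5):  # more rows than capacity -> overflow
        buf.add_many([row])
except RuntimeError:
    overflowed = True

print(overflowed)  # True: the 4th row can no longer be enqueued
```

Because the normal queue interleaves adds with removals it stays within its bound, but a cache-only queue like this one cannot.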
To fix it, I think the unit test should set the shuffling queue size >= the number of rows. I will also add a comment in pytorch.py noting this constraint.

I will draft a fix soon.
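The sizing rule above can be sketched as a small helper (the function name and parameters here are hypothetical, not the real petastorm API): when in-memory caching is on, the cache-side buffer must have room for every row.

```python
def cache_buffer_capacity(shuffling_queue_capacity, num_rows, inmemory_cache=True):
    """Pick a safe capacity for the cache-side shuffling buffer.

    Hypothetical helper: without caching, the configured capacity is fine;
    with caching, the buffer only grows, so it must fit the whole dataset.
    """
    if not inmemory_cache:
        return shuffling_queue_capacity
    # The cache buffer is never drained during the first epoch, so it needs
    # capacity for all rows, not just the shuffling window.
    return max(shuffling_queue_capacity, num_rows)


print(cache_buffer_capacity(20, 1000))                        # 1000
print(cache_buffer_capacity(20, 1000, inmemory_cache=False))  # 20
```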
The parameter `shuffling_queue_capacity` is not used in the unit test, which means it is always 0: https://github.com/uber/petastorm/blob/15b35798d6140efe90f8467072dd55d12a8f79c1/petastorm/tests/test_pytorch_dataloader.py#L230
However, if I add `shuffling_queue_capacity` into `extra_loader_params`, the tests with `shuffling_queue_capacity=20` all fail: https://github.com/uber/petastorm/blob/15b35798d6140efe90f8467072dd55d12a8f79c1/petastorm/tests/test_pytorch_dataloader.py#L238

Is this a known issue, or did we just miss it?
@abditag2 @selitvin @tgaddair