lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0

[BUG] Deadloop on `LazyRepeater` for non re-iterable. #1222

Open chenjiasheng opened 1 year ago

chenjiasheng commented 1 year ago

`LazyRepeater` is intended for a re-iterable iterable. If the input is not re-iterable, for example a generator created by a generator expression or by a function using the `yield` keyword, the first pass exhausts it and every later pass yields nothing. When `times` is not specified, iterating over the `LazyRepeater` then hangs indefinitely without yielding any further items; when `times > 1`, it yields fewer items than the user expects.

Here is a simple reproduction of the issue:

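from lhotse.lazy import LazyRepeater

# A generator expression can be iterated only once.
it = (x for x in range(10))
# The constructor parameter is currently named 'iterator' (see item 3 below),
# so the input is passed positionally here.
repeated = LazyRepeater(it, times=None, preserve_id=True)
for x in repeated:
    print(x)
# Prints 0..9, then hangs: each later epoch re-iterates the exhausted generator.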
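
The finite-`times` case can be seen without lhotse. This is a minimal sketch, assuming the repeater simply re-iterates its input once per epoch; `naive_repeat` is a hypothetical stand-in for illustration, not lhotse's implementation:

```python
def naive_repeat(iterable, times):
    """Hypothetical stand-in: re-iterate `iterable` once per epoch."""
    for _ in range(times):
        yield from iterable


# A generator is exhausted after the first epoch, so the second epoch is empty.
gen_result = list(naive_repeat((x for x in range(3)), times=2))
# A list is re-iterable, so both epochs yield all items.
list_result = list(naive_repeat([0, 1, 2], times=2))

print(gen_result)   # [0, 1, 2]
print(list_result)  # [0, 1, 2, 0, 1, 2]
```

With `times=None` the same loop never terminates on the generator, because every epoch after the first completes immediately without yielding.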

My proposed solution:

  1. State in the docstring that the input iterable must be re-iterable and non-empty.
  2. At the start of each epoch, if the input yields no items at all (the very first item request raises `StopIteration`), raise an exception: one reporting that the input is empty in the first epoch, or that it is not re-iterable in the second and later epochs.
  3. Change the parameter name from 'iterator' to 'iterable' (however, this may cause backward compatibility issues).
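
Step 2 could look roughly like the following. This is only a sketch of the idea under stated assumptions: `checked_repeat`, the choice of `ValueError`, and the error messages are all hypothetical, and none of this is existing lhotse code.

```python
import itertools
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def checked_repeat(iterable: Iterable[T], times: Optional[int] = None) -> Iterator[T]:
    """Hypothetical sketch: repeat `iterable`, failing fast on an empty epoch."""
    epoch = 0
    while times is None or epoch < times:
        yielded_any = False
        for item in iterable:
            yielded_any = True
            yield item
        if not yielded_any:
            # An epoch that produced nothing would otherwise loop forever.
            if epoch == 0:
                raise ValueError("Input iterable is empty.")
            raise ValueError(
                f"Input iterable yielded no items in epoch {epoch}; "
                "it is probably not re-iterable (e.g. a generator)."
            )
        epoch += 1


# A re-iterable input repeats as expected.
first_five = list(itertools.islice(checked_repeat([1, 2]), 5))
print(first_five)  # [1, 2, 1, 2, 1]

# A generator raises at the start of the second epoch instead of hanging.
try:
    list(checked_repeat(x for x in range(3)))
except ValueError as err:
    print(err)
```

The check costs one boolean per epoch and does not buffer any items, so the laziness of the input is preserved.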

chenjiasheng commented 1 year ago

@oplatek @songmeixu @johnjosephmorgan @stachu86

pzelasko commented 1 year ago

I am OK with your proposed solution. Could you make a PR?