nlp-with-transformers / notebooks

Jupyter notebooks for the Natural Language Processing with Transformers book
https://transformersbook.com/
Apache License 2.0

Chapter-10 Infinite iterator #126

Open: Roland-Szucs opened this issue 8 months ago

Roland-Szucs commented 8 months ago

Information

The problem arises in chapter: Chapter 10, Training Transformers from Scratch

Describe the bug

class ConstantLengthDataset(IterableDataset) iterates forever. The cause is the exception handling that runs once the underlying dataset is exhausted and StopIteration is caught. The code in the notebook restarts the dataset iterator:

```python
# Excerpt from the buffer-filling loop in ConstantLengthDataset.__iter__ (notebook version)
try:
    m=f"Fill buffer: {buffer_len}<{self.input_characters:.0f}"
    print(m)
    buffer.append(next(iterator)["content"])
    buffer_len += len(buffer[-1])
except StopIteration:
    iterator = iter(self.dataset)  # restarts the dataset, so the loop never ends
```

To Reproduce

No reproduction is needed: Hugging Face has already implemented this correctly and simply did not update this notebook. The code presented in this YouTube video is correct. When StopIteration is caught, it ends the iteration instead:

```python
# Excerpt from the same loop (corrected version shown in the video)
try:
    m=f"Fill buffer: {buffer_len}<{self.input_characters:.0f}"
    print(m)
    buffer.append(next(iterator)["content"])
    buffer_len += len(buffer[-1])
except StopIteration:
    more_examples = False  # no more data: leave the outer loop after this buffer
    break                  # stop filling the buffer
```
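A minimal, framework-free sketch can make the difference between the two versions concrete. It is not the notebook's class and rests on several assumptions: a list of dicts with a `"content"` key stands in for the Hugging Face dataset, the hypothetical `toy_tokenize` maps each character to its code point instead of using a real tokenizer, chunks are plain lists rather than `torch.tensor`s, the debug prints are dropped, and the `infinite` flag is an illustrative addition that switches between the notebook's restart and the corrected stop.

```python
from typing import Iterator, List


def toy_tokenize(text: str) -> List[int]:
    """Hypothetical stand-in for a real tokenizer: one id per character."""
    return [ord(ch) for ch in text]


class ConstantLengthSketch:
    """Framework-free sketch of the ConstantLengthDataset buffering logic."""

    def __init__(self, dataset, seq_length=4, input_characters=10,
                 eos_id=0, infinite=False):
        self.dataset = dataset
        self.seq_length = seq_length
        self.input_characters = input_characters
        self.eos_id = eos_id
        self.infinite = infinite  # illustrative flag, not from the notebook

    def __iter__(self) -> Iterator[List[int]]:
        iterator = iter(self.dataset)
        more_examples = True
        while more_examples:
            buffer, buffer_len = [], 0
            # Fill the buffer until it holds enough characters.
            while buffer_len < self.input_characters:
                try:
                    buffer.append(next(iterator)["content"])
                    buffer_len += len(buffer[-1])
                except StopIteration:
                    if self.infinite:
                        # Notebook behaviour: restart the dataset, so iteration never ends.
                        iterator = iter(self.dataset)
                    else:
                        # Corrected behaviour: process this last buffer, then stop.
                        more_examples = False
                        break
            # Concatenate all texts, separated by an end-of-sequence id.
            all_ids: List[int] = []
            for text in buffer:
                all_ids.extend(toy_tokenize(text) + [self.eos_id])
            # Emit only full-length chunks; a shorter tail is dropped.
            for i in range(0, len(all_ids), self.seq_length):
                chunk = all_ids[i:i + self.seq_length]
                if len(chunk) == self.seq_length:
                    yield chunk


data = [{"content": "abcdef"}, {"content": "ghij"}]
print(list(ConstantLengthSketch(data)))
# [[97, 98, 99, 100], [101, 102, 0, 103], [104, 105, 106, 0]]
```

With `infinite=False` the sketch yields three chunks from the two example texts and then stops. With `infinite=True` it keeps cycling through the same three chunks, so `list()` on it would never return and the consumer has to bound it.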

Expected behavior

Do not restart the iteration once the dataset has been exhausted; otherwise the iteration never ends.
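The expected behaviour follows a general rule for Python iterators: catching `StopIteration` and calling `iter()` on the source again turns a finite source into an endless one. The sketch below contrasts the two patterns on a plain list; the function names `single_pass` and `restarting` are illustrative, not from the notebook. If an endless stream is actually wanted, for example when training for a fixed number of steps, `itertools.islice` is a standard way to take a bounded prefix of it.

```python
import itertools
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def single_pass(source: Iterable[T]) -> Iterator[T]:
    """Yield each item once, then stop (the expected behaviour)."""
    iterator = iter(source)
    while True:
        try:
            yield next(iterator)
        except StopIteration:
            return  # stop cleanly instead of restarting


def restarting(source: Iterable[T]) -> Iterator[T]:
    """Restart the source whenever it runs out (the notebook's pattern)."""
    iterator = iter(source)
    while True:
        try:
            yield next(iterator)
        except StopIteration:
            iterator = iter(source)  # never finishes on its own


items = [1, 2, 3]
print(list(single_pass(items)))                      # [1, 2, 3]
print(list(itertools.islice(restarting(items), 7)))  # [1, 2, 3, 1, 2, 3, 1]
```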