justheuristic / prefetch_generator

Simple package that makes your generator work in background thread
The Unlicense

memory leak in a loop #8

Open jdxyw opened 3 years ago

jdxyw commented 3 years ago

Hi,

I am running into a memory leak issue; here is my usage.

The code below works fine with the default dataloader in PyTorch: memory usage stays stable.
If I replace the PyTorch dataloader with the BackgroundGenerator, CPU memory usage grows over time.


for epoch in range(0, epochs):
    # Load the first file eagerly; subsequent files are loaded in the background.
    train0 = load(file_list[0])
    for fidx in range(1, len(file_list) - 1):
        if fidx < len(file_list) - 1:
            # AsyncLoading is a custom threading.Thread subclass that loads a file
            # in another thread.
            background = AsyncLoading(file_list[fidx], fidx)
            background.start()

        # Alternate between the two in-memory datasets.
        if fidx % 2 == 1:
            train_dataloader = data.DataLoader(dataset=Dataset(train0),
                                               batch_size=48,
                                               shuffle=True,
                                               drop_last=True,
                                               collate_fn=collate_fn)
        else:
            train_dataloader = data.DataLoader(dataset=Dataset(train1),
                                               batch_size=48,
                                               shuffle=True,
                                               drop_last=True,
                                               collate_fn=collate_fn)

        ####
        # Do the training here.
        ####

        if fidx < len(file_list) - 1:
            background.join()
            if fidx % 2 == 0:
                train0 = background.get_result().copy()
            else:
                train1 = background.get_result().copy()

        # Clean up the dataset here and call gc.collect().

My new dataloader.

class DataLoaderX(data.DataLoader):
    def __iter__(self):
        return BackgroundGenerator(super().__iter__(), max_prefetch=2)
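
For context, this is roughly how the loader is dropped in (a sketch, not the exact training code; Dataset, collate_fn, and train0 are the same placeholders as in the loop above). Note that every call to __iter__, i.e. every pass over a file, creates a fresh BackgroundGenerator and therefore a fresh prefetch thread.

# Usage sketch (assumes the same Dataset, collate_fn and train0 placeholders
# as in the training loop above).
train_dataloader = DataLoaderX(dataset=Dataset(train0),
                               batch_size=48,
                               shuffle=True,
                               drop_last=True,
                               collate_fn=collate_fn)

for batch in train_dataloader:
    ...  # training step; a new prefetch thread was started by __iter__ above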

A memory usage comparison is below; the only difference between the two runs is the dataloader.

[Screenshots, 2021-04-04: memory usage over time with the default DataLoader vs. DataLoaderX]

Best Regards,

justheuristic commented 3 years ago

Hi, thanks for reporting! I will try to look into this, but I can't guarantee I'll get to it quickly; I'm a bit overloaded right now :(

Luxter77 commented 11 months ago

Looking over the code, I think this is caused by the generator thread not joining after the iteration stops, leaking its resources.
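
A minimal sketch of that failure mode, using a toy bounded-queue prefetcher rather than the library's actual code: if the consumer abandons the iterator before exhausting it, the producer thread stays blocked on queue.put(), keeps references to the prefetched batches, and is never joined, so threads (and their buffered data) accumulate across epochs.

import threading
import queue

class Prefetcher(threading.Thread):
    """Toy stand-in for a background prefetch generator (illustration only)."""
    def __init__(self, generator, max_prefetch=2):
        super().__init__(daemon=True)
        self.queue = queue.Queue(max_prefetch)
        self.generator = generator
        self.start()

    def run(self):
        for item in self.generator:
            self.queue.put(item)  # blocks forever once the queue is full and nobody reads
        self.queue.put(None)

    def __iter__(self):
        while True:
            item = self.queue.get()
            if item is None:
                break
            yield item

def batches():
    for i in range(1000):
        yield [i] * 10_000  # each prefetched batch keeps some memory alive

for epoch in range(5):
    loader = Prefetcher(batches())
    for batch in loader:
        break  # consumer stops early; the producer thread is never joined
    print("live threads:", threading.active_count())  # grows every epoch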

Luxter77 commented 11 months ago

A solution might be to add a handle that joins/terminates the thread when GeneratorExit and/or StopIteration is raised.
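
One possible shape of that fix, sketched against the toy prefetcher above rather than the actual BackgroundGenerator: give the wrapper a close() handle that signals the producer to stop, drains the queue so a blocked put() can return, and joins the thread; calling it from a finally block covers both GeneratorExit (the consumer breaks or drops the iterator) and normal exhaustion.

import threading
import queue

class ClosablePrefetcher(threading.Thread):
    """Toy prefetcher with an explicit shutdown handle (a sketch, not library code)."""
    def __init__(self, generator, max_prefetch=2):
        super().__init__(daemon=True)
        self.queue = queue.Queue(max_prefetch)
        self.generator = generator
        self._stop = threading.Event()
        self.start()

    def run(self):
        for item in self.generator:
            if self._stop.is_set():
                break
            self.queue.put(item)
        self.queue.put(None)

    def close(self):
        self._stop.set()
        # Drain the queue so a producer blocked on put() can wake up and exit.
        while self.is_alive():
            try:
                self.queue.get(timeout=0.1)
            except queue.Empty:
                pass
        self.join()

    def __iter__(self):
        try:
            while True:
                item = self.queue.get()
                if item is None:
                    break
                yield item
        finally:
            # Runs on normal exhaustion *and* on GeneratorExit (consumer break/del),
            # so the worker thread is always joined and its references released.
            self.close()

With this change, the thread count in the reproduction above should stay flat across epochs instead of growing.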